From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <3chas3@gmail.com> Received: from mail-io0-f175.google.com (mail-io0-f175.google.com [209.85.223.175]) by dpdk.org (Postfix) with ESMTP id A93E43DC; Tue, 21 Aug 2018 16:58:23 +0200 (CEST) Received: by mail-io0-f175.google.com with SMTP id n18-v6so15624528ioa.9; Tue, 21 Aug 2018 07:58:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Y0BB9ZjgsMMfL8f4pN/rnmupol5Kyo5aGgjP7TrjSm0=; b=P8wy6McRYPrp6eb65jjWn2h78e0h5B7X7vpABj9zit0phwe+zTNzIOpjcb5BwHPN+u gRorQV+m9aluTjnCLD4sMJYou+8QdekYVqZ0BWTIdvbLKz84bdoH99iEVc7LH/YYzrpt zQehnru80SYKmNYsyxus/Oy/tOJVSGmSbaOsBIkfubLWLUnFdR0az36jnSf9LDDNar0I txkK7agV07gqpxzvSBaNr5IHxgGTAaBTYozPAa0AEFSJ3wNmEUnli3SeQ+/MKiOYgEQG +X19gYGo+YqKmWVXvbnJlPf5wyEpYBgHVyAiFnEtQPp8cVT7FXoWYY7mfOBIRZFGlNap D8sA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Y0BB9ZjgsMMfL8f4pN/rnmupol5Kyo5aGgjP7TrjSm0=; b=GdBhjw/OLNbuU3WFUVrYvym1bLdWwkufsNoiuwBVKpk5y4gDyJk1e49hTpITpF8sQL AzyjvXmGr/ar14JCseoSiMfNLSedh+zf5JUkoy3KFIP4Q2BcTN0/Zy3IffcW8iyimuOr cgPgt+G9aAtJFUdEfCnsuz+/ViEbnSR+ph6Y9jDjtUcI2LEQKhtCwcAPmNmK4I8L3Czs l2eEPh6fZ3yx4ibSEo43H/8GzVC6xHipgzDj1sFjM4UwtgV7vTgV/bcbaGLnQmkWf0xD bYHHBUhg4/k4uZ8A2qfqXcRfB6dRWKBKfnFyYHIRaPeTgl38b/IVa6cZQb7cV/My56tf 5Wfg== X-Gm-Message-State: APzg51AsEMJVaYmUIgcnrs2rLz6WqhJtpjh8eg4SS7B0QN2wsGI6o1XO V6PLu9kiyIWcVTC6KyI4MJtVpLun0beOaqFUbIs= X-Google-Smtp-Source: ANB0VdZxpgSLqSOJj6e4nbfDQexMJ3LOrbctDEU7wUuBzm3Z/yCmeoSgG1Tc2xQnPFmtOT4e/eAhm4fFvsk7Ec/c3ME= X-Received: by 2002:a5e:da49:: with SMTP id o9-v6mr5897579iop.215.1534863502914; Tue, 21 Aug 2018 07:58:22 -0700 (PDT) MIME-Version: 1.0 References: <20180816125202.15980-1-bluca@debian.org> <20180816133208.26566-1-bluca@debian.org> In-Reply-To: From: Chas Williams <3chas3@gmail.com> Date: Tue, 21 Aug 2018 10:58:11 -0400 Message-ID: To: Matan Azrad Cc: bluca@debian.org, dev@dpdk.org, Declan Doherty , Chas Williams , stable@dpdk.org, ekinzie@brocade.com Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] [PATCH v4] net/bonding: per-slave intermediate rx ring X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Aug 2018 14:58:24 -0000 On Tue, Aug 21, 2018 at 6:56 AM Matan Azrad wrote: > Hi > > From: Chas Williams > > This will need to be implemented for some of the other RX burst methods > at > > some point for other modes to see this performance improvement (with the > > exception of active-backup). > > Yes, I think it should be done at least to > bond_ethdev_rx_burst_8023ad_fast_queue (should be easy) for now. > There is some duplicated code between the various RX paths. I would like to eliminate that as much as possible, so I was going to give that some thought first. > > > On Thu, Aug 16, 2018 at 9:32 AM Luca Boccassi wrote: > > > > > During bond 802.3ad receive, a burst of packets is fetched from each > > > slave into a local array and appended to per-slave ring buffer. > > > Packets are taken from the head of the ring buffer and returned to the > > > caller. The number of mbufs provided to each slave is sufficient to > > > meet the requirements of the ixgbe vector receive. > > Luca, > > Can you explain these requirements of ixgbe? > The ixgbe (and some other Intel PMDs) have vectorized RX routines that are more efficient (if not faster) taking advantage of some advanced CPU instructions. I think you need to be receiving at least 32 packets or more. > Did you check for other vendor PMDs? It may hurt performance there.. > I don't know, but I suspect probably not. For the most part you are typically reading almost up to the vector requirement. But if one slave has just a single packet, then you can't vectorize on the next slave. Empirically, this patch has been around of years in one form or another. No complaints other than "bonding is slow" which is why it was written. > > > > > > This small change improves performances of the bonding driver > > > considerably. Vyatta has been using it for years in production. > > > > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Eric Kinzie > > > Signed-off-by: Luca Boccassi > > > > > > > Acked-by: Chas Williams > > > > > > > > > --- > > > v2 and v3: fix checkpatch warnings > > > v4: add Eric's original signed-off-by from the Vyatta internal repo > > > > > > drivers/net/bonding/rte_eth_bond_api.c | 13 +++ > > > drivers/net/bonding/rte_eth_bond_pmd.c | 98 +++++++++++++++++-- > > --- > > > drivers/net/bonding/rte_eth_bond_private.h | 4 + > > > 3 files changed, 95 insertions(+), 20 deletions(-) > > > > > > diff --git a/drivers/net/bonding/rte_eth_bond_api.c > > > b/drivers/net/bonding/rte_eth_bond_api.c > > > index 8bc04cfd11..3d22203e91 100644 > > > --- a/drivers/net/bonding/rte_eth_bond_api.c > > > +++ b/drivers/net/bonding/rte_eth_bond_api.c > > > @@ -524,6 +524,10 @@ __eth_bond_slave_remove_lock_free(uint16_t > > > bonded_port_id, > > > > > > sizeof(*(rte_eth_devices[bonded_port_id].data->mac_addrs))); > > > } > > > if (internals->slave_count == 0) { > > > + /* Remove any remaining packets in the receive ring */ > > > + struct rte_mbuf *bufs[PMD_BOND_RECV_PKTS_PER_SLAVE]; > > > + unsigned int j, count; > > > + > > > internals->rx_offload_capa = 0; > > > internals->tx_offload_capa = 0; > > > internals->rx_queue_offload_capa = 0; @@ -532,6 > > > +536,15 @@ __eth_bond_slave_remove_lock_free(uint16_t > > > bonded_port_id, > > > internals->reta_size = 0; > > > internals->candidate_max_rx_pktlen = 0; > > > internals->max_rx_pktlen = 0; > > > + > > > + do { > > > + count = > rte_ring_dequeue_burst(internals->rx_ring, > > > + (void **)bufs, > > > + PMD_BOND_RECV_PKTS_PER_SLAVE, > > > + NULL); > > > + for (j = 0; j < count; j++) > > > + rte_pktmbuf_free(bufs[j]); > > > + } while (count > 0); > > > } > > > return 0; > > > } > > > diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c > > > b/drivers/net/bonding/rte_eth_bond_pmd.c > > > index 58f7377c60..ae756c4e3a 100644 > > > --- a/drivers/net/bonding/rte_eth_bond_pmd.c > > > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c > > > @@ -18,6 +18,8 @@ > > > #include > > > #include > > > #include > > > +#include > > > +#include > > > > > > #include "rte_eth_bond.h" > > > #include "rte_eth_bond_private.h" > > > @@ -402,10 +404,15 @@ bond_ethdev_rx_burst_8023ad(void *queue, > > struct > > > rte_mbuf **bufs, > > > struct bond_dev_private *internals = bd_rx_q->dev_private; > > > struct ether_addr bond_mac; > > > > > > + unsigned int rx_ring_avail = > > > rte_ring_free_count(internals->rx_ring); > > > + struct rte_mbuf > > *mbuf_bounce[PMD_BOND_RECV_PKTS_PER_SLAVE]; > > > + > > > struct ether_hdr *hdr; > > > > > > const uint16_t ether_type_slow_be = > > > rte_be_to_cpu_16(ETHER_TYPE_SLOW); > > > uint16_t num_rx_total = 0; /* Total number of received > > > packets */ > > > + uint16_t num_rx_slave; > > > + uint16_t num_enq_slave; > > > uint16_t slaves[RTE_MAX_ETHPORTS]; > > > uint16_t slave_count, idx; > > > > > > @@ -414,6 +421,9 @@ bond_ethdev_rx_burst_8023ad(void *queue, > > struct > > > rte_mbuf **bufs, > > > uint8_t i, j, k; > > > uint8_t subtype; > > > > > > + if (rx_ring_avail < PMD_BOND_RECV_PKTS_PER_SLAVE) > > > + goto dequeue; > > > + > > > rte_eth_macaddr_get(internals->port_id, &bond_mac); > > > /* Copy slave list to protect against slave up/down changes > > > during tx > > > * bursting */ > > > @@ -426,62 +436,96 @@ bond_ethdev_rx_burst_8023ad(void *queue, > > struct > > > rte_mbuf **bufs, > > > internals->active_slave = 0; > > > idx = 0; > > > } > > > - for (i = 0; i < slave_count && num_rx_total < nb_pkts; i++) { > > > - j = num_rx_total; > > > + for (i = 0; i < slave_count && num_rx_total < rx_ring_avail; > i++) { > > > + j = 0; > > > collecting = > ACTOR_STATE(&mode_8023ad_ports[slaves[idx]], > > > COLLECTING); > > > > > > /* Read packets from this slave */ > > > - num_rx_total += rte_eth_rx_burst(slaves[idx], > > > bd_rx_q->queue_id, > > > - &bufs[num_rx_total], nb_pkts - > > > num_rx_total); > > > + if (unlikely(rx_ring_avail - num_rx_total < > > > + PMD_BOND_RECV_PKTS_PER_SLAVE)) > > > + continue; > > > + num_rx_slave = rte_eth_rx_burst(slaves[idx], > > > bd_rx_q->queue_id, > > > + mbuf_bounce, > > > + PMD_BOND_RECV_PKTS_PER_SLAVE); > > > > > > - for (k = j; k < 2 && k < num_rx_total; k++) > > > - rte_prefetch0(rte_pktmbuf_mtod(bufs[k], void > *)); > > > + for (k = j; k < 2 && k < num_rx_slave; k++) > > > + rte_prefetch0(rte_pktmbuf_mtod(mbuf_bounce[k], > > > void *)); > > > > > > /* Handle slow protocol packets. */ > > > - while (j < num_rx_total) { > > > + while (j < num_rx_slave) { > > > > > > /* If packet is not pure L2 and is known, skip > > > it */ > > > - if ((bufs[j]->packet_type & > ~RTE_PTYPE_L2_ETHER) > > > != 0) { > > > + if ((mbuf_bounce[j]->packet_type & > > > ~RTE_PTYPE_L2_ETHER) != 0) { > > > j++; > > > continue; > > > } > > > > > > - if (j + 3 < num_rx_total) > > > - rte_prefetch0(rte_pktmbuf_mtod(bufs[j + > > > 3], void *)); > > > + if (j + 3 < num_rx_slave) > > > + > > > rte_prefetch0(rte_pktmbuf_mtod(mbuf_bounce[j + 3], > > > + void > > > + *)); > > > > > > - hdr = rte_pktmbuf_mtod(bufs[j], struct > ether_hdr > > > *); > > > + hdr = rte_pktmbuf_mtod(mbuf_bounce[j], > > > + struct ether_hdr *); > > > subtype = ((struct slow_protocol_frame > > > *)hdr)->slow_protocol.subtype; > > > > > > /* Remove packet from array if it is slow > > > packet or slave is not > > > * in collecting state or bonding interface is > > > not in promiscuous > > > * mode and packet address does not match. */ > > > - if (unlikely(is_lacp_packets(hdr->ether_type, > > > subtype, bufs[j]) || > > > + if (unlikely(is_lacp_packets(hdr->ether_type, > > > + subtype, > > > mbuf_bounce[j]) || > > > !collecting || (!promisc && > > > > > > !is_multicast_ether_addr(&hdr->d_addr) && > > > !is_same_ether_addr(&bond_mac, > > > &hdr->d_addr)))) { > > > > > > if (hdr->ether_type == > > > ether_type_slow_be) { > > > > bond_mode_8023ad_handle_slow_pkt( > > > - internals, slaves[idx], > > > bufs[j]); > > > + internals, slaves[idx], > > > + mbuf_bounce[j]); > > > } else > > > - rte_pktmbuf_free(bufs[j]); > > > + > > > + rte_pktmbuf_free(mbuf_bounce[j]); > > > > > > /* Packet is managed by mode 4 or > > > dropped, shift the array */ > > > - num_rx_total--; > > > - if (j < num_rx_total) { > > > - memmove(&bufs[j], &bufs[j + 1], > > > sizeof(bufs[0]) * > > > - (num_rx_total - j)); > > > + num_rx_slave--; > > > + if (j < num_rx_slave) { > > > + memmove(&mbuf_bounce[j], > > > + &mbuf_bounce[j + 1], > > > + sizeof(mbuf_bounce[0]) > * > > > + (num_rx_slave - j)); > > > } > > > - } else > > > + } else { > > > j++; > > > + } > > > } > > > + > > > + if (num_rx_slave > 0) { > > > + if (mbuf_bounce[0] == NULL) > > > + RTE_LOG(ERR, PMD, "%s: Enqueue a > NULL??\n", > > > + __func__); > > > + > > > + num_enq_slave = > > > rte_ring_enqueue_burst(internals->rx_ring, > > > + (void > > > **)mbuf_bounce, > > > + > > > num_rx_slave, > > > + NULL); > > > + > > > + if (num_enq_slave < num_rx_slave) { > > > + RTE_LOG(ERR, PMD, > > > + "%s: failed to enqueue %u > packets", > > > + __func__, > > > + (num_rx_slave - > num_enq_slave)); > > > + for (j = num_enq_slave; j < > > > + num_rx_slave; > > > j++) > > > + > rte_pktmbuf_free(mbuf_bounce[j]); > > > + } > > > + num_rx_total += num_enq_slave; > > > + } > > > + > > > if (unlikely(++idx == slave_count)) > > > idx = 0; > > > } > > > > > > internals->active_slave = idx; > > > - return num_rx_total; > > > +dequeue: > > > + return rte_ring_dequeue_burst(internals->rx_ring, (void > **)bufs, > > > + nb_pkts, NULL); > > > } > > > > > > #if defined(RTE_LIBRTE_BOND_DEBUG_ALB) || > > > defined(RTE_LIBRTE_BOND_DEBUG_ALB_L1) > > > @@ -3065,6 +3109,7 @@ bond_alloc(struct rte_vdev_device *dev, uint8_t > > mode) > > > struct bond_dev_private *internals = NULL; > > > struct rte_eth_dev *eth_dev = NULL; > > > uint32_t vlan_filter_bmp_size; > > > + char mem_name[RTE_ETH_NAME_MAX_LEN]; > > > > > > /* now do all data allocation - for eth_dev structure, dummy > > > pci driver > > > * and internal (private) data @@ -3129,6 +3174,19 @@ > > > bond_alloc(struct rte_vdev_device *dev, uint8_t > > > mode) > > > TAILQ_INIT(&internals->flow_list); > > > internals->flow_isolated_valid = 0; > > > > > > + snprintf(mem_name, RTE_DIM(mem_name), "bond_%u_rx", > > > internals->port_id); > > > + internals->rx_ring = rte_ring_lookup(mem_name); > > > + if (internals->rx_ring == NULL) { > > > + internals->rx_ring = rte_ring_create(mem_name, > > > + > > > rte_align32pow2(PMD_BOND_RECV_RING_PKTS * > > > + > rte_lcore_count()), > > > + socket_id, 0); > > > + if (internals->rx_ring == NULL) > > > + rte_panic("%s: Failed to create rx ring '%s': > > > %s\n", name, > > > + mem_name, rte_strerror(rte_errno)); > > > + } > > > + > > > + > > > /* Set mode 4 default configuration */ > > > bond_mode_8023ad_setup(eth_dev, NULL); > > > if (bond_ethdev_mode_set(eth_dev, mode)) { diff --git > > > a/drivers/net/bonding/rte_eth_bond_private.h > > > b/drivers/net/bonding/rte_eth_bond_private.h > > > index 43e0e448df..80261c6b14 100644 > > > --- a/drivers/net/bonding/rte_eth_bond_private.h > > > +++ b/drivers/net/bonding/rte_eth_bond_private.h > > > @@ -26,6 +26,8 @@ > > > #define PMD_BOND_LSC_POLL_PERIOD_KVARG > > ("lsc_poll_period_ms") > > > #define PMD_BOND_LINK_UP_PROP_DELAY_KVARG ("up_delay") > > > #define PMD_BOND_LINK_DOWN_PROP_DELAY_KVARG > > ("down_delay") > > > +#define PMD_BOND_RECV_RING_PKTS 512 > > > +#define PMD_BOND_RECV_PKTS_PER_SLAVE 32 > > > > > > #define PMD_BOND_XMIT_POLICY_LAYER2_KVARG ("l2") > > > #define PMD_BOND_XMIT_POLICY_LAYER23_KVARG ("l23") > > > @@ -175,6 +177,8 @@ struct bond_dev_private { > > > > > > void *vlan_filter_bmpmem; /* enabled vlan filter > > > bitmap */ > > > struct rte_bitmap *vlan_filter_bmp; > > > + > > > + struct rte_ring *rx_ring; > > > }; > > > > > > extern const struct eth_dev_ops default_dev_ops; > > > -- > > > 2.18.0 > > > > > > >