DPDK patches and discussions
From: "Min Hu (Connor)" <humin29@huawei.com>
To: "Sanford, Robert" <rsanford@akamai.com>,
	Robert Sanford <rsanford2@gmail.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "chas3@att.com" <chas3@att.com>
Subject: Re: [PATCH 3/7] net/bonding: change mbuf pool and ring allocation
Date: Sat, 18 Dec 2021 11:44:47 +0800
Message-ID: <ce2c4371-e719-5e1e-2e59-d23697d83fab@huawei.com>
In-Reply-To: <7CE0C72F-5CFD-4C75-8B03-5739A0339092@akamai.com>

Hi, Sanford,
	Thanks for your detailed description. I have some questions, inline below.

On 2021/12/18 3:49, Sanford, Robert wrote:
> Hello Connor,
> 
> Thank you for the questions and comments. I will repeat the questions, followed by my answers.
> 
> Q: Could you be more detailed, why is mbuf pool caching not needed?
> 
> A: The short answer: under certain conditions, we can run out of
> buffers from that small, LACPDU-mempool. We actually saw this occur
> in production, on mostly-idle links.
> 
> For a long explanation, let's assume the following:
> 1. 1 tx-queue per bond and underlying ethdev ports.
> 2. 256 tx-descriptors (per ethdev port).
> 3. 257 mbufs in each port's LACPDU-pool, as computed by
> bond_mode_8023ad_activate_slave(), and cache-size 32.
> 4. The "app" xmits zero packets to this bond for a long time.
> 5. In EAL intr thread context, LACP tx_machine() allocates 1 mbuf
> (LACPDU) per second from the pool, and puts it into LACP tx-ring.
> 6. Every second, another thread, let's call it the tx-core, calls
> tx-burst (with zero packets to xmit), finds 1 mbuf on LACP tx-ring,
> and underlying ethdev PMD puts mbuf data into a tx-desc.
> 7. The PMD's tx-burst is configured not to clean up used tx-descs
> until there are almost none free, e.g., fewer than the pool's
> cache-size * CACHE_FLUSH_THRESH_MULTIPLIER (1.5).
> 8. When cleaning up tx-descs, we may leave up to 47 mbufs in the
> tx-core's LACPDU-pool cache (not accessible from intr thread).
> 
> When the number of used tx-descs (0..255) + number of mbufs in the
> cache (0..47) reaches 257, then allocation fails.
> 
> If I understand the LACP tx-burst code correctly, it would be
> worse if nb_tx_queues > 1, because (assuming multiple tx-cores)
> any queue/lcore could xmit an LACPDU. Thus, up to nb_tx_queues *
> 47 mbufs could be cached, and not accessible from tx_machine().
> 
> You would not see this problem if the app xmits other (non-LACP)
> mbufs on a regular basis, to expedite the clean-up of tx-descs
> including LACPDU mbufs (unless nb_tx_queues tx-core caches
> could hold all LACPDU mbufs).
> 
I think we do not see this problem in the non-LACP case only because
the mempool can supply many more mbufs than the cache size.

> If we make mempool's cache size 0, then allocation will not fail.
How about enlarging the mempool, e.g., up to 4096 mbufs? I think
that would also avoid this bug.
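
For illustration, the counting in the scenario above can be sketched in
plain C. This is only a sketch under the thread's assumptions (one
LACPDU pool per port, used tx-descs in the range 0..nb_tx_desc-1, and a
per-lcore cache that can hold up to 1.5 * cache-size - 1 mbufs). The
helper names max_cached_per_lcore() and lacpdu_pool_can_starve() are
hypothetical and are not part of DPDK.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Most mbufs one per-lcore cache can hold, assuming the cache is
 * flushed once it reaches 1.5x its nominal size (so it can sit at
 * one below that threshold). A cache size of 0 holds nothing.
 */
static uint32_t max_cached_per_lcore(uint32_t cache_size)
{
	if (cache_size == 0)
		return 0;
	return cache_size + cache_size / 2 - 1; /* 32 -> 47 */
}

/*
 * True if, in the worst case, every mbuf of the pool could be held
 * either by a used tx descriptor or by a tx-core's cache, leaving
 * nothing for the thread that allocates LACPDUs.
 */
static bool lacpdu_pool_can_starve(uint32_t pool_size, uint32_t nb_tx_desc,
				   uint32_t cache_size, uint32_t nb_tx_cores)
{
	/* Per the thread: used tx-descs range from 0 to nb_tx_desc - 1. */
	uint32_t in_descs = nb_tx_desc ? nb_tx_desc - 1 : 0;
	uint32_t in_caches = nb_tx_cores * max_cached_per_lcore(cache_size);

	return in_descs + in_caches >= pool_size;
}
```

With the numbers from the thread, lacpdu_pool_can_starve(257, 256, 32, 1)
is true (255 + 47 >= 257), while both a cache size of 0 and a pool of
4096 mbufs make it false for a single tx-core.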
> 
> A mempool cache for LACPDUs does not offer much additional speed:
> during alloc, the intr thread does not have default mempool caches
Why? As far as I know, every core has its own default mempool cache.
> (AFAIK); and the average time between frees is either 1 second (LACP
> short timeouts) or 10 seconds (long timeouts), i.e., infrequent.
> 
> --------
> 
> Q: Why reserve one additional slot in the rx and tx rings?
> 
> A: rte_ring_create() requires the ring size N, to be a power of 2,
> but it can only store N-1 items. Thus, if we want to store X items,
Hi, Robert, could you explain this for me?
I cannot understand why it can
"only store N-1 items". I checked the source code, which says:
"The real usable ring size is *count-1* instead of *count* to
differentiate a free ring from an empty ring."
But I still do not understand what that means.
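
One common way to see where the N-1 limit comes from is the classic
head/tail ring, where head == tail is reserved to mean "empty". The toy
below is a minimal sketch of that general technique only. It is not
DPDK's rte_ring implementation, and every name in it (toy_ring,
toy_ring_enqueue, toy_ring_dequeue) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 4u                    /* N, must be a power of 2 */
#define TOY_RING_MASK (TOY_RING_SIZE - 1u)

struct toy_ring {
	uint32_t head;                      /* next slot to write */
	uint32_t tail;                      /* next slot to read */
	void *slots[TOY_RING_SIZE];
};

/* Returns false when the ring is full. */
static bool toy_ring_enqueue(struct toy_ring *r, void *obj)
{
	uint32_t next = (r->head + 1u) & TOY_RING_MASK;

	/*
	 * Filling the last free slot would make head == tail, which is
	 * indistinguishable from an empty ring, so one slot stays unused.
	 */
	if (next == r->tail)
		return false;
	r->slots[r->head] = obj;
	r->head = next;
	return true;
}

/* Returns false when the ring is empty. */
static bool toy_ring_dequeue(struct toy_ring *r, void **obj)
{
	if (r->head == r->tail)
		return false;
	*obj = r->slots[r->tail];
	r->tail = (r->tail + 1u) & TOY_RING_MASK;
	return true;
}
```

With TOY_RING_SIZE set to 4, three enqueues succeed and the fourth
fails until something is dequeued, i.e. the ring stores N-1 items.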

> we need to ask for (at least) X+1. Original code fails when the real
> desired size is a power of 2, because in such a case, align32pow2
> does not round up.
> 
> For example, say we want a ring to hold 4:
> 
>      rte_ring_create(... rte_align32pow2(4) ...)
> 
> rte_align32pow2(4) returns 4, and we end up with a ring that only
> stores 3 items.
> 
>      rte_ring_create(... rte_align32pow2(4+1) ...)
> 
> rte_align32pow2(5) returns 8, and we end up with a ring that
> stores up to 7 items, more than we need, but acceptable.
To fix the bug, how about just setting the flag RING_F_EXACT_SZ?
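
To compare the sizing options discussed here, the arithmetic can be
sketched as below. This is a standalone illustration, not DPDK code:
toy_align32pow2() is a hypothetical stand-in that mimics
rte_align32pow2() (round up to the next power of 2), and the three
usable_slots_*() helpers are likewise hypothetical. The exact-size case
assumes that RING_F_EXACT_SZ makes the ring hold exactly the requested
count.

```c
#include <stdint.h>

/* Round up to the next power of 2; a power of 2 is returned unchanged. */
static uint32_t toy_align32pow2(uint32_t x)
{
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

/* Original code: ring of size align32pow2(want), usable size is one less. */
static uint32_t usable_slots_original(uint32_t want)
{
	return toy_align32pow2(want) - 1;
}

/* Patched code: ask for want + 1 before rounding up. */
static uint32_t usable_slots_patched(uint32_t want)
{
	return toy_align32pow2(want + 1) - 1;
}

/* Assumed behavior with RING_F_EXACT_SZ: exactly 'want' usable slots. */
static uint32_t usable_slots_exact(uint32_t want)
{
	return want;
}
```

For a desired capacity of 4, the original sizing yields 3 usable slots,
the patched sizing yields 7, and the exact-size option yields 4.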

> 
> --------
> 
> Q: I found the comment for BOND_MODE_8023AX_SLAVE_RX_PKTS is
> wrong, could you fix it in this patch?
> 
> A: Yes, I will fix it in the next version of the patch.
Thanks.
> 
> --
> Regards,
> Robert Sanford
> 
> 
> On 12/16/21, 4:01 AM, "Min Hu (Connor)" <humin29@huawei.com> wrote:
> 
>      Hi, Robert,
> 
>      On 2021/12/16 2:19, Robert Sanford wrote:
>      > - Turn off mbuf pool caching to avoid mbufs lingering in pool caches.
>      >    At most, we transmit one LACPDU per second, per port.
>      Could you be more detailed: why is mbuf pool caching not needed?
> 
>      > - Fix calculation of ring sizes, taking into account that a ring of
>      >    size N holds up to N-1 items.
>      Same here: why should another item be reserved?
>      >
>      By the way, I found the comment for BOND_MODE_8023AX_SLAVE_RX_PKTS is
>      wrong, could you fix it in this patch?
>      > Signed-off-by: Robert Sanford <rsanford@akamai.com>
>      > ---
>      >   drivers/net/bonding/rte_eth_bond_8023ad.c | 14 ++++++++------
>      >   1 file changed, 8 insertions(+), 6 deletions(-)
>      >
>      > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
>      > index 43231bc..83d3938 100644
>      > --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
>      > +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
>      > @@ -1101,9 +1101,7 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
>      >   	}
>      >
>      >   	snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
>      > -	port->mbuf_pool = rte_pktmbuf_pool_create(mem_name, total_tx_desc,
>      > -		RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
>      > -			32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
>      > +	port->mbuf_pool = rte_pktmbuf_pool_create(mem_name, total_tx_desc, 0,
>      >   		0, element_size, socket_id);
>      >
>      >   	/* Any memory allocation failure in initialization is critical because
>      > @@ -1113,19 +1111,23 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
>      >   			slave_id, mem_name, rte_strerror(rte_errno));
>      >   	}
>      >
>      > +	/* Add one extra because ring reserves one. */
>      >   	snprintf(mem_name, RTE_DIM(mem_name), "slave_%u_rx", slave_id);
>      >   	port->rx_ring = rte_ring_create(mem_name,
>      > -			rte_align32pow2(BOND_MODE_8023AX_SLAVE_RX_PKTS), socket_id, 0);
>      > +			rte_align32pow2(BOND_MODE_8023AX_SLAVE_RX_PKTS + 1),
>      > +			socket_id, 0);
>      >
>      >   	if (port->rx_ring == NULL) {
>      >   		rte_panic("Slave %u: Failed to create rx ring '%s': %s\n", slave_id,
>      >   			mem_name, rte_strerror(rte_errno));
>      >   	}
>      >
>      > -	/* TX ring is at least one pkt longer to make room for marker packet. */
>      > +	/* TX ring is at least one pkt longer to make room for marker packet.
>      > +	 * Add one extra because ring reserves one. */
>      >   	snprintf(mem_name, RTE_DIM(mem_name), "slave_%u_tx", slave_id);
>      >   	port->tx_ring = rte_ring_create(mem_name,
>      > -			rte_align32pow2(BOND_MODE_8023AX_SLAVE_TX_PKTS + 1), socket_id, 0);
>      > +			rte_align32pow2(BOND_MODE_8023AX_SLAVE_TX_PKTS + 2),
>      > +			socket_id, 0);
>      >
>      >   	if (port->tx_ring == NULL) {
>      >   		rte_panic("Slave %u: Failed to create tx ring '%s': %s\n", slave_id,
>      >
> 
