DPDK patches and discussions
From: Olivier MATZ <olivier.matz@6wind.com>
To: "Sanford, Robert" <rsanford@akamai.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "declan.doherty@intel.com" <declan.doherty@intel.com>,
	"pablo.de.lara.guarch@intel.com" <pablo.de.lara.guarch@intel.com>
Subject: Re: [dpdk-dev] [PATCH 3/4] net/bonding: another fix to LACP mempool size
Date: Wed, 24 Aug 2016 18:14:07 +0200	[thread overview]
Message-ID: <57BDC7CF.9060502@6wind.com> (raw)
In-Reply-To: <08CB1F99-B657-44F3-A7DF-7ED6713EDB5C@akamai.com>

Hi Robert,

On 08/23/2016 10:01 PM, Sanford, Robert wrote:
> Hi Olivier,
>
> On 8/23/16, 11:09 AM, "Olivier MATZ" <olivier.matz@6wind.com> wrote:
>
>      Hi Robert,
>
>      On 08/01/2016 10:42 PM, Robert Sanford wrote:
>      > The following log message may appear after a slave is idle (or nearly
>      > idle) for a few minutes: "PMD: Failed to allocate LACP packet from
>      > pool".
>      >
>      > Problem: All mbufs from a slave's private pool (used exclusively for
>      > transmitting LACPDUs) have been allocated and are still sitting in
>      > the device's tx descriptor ring and other cores' mempool caches.
>      >
>      > Solution: Ensure that each slave's tx (LACPDU) mempool owns more than
>      > n-tx-queues * (n-tx-descriptors + per-core-mempool-flush-threshold)
>      > mbufs.
>      >
>      > Note that the LACP tx machine function is the only code that allocates
>      > from a slave's private pool. It runs in the context of the interrupt
>      > thread, and thus it has no mempool cache of its own.
>      >
>      > Signed-off-by: Robert Sanford <rsanford@akamai.com>
>      > ---
>      >   drivers/net/bonding/rte_eth_bond_8023ad.c |   10 +++++++---
>      >   1 files changed, 7 insertions(+), 3 deletions(-)
>      >
>      > diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
>      > index 2f7ae70..1207896 100644
>      > --- a/drivers/net/bonding/rte_eth_bond_8023ad.c
>      > +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
>      > @@ -854,6 +854,8 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
>      >   	char mem_name[RTE_ETH_NAME_MAX_LEN];
>      >   	int socket_id;
>      >   	unsigned element_size;
>      > +	unsigned cache_size;
>      > +	unsigned cache_flushthresh;
>      >   	uint32_t total_tx_desc;
>      >   	struct bond_tx_queue *bd_tx_q;
>      >   	uint16_t q_id;
>      > @@ -890,19 +892,21 @@ bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev, uint8_t slave_id)
>      >
>      >   	element_size = sizeof(struct slow_protocol_frame) + sizeof(struct rte_mbuf)
>      >   				+ RTE_PKTMBUF_HEADROOM;
>      > +	cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ?
>      > +		32 : RTE_MEMPOOL_CACHE_MAX_SIZE;
>      > +	cache_flushthresh = RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size);
>      >
>      >   	/* The size of the mempool should be at least:
>      >   	 * the sum of the TX descriptors + BOND_MODE_8023AX_SLAVE_TX_PKTS */
>      >   	total_tx_desc = BOND_MODE_8023AX_SLAVE_TX_PKTS;
>      >   	for (q_id = 0; q_id < bond_dev->data->nb_tx_queues; q_id++) {
>      >   		bd_tx_q = (struct bond_tx_queue*)bond_dev->data->tx_queues[q_id];
>      > -		total_tx_desc += bd_tx_q->nb_tx_desc;
>      > +		total_tx_desc += bd_tx_q->nb_tx_desc + cache_flushthresh;
>      >   	}
>      >
>      >   	snprintf(mem_name, RTE_DIM(mem_name), "slave_port%u_pool", slave_id);
>      >   	port->mbuf_pool = rte_mempool_create(mem_name,
>      > -		total_tx_desc, element_size,
>      > -		RTE_MEMPOOL_CACHE_MAX_SIZE >= 32 ? 32 : RTE_MEMPOOL_CACHE_MAX_SIZE,
>      > +		total_tx_desc, element_size, cache_size,
>      >   		sizeof(struct rte_pktmbuf_pool_private), rte_pktmbuf_pool_init,
>      >   		NULL, rte_pktmbuf_init, NULL, socket_id, MEMPOOL_F_NO_SPREAD);
>      >
>      >
>
>      I'm not very familiar with bonding code, so maybe your patch is correct.
>      I think the size of the mempool should be:
>
>         BOND_MODE_8023AX_SLAVE_TX_PKTS +
>           n_cores * RTE_MEMPOOL_CALC_CACHE_FLUSHTHRESH(cache_size)
>
>      With n_cores = number of cores that can dequeue from the mempool.
>
>      The safest thing to do would be to have n_cores = RTE_MAX_LCORE. I don't
>      know if bond_dev->data->nb_tx_queue corresponds to this definition, if
>      yes you can ignore my comment ;)
>
>
>      Regards,
>      Olivier
>
> --
>
> Only one non-EAL thread dequeues from each slave's private (LACPDU) mempool. That thread calls mbuf-alloc in the context of an rte_eal_alarm_set() callback function and enqueues the mbuf to a multi-consumer ring.
>
> The consumer(s) of the mbufs can be any core(s) that invoke the bond PMD’s tx-burst function. As we know, mbufs can sit around on TX descriptor rings long after we transmit them, often until we need to reuse the descriptors for subsequent TX packets. When the TX cores finally free some of the mbufs, some of the mbufs get left in a pool’s per-core cache. So, the maximum number of mbufs that may be lying around doing nothing is: the “done” mbufs in the TX desc ring and in the per-core cache, times the number of TX cores, or approximately nb_tx_queues * (nb_tx_desc + CACHE_FLUSHTHRESH(cache_size)). (I probably should also update the comment above the for-loop that calculates total_tx_desc.)
>
> When we include normal TX mbufs and private LACPDU mbufs displacing each other in the TX desc ring, there are many (if not infinite) possibilities. I wrote a program to simulate all this interaction, trying different values for cache_size, nb_tx_desc, etc., because it took a long time to run out of LACPDU mbufs (or it never happened) just using long-interval pings.
>
> Anyway, I hope that this is a nice, long way of saying that I will ignore your comment. ;-)

OK, thanks for the detailed explanation.
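
For concreteness, the sizing the patch implements can be checked with a
small standalone program. The values below (queue count, descriptor
count, the stand-in for BOND_MODE_8023AX_SLAVE_TX_PKTS and the 1.5
flush-threshold multiplier) are illustrative assumptions, not taken from
the patch or from the DPDK headers:

#include <stdio.h>

/* stand-ins for the real DPDK definitions (assumed values) */
#define SLAVE_TX_PKTS      8    /* stands in for BOND_MODE_8023AX_SLAVE_TX_PKTS */
#define CACHE_SIZE         32
#define CACHE_FLUSHTHRESH  ((unsigned)(CACHE_SIZE * 1.5))  /* assumed 1.5 multiplier */

int main(void)
{
	unsigned nb_tx_queues = 4, nb_tx_desc = 512;
	unsigned total_tx_desc = SLAVE_TX_PKTS;
	unsigned q;

	for (q = 0; q < nb_tx_queues; q++)
		total_tx_desc += nb_tx_desc + CACHE_FLUSHTHRESH;

	/* 8 + 4 * (512 + 48) = 2248 mbufs in the per-slave pool */
	printf("LACP mempool size: %u\n", total_tx_desc);
	return 0;
}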

> Another idea, related to mempools (but not this LACP patch) ...
> Back when I first investigated the LACP pool problem, I got an idea related to the mempool per-core caches. It seems like a waste that in some usage patterns (like above), a core may “put” objects to a pool, but rarely/never “get” objects from the pool, the result being that it rarely/never reuses objects stuck in the per-core cache. We could recognize this kind of behavior by adding a small counter to the rte_mempool_cache struct. During a put, we increment the counter, and during a get, we set the counter to zero. If, during a put that is going to flush some of the objects to the global part of the pool, the counter is at or above some threshold (8, 16, 32, or whatever), then we flush ALL of the objects, since this core hasn’t been using them. What do you think?
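
To make the counter idea concrete, here is a rough sketch; the struct
and function names are hypothetical and only illustrate the behavior
described above, they are not the actual rte_mempool structures or API:

#define SKETCH_CACHE_MAX   512
#define PUT_STREAK_LIMIT   16     /* "8, 16, 32, or whatever" */

struct sketch_cache {
	unsigned int size;          /* configured cache size */
	unsigned int flushthresh;   /* existing flush threshold */
	unsigned int len;           /* objects currently cached */
	unsigned int put_streak;    /* NEW: puts since the last get */
	void *objs[SKETCH_CACHE_MAX];
};

/* get path: this core consumes from the pool, so reset the counter */
static inline void sketch_on_get(struct sketch_cache *c)
{
	c->put_streak = 0;
}

/* put path: count puts so a "put-only" core can be recognized */
static inline void sketch_on_put(struct sketch_cache *c)
{
	c->put_streak++;
}

/* called when len has crossed flushthresh: decide how many objects to
 * push back to the common pool.  A core that never gets returns all of
 * its cached objects instead of keeping 'size' of them around. */
static inline unsigned int sketch_flush_count(const struct sketch_cache *c)
{
	if (c->put_streak >= PUT_STREAK_LIMIT)
		return c->len;              /* flush everything */
	return c->len - c->size;            /* default partial flush */
}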

While I understand the need, it looks a bit complicated to me.

When I read your first patch, I was thinking about automatically
increasing the size of the pool in mempool_create(), according to the
number of cores and the size of the cache. But it would have too many drawbacks:
  - waste of memory in many cases, especially for large objects
  - would not take the external caches (new 16.07 feature) into account
  - behavior change
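
For reference, that automatic enlargement would have amounted to
something like the sketch below. It is illustrative only: the two
macros are stand-ins (assumed values) for RTE_MAX_LCORE and the
flush-threshold calculation that patch 2/4 makes public, and the worked
numbers show why the memory waste can be significant:

#include <stdio.h>

#define N_CORES           128                        /* stands in for RTE_MAX_LCORE */
#define FLUSHTHRESH(cs)   ((unsigned)((cs) * 1.5))   /* assumed 1.5 multiplier */

/* pad the requested pool size by one flush threshold per possible
 * consumer core, since that many objects can sit idle in the caches */
static unsigned padded_pool_size(unsigned requested, unsigned cache_size)
{
	return requested + N_CORES * FLUSHTHRESH(cache_size);
}

int main(void)
{
	/* 2048 + 128 * 48 = 8192: the pool quadruples, which is the
	 * "waste of memory" drawback for large objects */
	printf("padded size: %u\n", padded_pool_size(2048, 32));
	return 0;
}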

Thread overview: 19+ messages
2016-08-01 20:42 [dpdk-dev] [PATCH 0/4] net/bonding: bonding and LACP fixes Robert Sanford
2016-08-01 20:42 ` [dpdk-dev] [PATCH 1/4] testpmd: fix LACP ports to work with idle links Robert Sanford
2017-06-22  1:25   ` Wu, Jingjing
2017-10-31  1:07     ` Ferruh Yigit
2017-11-01 20:06       ` Ferruh Yigit
2016-08-01 20:42 ` [dpdk-dev] [PATCH 2/4] mempool: make cache flush threshold macro public Robert Sanford
2016-08-23 15:09   ` Olivier MATZ
2016-08-23 16:07     ` Sanford, Robert
2016-08-24 16:15       ` Olivier MATZ
2016-08-01 20:42 ` [dpdk-dev] [PATCH 3/4] net/bonding: another fix to LACP mempool size Robert Sanford
2016-08-23 15:09   ` Olivier MATZ
2016-08-23 20:01     ` Sanford, Robert
2016-08-24 16:14       ` Olivier MATZ [this message]
2016-11-07 16:02   ` Kulasek, TomaszX
2016-08-01 20:42 ` [dpdk-dev] [PATCH 4/4] net/bonding: fix configuration of LACP slaves Robert Sanford
2016-11-07 16:03   ` Kulasek, TomaszX
2017-02-08 17:14 ` [dpdk-dev] [PATCH 0/4] net/bonding: bonding and LACP fixes Thomas Monjalon
2017-03-09 13:19   ` Thomas Monjalon
2017-03-09 16:57     ` Declan Doherty
