DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: <andrew.rybchenko@oktetlabs.ru>
Cc: <bruce.richardson@intel.com>, <jerinjacobk@gmail.com>,
	<dev@dpdk.org>, <olivier.matz@6wind.com>
Subject: RE: [PATCH v4] mempool: fix mempool cache flushing algorithm
Date: Tue, 4 Oct 2022 22:01:08 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87390@smartserver.smartshare.dk>
In-Reply-To: <20220202103354.79832-1-mb@smartsharesystems.com>

> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Wednesday, 2 February 2022 11.34
> 
> This patch fixes the rte_mempool_do_generic_put() caching algorithm,
> which was fundamentally wrong, causing multiple performance issues when
> flushing.
> 
> Although the bugs do have serious performance implications when
> flushing, the function did not fail when flushing (or otherwise).
> Backporting could be considered optional.
> 
> The algorithm was:
>  1. Add the objects to the cache
>  2. Anything greater than the cache size (if it crosses the cache flush
>     threshold) is flushed to the ring.
> 
> Please note that the description in the source code said that it kept
> "cache min value" objects after flushing, but the function actually
> kept the cache full after flushing, which the above description reflects.
> 
> Now, the algorithm is:
>  1. If the objects cannot be added to the cache without crossing the
>     flush threshold, flush the cache to the ring.
>  2. Add the objects to the cache.
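For illustration, the two orderings can be modelled with a toy cache. This is a minimal sketch under stated assumptions, not the DPDK code: `struct toy_cache`, `toy_put_old()` and `toy_put_new()` are hypothetical names, and the backing pool is reduced to two counters instead of a call to rte_mempool_ops_enqueue_bulk().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model of a per-lcore object cache; NOT the DPDK
 * struct rte_mempool_cache. Flushing to the backing pool is modelled
 * only by counters. */
#define TOY_MAX 1024 /* room for the old algorithm's temporary overshoot */

struct toy_cache {
	unsigned int size;        /* nominal cache size */
	unsigned int flushthresh; /* flush threshold, e.g. 1.5 x size */
	unsigned int len;         /* objects currently in the cache */
	void *objs[TOY_MAX];
	unsigned long flushes;    /* number of flush operations */
	unsigned long flushed;    /* objects moved to the backing pool */
};

/* Old ordering: add the objects, then flush everything above "size". */
static void toy_put_old(struct toy_cache *c, void * const *obj_table,
		unsigned int n)
{
	assert(c->len + n <= TOY_MAX);
	memcpy(&c->objs[c->len], obj_table, sizeof(void *) * n);
	c->len += n;
	if (c->len >= c->flushthresh) {
		c->flushes++;
		c->flushed += c->len - c->size;
		c->len = c->size; /* the cache is left full */
	}
}

/* Patched ordering: if the objects do not fit below the threshold,
 * flush the whole cache first, then add the objects. */
static void toy_put_new(struct toy_cache *c, void * const *obj_table,
		unsigned int n)
{
	assert(n <= c->flushthresh);
	if (c->len + n > c->flushthresh) {
		c->flushes++;
		c->flushed += c->len;
		c->len = 0; /* the cache is emptied */
	}
	memcpy(&c->objs[c->len], obj_table, sizeof(void *) * n);
	c->len += n;
}
```

With size 256, flushthresh 384 and an initial length of 256, four bursts of 32 trigger one flush of 128 objects in the old model, while the new model flushes all 384 cached objects on the fifth burst and keeps only the 32 objects just put.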
> 
> This patch fixes these bugs:
> 
> 1. The cache was still full after flushing.
> In the opposite direction, i.e. when getting objects from the cache,
> the cache is refilled to full level when it crosses the low watermark
> (which happens to be zero).
> Similarly, the cache should be flushed to empty level when it crosses
> the high watermark (which happens to be 1.5 x the size of the cache).
> The existing flushing behaviour was suboptimal for real applications,
> because crossing the low or high watermark typically happens when the
> application is in a state where the numbers of put and get events are
> out of balance, e.g. when absorbing a burst of packets into a QoS queue
> (getting more mbufs from the mempool), or when a burst of packets is
> trickling out from the QoS queue (putting the mbufs back into the
> mempool).
> Now, the mempool cache is completely flushed when crossing the flush
> threshold, so only the newly put (hot) objects remain in the mempool
> cache afterwards.
> 
> This bug degraded performance by causing too frequent flushing.
> 
> Consider this application scenario:
> 
> Either an lcore thread in the application is in a state of balance,
> where it uses the mempool cache within its flush/refill boundaries; in
> this situation, the flush method is less important, and this fix is
> irrelevant.
> 
> Or, an lcore thread in the application is out of balance (either
> permanently or temporarily), and mostly gets or puts objects from/to
> the mempool. If it mostly puts objects, not flushing all of the
> objects will cause more frequent flushing. This is the scenario
> addressed by this fix. E.g.:
> 
> Cache size=256, flushthresh=384 (1.5x size), initial len=256;
> application burst len=32.
> 
> If there are "size" objects in the cache after flushing, the cache is
> flushed at every 4th burst.
> 
> If the cache is flushed completely, the cache is only flushed at every
> 12th burst.
> 
> As you can see, this bug caused the cache to be flushed 3x too
> frequently in this example.
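The burst counts in this example can be reproduced with a counter-only simulation. This sketch assumes a put-only workload with a constant burst length and measures the steady-state number of bursts between two consecutive flushes; `flush_interval()` and its helpers are hypothetical names. The old model uses the original `len >= flushthresh` check after adding, the new model the patched `len + n > flushthresh` check before adding.

```c
#include <assert.h>
#include <stdbool.h>

/* Counter-only models of one put of n objects; each returns true if the
 * put triggered a flush. Hypothetical helpers, not DPDK functions. */
static bool count_put_old(unsigned int *len, unsigned int n,
		unsigned int size, unsigned int thresh)
{
	*len += n;
	if (*len >= thresh) {      /* original comparison */
		*len = size;       /* keep "size" objects */
		return true;
	}
	return false;
}

static bool count_put_new(unsigned int *len, unsigned int n,
		unsigned int thresh)
{
	bool flush = *len + n > thresh; /* patched comparison */

	if (flush)
		*len = 0;          /* flush the cache completely */
	*len += n;
	return flush;
}

/* Number of bursts from the first flush to the second one in a
 * put-only workload that starts with len objects in the cache. */
static unsigned int flush_interval(bool patched, unsigned int size,
		unsigned int thresh, unsigned int len, unsigned int n)
{
	unsigned int burst = 0, first = 0;

	assert(n > 0); /* otherwise no flush would ever happen */
	for (;;) {
		bool flushed;

		burst++;
		flushed = patched ? count_put_new(&len, n, thresh)
				  : count_put_old(&len, n, size, thresh);
		if (flushed) {
			if (first != 0)
				return burst - first;
			first = burst;
		}
	}
}
```

For size 256, flushthresh 384, an initial length of 256 and bursts of 32, the sketch gives an interval of 4 bursts for the old algorithm and 12 bursts for the patched one, i.e. 128 versus 384 objects per flush.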
> 
> And when/if the application thread breaks its pattern of continuously
> putting objects, and suddenly starts to get objects instead, it will
> either get objects already in the cache, or the get() function will
> refill the cache.
> 
> The concept of not flushing the cache completely was probably based on
> an assumption that it is more likely for an application's lcore thread
> to get() after flushing than to put() after flushing.
> I strongly disagree with this assumption! If an application thread is
> continuously putting so much that it overflows the cache, it is much
> more likely to keep putting than it is to start getting. If in doubt,
> consider how CPU branch predictors work: When the application has done
> something many times consecutively, the branch predictor will expect
> the application to do the same again, rather than suddenly do
> something else.
> 
> Also, if you consider the description of the algorithm in the source
> code, and agree that "cache min value" cannot mean "cache size", the
> function did not behave as intended. This in itself is a bug.
> 
> 2. The flush threshold comparison was off by one.
> It must be "len > flushthresh", not "len >= flushthresh".
> Consider a flush multiplier of 1 instead of 1.5; the cache would be
> flushed as soon as it reached size objects, not when it exceeded size
> objects. In other words, the cache would not be able to hold "size"
> objects, which is clearly a bug.
> Now, flushing is triggered when the flush threshold is exceeded, not
> when reached.
> 
> This bug degraded performance due to premature flushing. In my example
> above, this bug caused flushing every 4th burst instead of every 5th.
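The effect of the comparison can be checked with a flush multiplier of 1, so that the flush threshold equals the cache size. This is a hedged sketch that uses the patch's flush-before-add ordering with either comparison; `max_cache_len()` is a hypothetical helper that reports the largest number of objects the cache ever holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: put `puts` bursts of n objects into an initially
 * empty cache and return the largest cache length ever reached.
 * ge_bug selects "len + n >= thresh" as the flush condition instead of
 * "len + n > thresh". */
static unsigned int max_cache_len(unsigned int thresh, unsigned int n,
		unsigned int puts, bool ge_bug)
{
	unsigned int len = 0, max = 0;

	assert(n > 0 && n <= thresh);
	for (unsigned int i = 0; i < puts; i++) {
		bool flush = ge_bug ? (len + n >= thresh) : (len + n > thresh);

		if (flush)
			len = 0;  /* flush the whole cache to the backing pool */
		len += n;         /* then add the new objects */
		if (len > max)
			max = len;
	}
	return max;
}
```

With a threshold of 256 (equal to the size) and bursts of 32, the `>=` variant never holds more than 224 objects, while the `>` variant reaches the full 256.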
> 
> 3. The most recent (hot) objects were flushed, leaving the oldest
> (cold) objects in the mempool cache.
> This bug degraded performance, because flushing prevented immediate
> reuse of the (hot) objects already in the CPU cache.
> Now, the existing (cold) objects in the mempool cache are flushed
> before the new (hot) objects are added to the mempool cache.
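Which objects survive a flush can be shown by tagging objects with sequence numbers, where a larger number means a more recently put (hotter) object. This is a simplified model under stated assumptions, not DPDK code; `struct seq_cache`, `seq_fill()`, `seq_contains()` and `seq_put()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SEQ_CAP 1024

/* Hypothetical model: objects are sequence numbers; a larger number
 * means the object was put more recently (hotter in the CPU cache). */
struct seq_cache {
	unsigned int size, thresh, len;
	int objs[SEQ_CAP];
};

/* Fill the cache with objects 0 .. len-1, oldest first. */
static void seq_fill(struct seq_cache *c, unsigned int len)
{
	assert(len <= SEQ_CAP);
	for (unsigned int i = 0; i < len; i++)
		c->objs[i] = (int)i;
	c->len = len;
}

static bool seq_contains(const struct seq_cache *c, int seq)
{
	for (unsigned int i = 0; i < c->len; i++)
		if (c->objs[i] == seq)
			return true;
	return false;
}

/* Put n consecutive objects first, first+1, ... with either algorithm. */
static void seq_put(struct seq_cache *c, int first, unsigned int n,
		bool patched)
{
	assert(n <= c->thresh && c->len + n <= SEQ_CAP);
	if (patched && c->len + n > c->thresh)
		c->len = 0;       /* patched: flush all (cold) objects first */
	for (unsigned int i = 0; i < n; i++)
		c->objs[c->len + i] = first + (int)i;
	c->len += n;
	if (!patched && c->len >= c->thresh)
		c->len = c->size; /* original: objs[size..len), the newest, go */
}
```

In the old model, starting from 352 cached objects and putting objects 352 to 383, objects 256 to 383 (including the whole burst just put) leave the cache and objects 0 to 255 remain. In the patched model, starting from 384 cached objects, only the new burst (objects 384 to 415) remains.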
> 
> 4. With RTE_LIBRTE_MEMPOOL_DEBUG defined, the return value of
> rte_mempool_ops_enqueue_bulk() was not checked when flushing the cache.
> Now, it is checked in both places where it is called, and still only
> if RTE_LIBRTE_MEMPOOL_DEBUG is defined.
> 
> v2 changes:
> 
> - Not adding the new objects to the mempool cache before flushing it
> also allows the memory allocated for the mempool cache to be reduced
> from 3 x to 2 x RTE_MEMPOOL_CACHE_MAX_SIZE.
> However, this change would break the ABI, so it was removed in v2.
> 
> - The mempool cache should be cache line aligned for the benefit of the
> copying method, which on some CPU architectures performs worse on data
> crossing a cache boundary.
> However, this change would break the ABI, so it was removed in v2;
> and yet another alternative copying method replaced the rte_memcpy().
> 
> v3 changes:
> 
> - Actually remove my modifications of the rte_mempool_cache structure.
> 
> v4 changes:
> 
> - Updated patch title to reflect that the scope of the patch is only
> mempool cache flushing.
> 
> - Do not replace rte_memcpy() with alternative copying method. This was
> a pure optimization, not a fix.
> 
> - Elaborate even more on the bugs fixed by the modifications.
> 
> - Added 4th bullet item to the patch description, regarding
> rte_mempool_ops_enqueue_bulk() with RTE_LIBRTE_MEMPOOL_DEBUG.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/mempool/rte_mempool.h | 34 ++++++++++++++++++++++------------
>  1 file changed, 22 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1e7a3c1527..e7e09e48fc 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1344,31 +1344,41 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>  	if (unlikely(cache == NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE))
>  		goto ring_enqueue;
> 
> -	cache_objs = &cache->objs[cache->len];
> +	/* If the request itself is too big for the cache */
> +	if (unlikely(n > cache->flushthresh))
> +		goto ring_enqueue;
> 
>  	/*
>  	 * The cache follows the following algorithm
> -	 *   1. Add the objects to the cache
> -	 *   2. Anything greater than the cache min value (if it crosses the
> -	 *   cache flush threshold) is flushed to the ring.
> +	 *   1. If the objects cannot be added to the cache without
> +	 *   crossing the flush threshold, flush the cache to the ring.
> +	 *   2. Add the objects to the cache.
>  	 */
> 
> -	/* Add elements back into the cache */
> -	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
> +	if (cache->len + n <= cache->flushthresh) {
> +		cache_objs = &cache->objs[cache->len];
> 
> -	cache->len += n;
> +		cache->len += n;
> +	} else {
> +		cache_objs = &cache->objs[0];
> 
> -	if (cache->len >= cache->flushthresh) {
> -		rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
> -				cache->len - cache->size);
> -		cache->len = cache->size;
> +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
> +		if (rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len) < 0)
> +			rte_panic("cannot put objects in mempool\n");
> +#else
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> +#endif
> +		cache->len = n;
>  	}
> 
> +	/* Add the objects to the cache. */
> +	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
> +
>  	return;
> 
>  ring_enqueue:
> 
> -	/* push remaining objects in ring */
> +	/* Put the objects into the ring */
>  #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
>  	if (rte_mempool_ops_enqueue_bulk(mp, obj_table, n) < 0)
>  		rte_panic("cannot put objects in mempool\n");
> --
> 2.17.1

Andrew, would you please also take a look at this patch and share your opinion.

I guess that the most controversial change in the patch is that it leaves the mempool cache nearly empty after flushing it.

Without the patch, the mempool cache is left full (at 100% size) after flushing. (Flushing is triggered by crossing the flush threshold, which is 50% above the cache size. This is not changed by the patch.)

As described with the patch, I consider this behavior incorrect: In periods where an application is sending more from its QoS queues than goes into the QoS queues, the mempool_put() function is called more often than the mempool_get() function, so there will naturally be consecutive cache flushing.

Many applications use QoS queues or similar traffic shapers, so mempool cache flushing is not as infrequent and exotic as some might think! (And flushing a burst of packets from the mempool cache to the underlying mempool is considered costly.)

Without the patch, consecutive cache flushing will be processed as many small flushes, because only the objects above the cache size (those between the cache size and the flush threshold, i.e. 50% of the cache size) are flushed each time.

With the patch, the flushes will be fewer and larger, because the full 150% cache size (every object in the cache up to the cache threshold) will be flushed each time.

PS: Bruce and I discussed this patch back in April, but didn't reach a conclusion. You might find some insights in that mail thread.



Thread overview: 85+ messages
2021-12-26 15:34 [RFC] mempool: rte_mempool_do_generic_get optimizations Morten Brørup
2022-01-06 12:23 ` [PATCH] mempool: optimize incomplete cache handling Morten Brørup
2022-01-06 16:55   ` Jerin Jacob
2022-01-07  8:46     ` Morten Brørup
2022-01-10  7:26       ` Jerin Jacob
2022-01-10 10:55         ` Morten Brørup
2022-01-14 16:36 ` [PATCH] mempool: fix get objects from mempool with cache Morten Brørup
2022-01-17 17:35   ` Bruce Richardson
2022-01-18  8:25     ` Morten Brørup
2022-01-18  9:07       ` Bruce Richardson
2022-01-24 15:38   ` Olivier Matz
2022-01-24 16:11     ` Olivier Matz
2022-01-28 10:22     ` Morten Brørup
2022-01-17 11:52 ` [PATCH] mempool: optimize put objects to " Morten Brørup
2022-01-19 14:52 ` [PATCH v2] mempool: fix " Morten Brørup
2022-01-19 15:03 ` [PATCH v3] " Morten Brørup
2022-01-24 15:39   ` Olivier Matz
2022-01-28  9:37     ` Morten Brørup
2022-02-02  8:14 ` [PATCH v2] mempool: fix get objects from " Morten Brørup
2022-06-15 21:18   ` Morten Brørup
2022-09-29 10:52     ` Morten Brørup
2022-10-04 12:57   ` Andrew Rybchenko
2022-10-04 15:13     ` Morten Brørup
2022-10-04 15:58       ` Andrew Rybchenko
2022-10-04 18:09         ` Morten Brørup
2022-10-06 13:43       ` Aaron Conole
2022-10-04 16:03   ` Morten Brørup
2022-10-04 16:36   ` Morten Brørup
2022-10-04 16:39   ` Morten Brørup
2022-02-02 10:33 ` [PATCH v4] mempool: fix mempool cache flushing algorithm Morten Brørup
2022-04-07  9:04   ` Morten Brørup
2022-04-07  9:14     ` Bruce Richardson
2022-04-07  9:26       ` Morten Brørup
2022-04-07 10:32         ` Bruce Richardson
2022-04-07 10:43           ` Bruce Richardson
2022-04-07 11:36             ` Morten Brørup
2022-10-04 20:01   ` Morten Brørup [this message]
2022-10-09 11:11   ` [PATCH 1/2] mempool: check driver enqueue result in one place Andrew Rybchenko
2022-10-09 11:11     ` [PATCH 2/2] mempool: avoid usage of term ring on put Andrew Rybchenko
2022-10-09 13:08       ` Morten Brørup
2022-10-09 13:14         ` Andrew Rybchenko
2022-10-09 13:01     ` [PATCH 1/2] mempool: check driver enqueue result in one place Morten Brørup
2022-10-09 13:19   ` [PATCH v4] mempool: fix mempool cache flushing algorithm Andrew Rybchenko
2022-10-04 12:53 ` [PATCH v3] mempool: fix get objects from mempool with cache Andrew Rybchenko
2022-10-04 14:42   ` Morten Brørup
2022-10-07 10:44 ` [PATCH v4] " Andrew Rybchenko
2022-10-08 20:56   ` Thomas Monjalon
2022-10-11 20:30     ` Copy-pasted code should be updated Morten Brørup
2022-10-11 21:47       ` Honnappa Nagarahalli
2022-10-30  8:44         ` Morten Brørup
2022-10-30 22:50           ` Honnappa Nagarahalli
2022-10-14 14:01     ` [PATCH v4] mempool: fix get objects from mempool with cache Olivier Matz
2022-10-09 13:37 ` [PATCH v6 0/4] mempool: fix mempool cache flushing algorithm Andrew Rybchenko
2022-10-09 13:37   ` [PATCH v6 1/4] mempool: check driver enqueue result in one place Andrew Rybchenko
2022-10-09 13:37   ` [PATCH v6 2/4] mempool: avoid usage of term ring on put Andrew Rybchenko
2022-10-09 13:37   ` [PATCH v6 3/4] mempool: fix cache flushing algorithm Andrew Rybchenko
2022-10-09 14:31     ` Morten Brørup
2022-10-09 14:51       ` Andrew Rybchenko
2022-10-09 15:08         ` Morten Brørup
2022-10-14 14:01           ` Olivier Matz
2022-10-14 15:57             ` Morten Brørup
2022-10-14 19:50               ` Olivier Matz
2022-10-15  6:57                 ` Morten Brørup
2022-10-18 16:32                   ` Jerin Jacob
2022-10-09 13:37   ` [PATCH v6 4/4] mempool: flush cache completely on overflow Andrew Rybchenko
2022-10-09 14:44     ` Morten Brørup
2022-10-14 14:01       ` Olivier Matz
2022-10-10 15:21   ` [PATCH v6 0/4] mempool: fix mempool cache flushing algorithm Thomas Monjalon
2022-10-11 19:26     ` Morten Brørup
2022-10-26 14:09     ` Thomas Monjalon
2022-10-26 14:26       ` Morten Brørup
2022-10-26 14:44         ` [PATCH] mempool: cache align mempool cache objects Morten Brørup
2022-10-26 19:44           ` Andrew Rybchenko
2022-10-27  8:34           ` Olivier Matz
2022-10-27  9:22             ` Morten Brørup
2022-10-27 11:42               ` Olivier Matz
2022-10-27 12:11                 ` Morten Brørup
2022-10-27 15:20                   ` Olivier Matz
2022-10-28  6:35           ` [PATCH v3 1/2] " Morten Brørup
2022-10-28  6:35             ` [PATCH v3 2/2] mempool: optimized debug statistics Morten Brørup
2022-10-28  6:41           ` [PATCH v4 1/2] mempool: cache align mempool cache objects Morten Brørup
2022-10-28  6:41             ` [PATCH v4 2/2] mempool: optimized debug statistics Morten Brørup
2022-10-30  9:09               ` Morten Brørup
2022-10-30  9:16                 ` Thomas Monjalon
2022-10-30  9:17             ` [PATCH v4 1/2] mempool: cache align mempool cache objects Thomas Monjalon
