From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Olivier Matz" <olivier.matz@6wind.com>
Cc: "Bruce Richardson" <bruce.richardson@intel.com>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>,
	<dev@dpdk.org>
Subject: RE: [RFC] mempool: modify flush threshold
Date: Fri, 28 Jan 2022 09:40:45 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D86E53@smartserver.smartshare.dk>
In-Reply-To: <Ye7MS7BffKJJGB1c@platinum>

+Honnappa, you have shown a keen interest in this topic.

> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, 24 January 2022 16.57
> 
> On Mon, Jan 10, 2022 at 09:40:48AM +0000, Bruce Richardson wrote:
> > On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Friday, 7 January 2022 16.12
> > > >
> > > > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > > > Hi mempool maintainers and DPDK team.
> > > > >
> > > > > Does anyone know the reason or history why
> > > > > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it
> > > > > is counterintuitive.
> > > > >
> > > > > The mempool cache flush threshold was introduced in DPDK
> > > > > version 1.3; it was not in DPDK version 1.2. The copyright
> > > > > notice for rte_mempool.c says year 2012.
> > > > >
> > > > >
> > > > > Here is my analysis:
> > > > >
> > > > > With the multiplier of 1.5, a mempool cache is allowed to be
> > > > > filled up to 50 % above its target size before its excess
> > > > > entries are flushed to the mempool (thereby reducing the cache
> > > > > length to the target size).
> > > > >
> > > > > In the opposite direction, a mempool cache is allowed to be
> > > > > drained completely, i.e. up to 100 % below its target size.
> > > > >
> > > > > My instinct tells me that it would be more natural to let a
> > > > > mempool cache go the same amount above and below its target
> > > > > size, i.e. using a flush multiplier of 2 instead of 1.5.
> > > > >
> > > > > Also, the cache should be allowed to fill up to and including
> > > > > the flush threshold, so it is flushed when the threshold is
> > > > > exceeded, instead of when it is reached.
> > > > >
> > > > > Here is a simplified example:
> > > > >
> > > > > Imagine a cache target size of 32, corresponding to a typical
> > > > > packet burst. With a flush threshold multiplier of 2 (and
> > > > > len > threshold instead of len >= threshold), the cache could
> > > > > hold 1 +/- 1 packet bursts. With the current multiplier it can
> > > > > only hold [0 .. 1.5[ packet bursts, not really providing a lot
> > > > > of elasticity.
> > > > >
> > > > Hi Morten,
> > > >
> > > > Interesting to see this being looked at again. The original idea
> > > > of adding in some extra room above the requested value was to
> > > > avoid the worst-case scenario of a pool oscillating between full
> > > > and empty repeatedly due to the addition/removal of perhaps a
> > > > single packet. As for why 1.5 was chosen as the value, I don't
> > > > recall any particular reason for it myself. The main objective
> > > > was to have separate flush and size values, so that we could go
> > > > a bit above full, and when flushing, not empty the entire cache
> > > > down to zero.
> > >
> > > Thanks for providing the historical background for this feature,
> > > Bruce.
> > >
> > > >
> > > > In terms of the behavioural points you make above, I wonder if
> > > > symmetry is actually necessary or desirable in this case. After
> > > > all, the ideal case is probably to keep the mempool cache
> > > > neither full nor empty, so that both allocations and frees can
> > > > be done without having to go to the underlying shared data
> > > > structure. To accommodate this, the cache will only flush when
> > > > the number of elements is greater than size * 1.5, and then it
> > > > only flushes elements down to size, ensuring that allocations
> > > > can still take place. On allocation, new buffers are taken when
> > > > we don't have enough in the cache to fulfil the request, and
> > > > then the cache is filled up to size, not to the flush threshold.
> > >
> > > I agree with the ideal case.
> > >
> > > However, it looks like the addition of the flush threshold also
> > > changed the "size" parameter to effectively become "desired
> > > length" instead. This interpretation is also supported by the
> > > flush algorithm, which doesn't flush below the "size", but down
> > > to the "size". So based on this interpretation, I was wondering
> > > why it is not symmetrical around the "desired length", but
> > > allowed to go 100 % below and only 50 % above.
> > >
> > > >
> > > > Now, for the scenario you describe - where the mempool cache
> > > > size is set to be the same as the burst size - this scheme
> > > > probably does break down, in that we don't really have any burst
> > > > elasticity. However, I would query if that is a configuration
> > > > that is used, since to the user it would, correctly, appear to
> > > > provide no elasticity. Looking at testpmd and our example apps,
> > > > the standard there is a burst size of 32 and a mempool cache of
> > > > ~256. In OVS code, netdev-dpdk.c seems to initialize the mempool
> > > > with a cache size of RTE_MEMPOOL_CACHE_MAX_SIZE (through define
> > > > MP_CACHE_SZ). In all these cases, I think the 1.5 threshold
> > > > should work just fine for us.
> > >
> > > My example was for demonstration only, and hopefully not being
> > > used by any applications.
> > >
> > > The simplified example was intended to demonstrate the
> > > theoretical effect of the imbalance in using the 1.5 threshold.
> > > It will be similar with a cache size of 256 objects: You will be
> > > allowed to go 8 bursts (calculation: 256 objects / 32 objects per
> > > burst) below the cache "size" without retrieving objects from the
> > > backing pool, but only 4 bursts above the cache "size" before
> > > flushing objects to the backing pool.
> > >
> > > > That said, if you want to bump it up to 2x, I can't say I'd
> > > > object strongly, as it should be harmless, I think.
> > >
> > > From an academic viewpoint, 2x seems much better to me than
> > > 1.5x. And based on your background information about the
> > > introduction of the flush threshold, there was no academic reason
> > > why 1.5x was chosen - this is certainly valuable feedback.
> > >
> 
> Just to say I agree with Morten's analysis. Having a threshold at 2x
> size would make more sense to me.
> 
> Without the corner cases (big requests, or common pool empty), I would
> have suggested this, which looks more symmetric than what we have
> today:
> 
> get()
>   fill request with as many objs as possible from cache
>   if cache is empty, request "size" more objects from common pool
>   fill the rest of the request from cache

I agree, and this is doable without a performance penalty.
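
Something like this, perhaps. This is an untested sketch only, not the
actual rte_mempool_generic_get(); it assumes the current
rte_mempool_cache fields (size, flushthresh, len, objs), uses the
mempool ops API directly, and omits the corner cases you mention (big
requests, common pool empty):

#include <errno.h>
#include <string.h>

#include <rte_common.h>
#include <rte_mempool.h>

static int
cache_get_bulk(struct rte_mempool *mp, struct rte_mempool_cache *cache,
        void **obj_table, unsigned int n)
{
    /* Fill the request with as many objects as possible from the
     * cache; the top of the stack holds the hottest objects. */
    unsigned int from_cache = RTE_MIN(n, cache->len);

    memcpy(obj_table, &cache->objs[cache->len - from_cache],
            from_cache * sizeof(void *));
    cache->len -= from_cache;
    n -= from_cache;
    if (n == 0)
        return 0;

    /* The cache is now empty; refill it with "size" objects from
     * the common pool. */
    if (rte_mempool_ops_dequeue_bulk(mp, cache->objs, cache->size) < 0)
        return -ENOENT; /* common pool empty: corner case omitted */
    cache->len = cache->size;

    /* Fill the rest of the request from the refilled cache. */
    memcpy(&obj_table[from_cache], &cache->objs[cache->len - n],
            n * sizeof(void *));
    cache->len -= n;
    return 0;
}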

> 
> put()
>   fill cache with provided objects up to 2x size
>   if cache exceeds threshold, fill common pool with "size" objects
>   fill the cache with the remaining objs from request

I tried implementing this already, but it cannot be done without a performance cost, one way or the other:

1. If we flush the top of the cache (which is a stack), we are flushing the hot objects, leaving cold objects at the new top of the cache.

2. If we flush the bottom of the cache, we need to move the remaining objects in the cache down. E.g. flushing objects at index [0..256[ in the cache means that the objects at index [256..len[ in the cache must be moved down to index [0..len-256[ (see the sketch below).
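
For illustration, flushing the bottom of the cache would look roughly
like this (untested sketch, using the rte_mempool_cache fields and the
mempool ops API):

    /* Flush the "size" coldest objects from the bottom of the cache
     * stack to the common pool... */
    rte_mempool_ops_enqueue_bulk(mp, &cache->objs[0], cache->size);
    cache->len -= cache->size;
    /* ...and move the surviving objects down to the bottom - the
     * extra cost of this option. */
    memmove(&cache->objs[0], &cache->objs[cache->size],
            cache->len * sizeof(void *));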

So I came to the conclusion that flushing the cache needs to flush it completely. I also consider this the best choice for real applications:

Either an lcore thread in the application is in a state of balance, where it uses the mempool cache within its flush/refill boundaries - which makes the flush method less important.

Or an lcore thread in the application is out of balance (either permanently or temporarily), and mostly gets or mostly puts objects from/to the mempool. If it mostly puts objects, not flushing all of the objects will cause more frequent flushing. E.g.:

Cache size=256, flushthresh=512 (2x, not 1.5x), initial len=256; application burst len=32.

If flushing leaves "size" objects in the cache, the cache is flushed every 8th burst. And with flushthresh=384 (1.5x), flushing occurs every 4th burst, which is what the mempool does today!

If the cache is flushed completely, the cache is only flushed at every 16th burst.
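
In other words, the number of bursts between flushes is roughly
(flushthresh - len after flush) / burst len:

    Flush down to "size", flushthresh=512: (512 - 256) / 32 =  8 bursts
    Flush down to "size", flushthresh=384: (384 - 256) / 32 =  4 bursts
    Complete flush,       flushthresh=512: (512 -   0) / 32 = 16 bursts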


And when/if the application thread breaks its pattern of continuously putting objects and suddenly starts to get objects instead, it will either find objects already in the cache, or the get() function will refill the cache.

The concept of not flushing the cache completely is based on the assumption that it is more likely for an application's lcore thread to get() after flushing than to put() after flushing. I strongly disagree with this assumption! If an application thread is continuously putting so much that it overflows the cache, it is much more likely to keep putting than it is to start getting. This is CPU branch prediction 101.
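
To illustrate, a complete-flush put() could look roughly like this.
Again an untested sketch only, not the actual
rte_mempool_generic_put(), and with big requests omitted; it keeps the
objects being put in the cache, so the hottest objects remain available
for a subsequent get():

static void
cache_put_bulk(struct rte_mempool *mp, struct rte_mempool_cache *cache,
        void * const *obj_table, unsigned int n)
{
    if (cache->len + n > cache->flushthresh) {
        /* Flush the entire cache to the common pool; the cold
         * objects at the bottom go with it, and no memmove of
         * surviving objects is needed. */
        rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
        cache->len = 0;
    }

    /* Add the objects being put to the top of the cache stack. */
    memcpy(&cache->objs[cache->len], obj_table, n * sizeof(void *));
    cache->len += n;
}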

> 
> But this is a quite sensitive area, and modifying these functions
> should be done with care, as it impacts mbuf allocation.

Certainly!

The current mempool caching algorithm is defective, and I'm trying to repair it. Honnappa already provided very valuable feedback regarding CPUs with small L1 cache, which must be taken carefully into consideration.

For now, there seems to be consensus regarding changing the flush threshold multiplier from 1.5 to 2.
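
For reference, the change itself would presumably be a one-liner in
rte_mempool.c, against the macro named at the start of this thread:

/* rte_mempool.c */
#define CACHE_FLUSHTHRESH_MULTIPLIER 2 /* raised from 1.5 */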

> 
> > > I will consider providing a patch. Worst case, it probably
> > > doesn't do any harm. And if anyone is really interested, they can
> > > look at the mempool debug stats to see the difference it makes
> > > for their specific application, e.g. the
> > > put_common_pool_bulk/put_bulk and
> > > get_common_pool_bulk/get_success_bulk ratios should decrease.
> >
> > Sure, thanks.
> >
> > /Bruce

