DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Olivier Matz" <olivier.matz@6wind.com>,
	"Jerin Jacob" <jerinjacobk@gmail.com>
Cc: "Bruce Richardson" <bruce.richardson@intel.com>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>,
	<dev@dpdk.org>
Subject: RE: [RFC] mempool: modify flush threshold
Date: Fri, 28 Jan 2022 09:52:26 +0100	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D86E54@smartserver.smartshare.dk> (raw)
In-Reply-To: <Ye7MS7BffKJJGB1c@platinum>

+Jerin, you had feedback regarding the mempool.

> From: Morten Brørup
> Sent: Friday, 28 January 2022 09.41
> 
> +Honnappa, you have shown a keen interest in this topic.
> 
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Monday, 24 January 2022 16.57
> >
> > On Mon, Jan 10, 2022 at 09:40:48AM +0000, Bruce Richardson wrote:
> > > On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > Sent: Friday, 7 January 2022 16.12
> > > > >
> > > > > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > > > > Hi mempool maintainers and DPDK team.
> > > > > >
> > > > > > Does anyone know the reason or history why
> > > > > > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it
> > > > > > is counterintuitive.
> > > > > >
> > > > > > The mempool cache flush threshold was introduced in DPDK
> > > > > > version 1.3; it was not in DPDK version 1.2. The copyright
> > > > > > notice for rte_mempool.c says year 2012.
> > > > > >
> > > > > >
> > > > > > Here is my analysis:
> > > > > >
> > > > > > With the multiplier of 1.5, a mempool cache is allowed to be
> > > > > > filled up to 50 % above its target size before its excess
> > > > > > entries are flushed to the mempool (thereby reducing the cache
> > > > > > length to the target size).
> > > > > >
> > > > > > In the opposite direction, a mempool cache is allowed to be
> > > > > > drained completely, i.e. up to 100 % below its target size.
> > > > > >
> > > > > > My instinct tells me that it would be more natural to let a
> > > > > > mempool cache go the same amount above and below its target
> > > > > > size, i.e. using a flush multiplier of 2 instead of 1.5.
> > > > > >
> > > > > > Also, the cache should be allowed to fill up to and including
> > > > > > the flush threshold, so it is flushed when the threshold is
> > > > > > exceeded, instead of when it is reached.
> > > > > >
> > > > > > Here is a simplified example:
> > > > > >
> > > > > > Imagine a cache target size of 32, corresponding to a typical
> > > > > > packet burst. With a flush multiplier of 2 (and len > threshold
> > > > > > instead of len >= threshold), the cache could hold 1 +/- 1
> > > > > > packet bursts. With the current multiplier it can only hold
> > > > > > [0 .. 1.5[ packet bursts, not really providing a lot of
> > > > > > elasticity.
> > > > > >
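> > > > > > For reference, the flush check in the put path of
> > > > > > rte_mempool.h looks roughly like this (paraphrased from the
> > > > > > source, not a verbatim copy):
> > > > > >
> > > > > >   cache->len += n;
> > > > > >   if (cache->len >= cache->flushthresh) {
> > > > > >           /* Flush the excess entries down to "size". */
> > > > > >           rte_mempool_ops_enqueue_bulk(mp,
> > > > > >                   &cache->objs[cache->size],
> > > > > >                   cache->len - cache->size);
> > > > > >           cache->len = cache->size;
> > > > > >   }
> > > > > >
> > > > > > Note the ">=": with a multiplier of 1.5, a cache with size 32
> > > > > > is flushed as soon as it reaches 48 objects, not when it
> > > > > > exceeds 48.
> > > > > >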
> > > > > Hi Morten,
> > > > >
> > > > > Interesting to see this being looked at again. The original
> > > > > idea of adding in some extra room above the requested value was
> > > > > to avoid the worst-case scenario of a pool oscillating between
> > > > > full and empty repeatedly due to the addition/removal of perhaps
> > > > > a single packet. As for why 1.5 was chosen as the value, I don't
> > > > > recall any particular reason for it myself. The main objective
> > > > > was to have separate flush and size values so that we could go a
> > > > > bit above full and, when flushing, not empty the entire cache
> > > > > down to zero.
> > > >
> > > > Thanks for providing the historical background for this feature,
> > > > Bruce.
> > > >
> > > > >
> > > > > In terms of the behavioural points you make above, I wonder if
> > > > > symmetry is actually necessary or desirable in this case. After
> > > > > all, the ideal case is probably to keep the mempool cache
> > > > > neither full nor empty, so that both allocations and frees can
> > > > > be done without having to go to the underlying shared data
> > > > > structure. To accommodate this, the mempool will only flush when
> > > > > the number of elements is greater than size * 1.5, and then it
> > > > > only flushes elements down to size, ensuring that allocations
> > > > > can still take place. On allocation, new buffers are taken when
> > > > > we don't have enough in the cache to fulfil the request, and
> > > > > then the cache is filled up to size, not to the flush threshold.
> > > >
> > > > I agree with the ideal case.
> > > >
> > > > However, it looks like the addition of the flush threshold also
> > > > changed the "size" parameter to effectively become "desired
> > > > length" instead. This interpretation is also supported by the
> > > > flush algorithm, which doesn't flush below the "size", but to the
> > > > "size". So based on this interpretation, I was wondering why it is
> > > > not symmetrical around the "desired length", but allowed to go
> > > > 100 % below and only 50 % above.
> > > >
> > > > >
> > > > > Now, for the scenario you describe - where the mempool cache
> > > > > size is set to be the same as the burst size - this scheme
> > > > > probably does break down, in that we don't really have any burst
> > > > > elasticity. However, I would query whether that is a
> > > > > configuration that is used, since to the user it would correctly
> > > > > appear to provide no elasticity. Looking at testpmd, and our
> > > > > example apps, the standard there is a burst size of 32 and a
> > > > > mempool cache of ~256. In OVS code, netdev-dpdk.c seems to
> > > > > initialize the mempool with a cache size of
> > > > > RTE_MEMPOOL_CACHE_MAX_SIZE (through the define MP_CACHE_SZ). In
> > > > > all these cases, I think the 1.5 threshold should work just fine
> > > > > for us.
> > > >
> > > > My example was for demonstration only, and hopefully not being
> > > > used by any applications.
> > > >
> > > > The simplified example was intended to demonstrate the
> > > > theoretical effect of the imbalance in using the 1.5 threshold.
> > > > It will be similar with a cache size of 256 objects: You will be
> > > > allowed to go 8 bursts (calculation: 256 objects / 32 objects per
> > > > burst) below the cache "size" without retrieving objects from the
> > > > backing pool, but only 4 bursts above the cache "size" before
> > > > flushing objects to the backing pool.
> > > >
> > > > > That said, if you want to bump it up to 2x, I can't say I'd
> > > > > object strongly, as it should be harmless, I think.
> > > >
> > > > From an academic viewpoint, 2x seems much better to me than 1.5x.
> > > > And based on your background information from the introduction of
> > > > the flush threshold, there was no academic reason why 1.5x was
> > > > chosen - this is certainly valuable feedback.
> > > >
> >
> > Just to say I agree with Morten's analysis. Having a threshold at 2x
> > size would make more sense to me.
> >
> > Without the corner cases (big requests, or common pool empty), I
> > would have suggested this, which looks more symmetric than what we
> > have today:
> >
> > get()
> >   fill request with as many objs as possible from cache
> >   if cache is empty, request "size" more objects from common pool
> >   fill the rest of the request from cache
> 
> I agree, and this is doable without a performance penalty.
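> 
> A sketch of that get() in C, under Olivier's simplifying assumptions
> (request size n no larger than "size"; "ops_dequeue" is a stand-in for
> rte_mempool_ops_dequeue_bulk(), and error handling is omitted):
> 
>   if (n <= cache->len) {
>           /* Serve the whole request from the top of the cache. */
>           memcpy(obj_table, &cache->objs[cache->len - n],
>                   n * sizeof(void *));
>           cache->len -= n;
>   } else {
>           unsigned int from_cache = cache->len;
> 
>           /* Take everything the cache has... */
>           memcpy(obj_table, &cache->objs[0],
>                   from_cache * sizeof(void *));
>           /* ...refill the cache with "size" objects from the
>            * common pool... */
>           ops_dequeue(mp, &cache->objs[0], cache->size);
>           cache->len = cache->size;
>           /* ...and serve the rest of the request from the refill. */
>           memcpy(&obj_table[from_cache],
>                   &cache->objs[cache->len - (n - from_cache)],
>                   (n - from_cache) * sizeof(void *));
>           cache->len -= n - from_cache;
>   }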
> 
> >
> > put()
> >   fill cache with provided objects up to 2x size
> >   if cache exceeds threshold, fill common pool with "size" objects
> >   fill the cache with the remaining objs from request
> 
> I tried implementing this already, but it cannot be done without a
> performance cost, one way or the other:
> 
> 1. If we flush the top of the cache (which is a stack), we are flushing
> the hot objects, leaving cold objects at the new top of the cache.
> 
> 2. If we flush the bottom of the cache, we need to move the remaining
> objects in the cache down. E.g. flushing the objects at index [0..256[
> in the cache means that the objects at index [256..len[ in the cache
> must be moved down to index [0..len-256[.
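> 
> For illustration, the two options in C ("ops_enqueue" is a stand-in
> for rte_mempool_ops_enqueue_bulk()):
> 
>   /* 1. Flush the top of the stack: cheap, but evicts the hottest
>    *    objects and leaves cold ones at the new top. */
>   ops_enqueue(mp, &cache->objs[cache->size],
>           cache->len - cache->size);
>   cache->len = cache->size;
> 
>   /* 2. Flush the bottom of the stack: keeps the hot objects, but
>    *    the survivors must be moved down first, e.g. with memmove(). */
>   unsigned int excess = cache->len - cache->size;
>   ops_enqueue(mp, &cache->objs[0], excess);
>   memmove(&cache->objs[0], &cache->objs[excess],
>           cache->size * sizeof(void *));
>   cache->len = cache->size;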
> 
> So I came to the conclusion that flushing the cache needs to flush it
> completely. I also consider this the best choice for real applications:
> 
> Either, an lcore thread in the application is in a state of balance,
> where it uses the mempool cache within its flush/refill boundaries -
> which makes the flush method less important.
> 
> Or, an lcore thread in the application is out of balance (either
> permanently or temporarily), and mostly gets or puts objects from/to
> the mempool. If it mostly puts objects, not flushing all of the objects
> will cause more frequent flushing. E.g.:
> 
> Cache size=256, flushthresh=512 (2x, not 1.5x), initial len=256;
> application burst len=32.
> 
> If flushing leaves "size" objects in the cache, the cache is flushed
> every 8th burst. And with flushthresh=384 (1.5x), flushing occurs every
> 4th burst, which is what the mempool does today!
> 
> If the cache is flushed completely, the cache is only flushed at every
> 16th burst.
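> 
> Spelled out, at 32 objects per burst:
> 
> - Flush to "size" at 2x:   headroom = 512 - 256 = 256 objects = 8 bursts.
> - Flush to "size" at 1.5x: headroom = 384 - 256 = 128 objects = 4 bursts.
> - Complete flush at 2x:    headroom = 512 - 0   = 512 objects = 16 bursts.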
> 
> 
> And when/if the application thread breaks its pattern of continuously
> putting objects, and suddenly starts to get objects instead, it will
> either get objects already in the cache, or the get() function will
> refill the cache.
> 
> The concept of not flushing the cache completely is based on the
> assumption that it is more likely for an application's lcore thread to
> get() after flushing than to put() after flushing. I strongly disagree
> with this assumption! If an application thread is continuously putting
> so much that it overflows the cache, it is much more likely to keep
> putting than it is to start getting. This is CPU branch prediction 101.
> 
> >
> > But this is a quite sensitive area, and modifying these functions
> > should
> > be done with care, as it impacts mbuf allocation.
> 
> Certainly!
> 
> The current mempool caching algorithm is defective, and I'm trying to
> repair it. Honnappa already provided very valuable feedback regarding
> CPUs with small L1 cache, which must be taken carefully into
> consideration.

I'm sorry... The L1 cache feedback was from Jerin, not Honnappa.

> 
> For now, there seems to be consensus regarding changing the flush
> threshold multiplier from 1.5 to 2.
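> 
> The change itself should be a one-liner in rte_mempool.c (macro names
> quoted from memory; worth double-checking against the current source):
> 
>   /* Today: */
>   #define CACHE_FLUSHTHRESH_MULTIPLIER 1.5
>   /* Proposed: */
>   #define CACHE_FLUSHTHRESH_MULTIPLIER 2.0
> 
>   /* The threshold is derived from the cache size at creation time: */
>   #define CALC_CACHE_FLUSHTHRESH(c) \
>           ((typeof(c))((c) * CACHE_FLUSHTHRESH_MULTIPLIER))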
> 
> >
> > > > I will consider providing a patch. Worst case, it probably
> > > > doesn't do any harm. And if anyone is really interested, they can
> > > > look at the mempool debug stats to see the difference it makes
> > > > for their specific application, e.g. the
> > > > put_common_pool_bulk/put_bulk and
> > > > get_common_pool_bulk/get_success_bulk ratios should decrease.
> > >
> > > Sure, thanks.
> > >
> > > /Bruce


Thread overview: 7+ messages
2021-12-28 14:28 Morten Brørup
2022-01-07 15:12 ` Bruce Richardson
2022-01-08 11:00   ` Morten Brørup
2022-01-10  9:40     ` Bruce Richardson
2022-01-24 15:56       ` Olivier Matz
2022-01-28  8:40         ` Morten Brørup
2022-01-28  8:52         ` Morten Brørup [this message]
