DPDK patches and discussions
* [RFC] mempool: modify flush threshold
@ 2021-12-28 14:28 Morten Brørup
  2022-01-07 15:12 ` Bruce Richardson
  0 siblings, 1 reply; 7+ messages in thread
From: Morten Brørup @ 2021-12-28 14:28 UTC (permalink / raw)
  To: Olivier Matz, Andrew Rybchenko, dev

Hi mempool maintainers and DPDK team.

Does anyone know the reason or history why CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is counterintuitive.

The mempool cache flush threshold was introduced in DPDK version 1.3; it was not in DPDK version 1.2. The copyright notice for rte_mempool.c says year 2012.


Here is my analysis:

With the multiplier of 1.5, a mempool cache is allowed to be filled up to 50 % above its target size before its excess entries are flushed to the mempool (thereby reducing the cache length to the target size).

In the opposite direction, a mempool cache is allowed to be drained completely, i.e. up to 100 % below its target size.

My instinct tells me that it would be more natural to let a mempool cache go the same amount above and below its target size, i.e. using a flush multiplier of 2 instead of 1.5.

Also, the cache should be allowed to fill up to and including the flush threshold, so it is flushed when the threshold is exceeded, instead of when it is reached.

Here is a simplified example:

Imagine a cache target size of 32, corresponding to a typical packet burst. With a flush threshold multiplier of 2 (and len > threshold instead of len >= threshold), the cache could hold 1 +/- 1 packet bursts. With the current multiplier it can only hold [0 .. 1.5[ packet bursts, not really providing a lot of elasticity.
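
To put numbers on the asymmetry, here is a small stand-alone sketch of the calculation. The macro mirrors the spirit of CALC_CACHE_FLUSHTHRESH() in rte_mempool.c, but the name and the 2x variant are only illustrative:

#include <stdio.h>

/* Same spirit as CALC_CACHE_FLUSHTHRESH() in rte_mempool.c
 * (illustrative, not the exact DPDK code). */
#define FLUSHTHRESH(size, mult) ((unsigned int)((size) * (mult)))

int main(void)
{
	unsigned int size = 32;   /* cache target size = one typical burst */
	unsigned int burst = 32;
	unsigned int th_15 = FLUSHTHRESH(size, 1.5); /* 48: today's 1.5x */
	unsigned int th_20 = FLUSHTHRESH(size, 2.0); /* 64: proposed 2x  */

	/* Headroom above the target size before a flush is triggered: */
	printf("1.5x: %u objects (%.2f bursts) above size\n",
	       th_15 - size, (double)(th_15 - size) / burst);
	printf("2.0x: %u objects (%.2f bursts) above size\n",
	       th_20 - size, (double)(th_20 - size) / burst);
	/* Below the target size there is always room for "size" objects,
	 * i.e. a full burst in this example - hence the asymmetry. */
	return 0;
}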


Med venlig hilsen / Kind regards,
-Morten Brørup


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] mempool: modify flush threshold
  2021-12-28 14:28 [RFC] mempool: modify flush threshold Morten Brørup
@ 2022-01-07 15:12 ` Bruce Richardson
  2022-01-08 11:00   ` Morten Brørup
  0 siblings, 1 reply; 7+ messages in thread
From: Bruce Richardson @ 2022-01-07 15:12 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Olivier Matz, Andrew Rybchenko, dev

On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> Hi mempool maintainers and DPDK team.
> 
> Does anyone know the reason or history why CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is counterintuitive.
> 
> The mempool cache flush threshold was introduced in DPDK version 1.3; it was not in DPDK version 1.2. The copyright notice for rte_mempool.c says year 2012.
> 
> 
> Here is my analysis:
> 
> With the multiplier of 1.5, a mempool cache is allowed to be filled up to 50 % above than its target size before its excess entries are flushed to the mempool (thereby reducing the cache length to the target size).
> 
> In the opposite direction, a mempool cache is allowed to be drained completely, i.e. up to 100 % below its target size.
> 
> My instinct tells me that it would be more natural to let a mempool cache go the same amount above and below its target size, i.e. using a flush multiplier of 2 instead of 1.5.
> 
> Also, the cache should be allowed to fill up to and including the flush threshold, so it is flushed when the threshold is exceeded, instead of when it is reached.
> 
> Here is a simplified example:
> 
> Imagine a cache target size of 32, corresponding to a typical packet burst. With a flush threshold of 2 (and len > threshold instead of len >= threshold), the cache could hold 1 +/-1 packet bursts. With the current multiplier it can only hold [0 .. 1.5[ packet bursts, not really providing a lot of elasticity.
> 
Hi Morten,

Interesting to see this being looked at again. The original idea of adding
in some extra room above the requested value was to avoid the worst-case
scenario of a pool oscillating between full and empty repeatedly due to the
addition/removal of perhaps a single packet. As for why 1.5 was chosen as
the value, I don't recall any particular reason for it myself. The main
objective was to have separate flush and size values so that we could go
a bit above full and, when flushing, not empty the entire cache down to
zero.

In terms of the behavioural points you make above, I wonder if symmetry is
actually necessary or desirable in this case. After all, the ideal case is
probably to keep the mempool neither full nor empty, so that both
allocations and frees can be done without having to go to the underlying
shared data structure. To accommodate this, the mempool will only flush when
the number of elements is greater than size * 1.5, and then it only flushes
elements down to size, ensuring that allocations can still take place.
On allocation, new buffers are taken when we don't have enough in the cache
to fulfil the request, and then the cache is filled up to size, not to the
flush threshold.
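
Very roughly sketched in C - much simplified, with illustrative names and a
stand-in for the shared ring rather than the actual rte_mempool code - that
behaviour looks like this:

#include <string.h>

#define POOL_MAX 4096
static void *pool[POOL_MAX];          /* stand-in for the shared ring */
static unsigned int pool_len;         /* (bounds checks omitted)      */

struct cache {
	unsigned int size;            /* requested cache size   */
	unsigned int flushthresh;     /* currently size * 1.5   */
	unsigned int len;
	void *objs[1024];
};

static void cache_put(struct cache *c, void * const *obj, unsigned int n)
{
	memcpy(&c->objs[c->len], obj, n * sizeof(void *));
	c->len += n;
	if (c->len >= c->flushthresh) {
		/* Flush only the excess, back down to "size". */
		memcpy(&pool[pool_len], &c->objs[c->size],
		       (c->len - c->size) * sizeof(void *));
		pool_len += c->len - c->size;
		c->len = c->size;
	}
}

static int cache_get(struct cache *c, void **obj, unsigned int n)
{
	if (n > c->len) {
		/* Backfill so the cache holds "size" objs after serving n. */
		unsigned int req = n + (c->size - c->len);
		if (req > pool_len)
			return -1;    /* real code handles this differently */
		pool_len -= req;
		memcpy(&c->objs[c->len], &pool[pool_len],
		       req * sizeof(void *));
		c->len += req;
	}
	c->len -= n;
	memcpy(obj, &c->objs[c->len], n * sizeof(void *));
	return 0;
}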

Now, for the scenario you describe - where the mempool cache size is set to
be the same as the burst size, this scheme probably does break down, in
that we don't really have any burst elasticity. However, I would query if
that is a configuration that is actually used, since to the user it should,
correctly, appear to provide no elasticity. Looking at testpmd and our example
apps, the standard there is a burst size of 32 and mempool cache of ~256.
In OVS code, netdev-dpdk.c seems to initialize the mempool with cache size
of RTE_MEMPOOL_CACHE_MAX_SIZE (through define MP_CACHE_SZ). In all these
cases, I think the 1.5 threshold should work just fine for us. That said,
if you want to bump it up to 2x, I can't say I'd object strongly as it
should be harmless, I think.

Just my initial thoughts,

/Bruce

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [RFC] mempool: modify flush threshold
  2022-01-07 15:12 ` Bruce Richardson
@ 2022-01-08 11:00   ` Morten Brørup
  2022-01-10  9:40     ` Bruce Richardson
  0 siblings, 1 reply; 7+ messages in thread
From: Morten Brørup @ 2022-01-08 11:00 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Olivier Matz, Andrew Rybchenko, dev

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 7 January 2022 16.12
> 
> On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > Hi mempool maintainers and DPDK team.
> >
> > Does anyone know the reason or history why
> CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is
> counterintuitive.
> >
> > The mempool cache flush threshold was introduced in DPDK version 1.3;
> it was not in DPDK version 1.2. The copyright notice for rte_mempool.c
> says year 2012.
> >
> >
> > Here is my analysis:
> >
> > With the multiplier of 1.5, a mempool cache is allowed to be filled
> up to 50 % above than its target size before its excess entries are
> flushed to the mempool (thereby reducing the cache length to the target
> size).
> >
> > In the opposite direction, a mempool cache is allowed to be drained
> completely, i.e. up to 100 % below its target size.
> >
> > My instinct tells me that it would be more natural to let a mempool
> cache go the same amount above and below its target size, i.e. using a
> flush multiplier of 2 instead of 1.5.
> >
> > Also, the cache should be allowed to fill up to and including the
> flush threshold, so it is flushed when the threshold is exceeded,
> instead of when it is reached.
> >
> > Here is a simplified example:
> >
> > Imagine a cache target size of 32, corresponding to a typical packet
> burst. With a flush threshold of 2 (and len > threshold instead of len
> >= threshold), the cache could hold 1 +/-1 packet bursts. With the
> current multiplier it can only hold [0 .. 1.5[ packet bursts, not
> really providing a lot of elasticity.
> >
> Hi Morten,
> 
> Interesting to see this being looked at again. The original idea of
> adding
> in some extra room above the requested value was to avoid the worst-
> case
> scenario of a pool oscillating between full and empty repeatedly due to
> the
> addition/removal of perhaps a single packet. As for why 1.5 was chosen
> as
> the value, I don't recall any particular reason for it myself. The main
> objective was to have a separate flush and size values so that we could
> go
> a bit above full, and when flushing, not emptying the entire cache down
> to
> zero.

Thanks for providing the historical background for this feature, Bruce.

> 
> In terms of the behavioural points you make above, I wonder if symmetry
> is
> actually necessary or desirable in this case. After all, the ideal case
> is
> probably to keep the mempool neither full nor empty, so that both
> allocations or frees can be done without having to go to the underlying
> shared data structure. To accomodate this, the mempool will only flush
> when
> the number of elements is greater than size * 1.5, and then it only
> flushes
> elements down to size, ensuring that allocations can still take place.
> On allocation, new buffers are taken when we don't have enough in the
> cache
> to fullfil the request, and then the cache is filled up to size, not to
> the
> flush threshold.

I agree with the ideal case.

However, it looks like the addition of the flush threshold also changed the "size" parameter to effectively become "desired length" instead. This interpretation is also supported by the flush algorithm, which doesn't flush below the "size", but to the "size". So based on this interpretation, I was wondering why it is not symmetrical around the "desired length", but allowed to go 100 % below and only 50 % above.

> 
> Now, for the scenario you describe - where the mempool cache size is
> set to
> be the same as the burst size, this scheme probably does break down, in
> that we don't really have any burst elasticity. However, I would query
> if
> that is a configuration that is used, since to the user it should
> appear
> correctly to provide no elasticity. Looking at testpmd, and our example
> apps, the standard there is a burst size of 32 and mempool cache of
> ~256.
> In OVS code, netdev-dpdk.c seems to initialize the mempool  with cache
> size
> of RTE_MEMPOOL_CACHE_MAX_SIZE (through define MP_CACHE_SZ). In all
> these
> cases, I think the 1.5 threshold should work just fine for us.

My example was for demonstration only, and hopefully not being used by any applications.

The simplified example was intended to demonstrate the theoretical effect of the imbalance in using the 1.5 threshold. It will be similar with a cache size of 256 objects: you will be allowed to go 8 bursts (calculation: 256 objects / 32 objects per burst) below the cache "size" without retrieving objects from the backing pool, but only 4 bursts above the cache "size" before flushing objects to the backing pool.

> That said,
> if you want to bump it up to 2x, I can't say I'd object strongly as it
> should be harmless, I think.

From an academic viewpoint, 2x seems much better to me than 1.5x. And based on your background information from the introduction of the flush threshold, there was no academic reason why 1.5x was chosen - this is certainly valuable feedback.

I will consider providing a patch. Worst case it probably doesn't do any harm. And if anyone is really interested, they can look at the mempool debug stats to see the difference it makes for their specific application, e.g. the put_common_pool_bulk/put_bulk and get_common_pool_bulk/get_success_bulk ratios should decrease.
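
For anyone who wants to compare, something along these lines could compute those ratios - assuming a debug-stats build where the per-lcore counters are reachable as mp->stats[], as in rte_mempool.h; rte_mempool_dump() prints the same raw counters if you prefer not to touch the struct:

#include <stdio.h>
#include <stdint.h>
#include <rte_mempool.h>

#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
/* Debug-stats builds only: how often do put()/get() fall through to the
 * common pool?  Lower ratios mean the per-lcore cache absorbs more traffic. */
static void print_cache_ratios(const struct rte_mempool *mp)
{
	uint64_t put = 0, put_pool = 0, get = 0, get_pool = 0;
	unsigned int lcore;

	for (lcore = 0; lcore < RTE_MAX_LCORE; lcore++) {
		put      += mp->stats[lcore].put_bulk;
		put_pool += mp->stats[lcore].put_common_pool_bulk;
		get      += mp->stats[lcore].get_success_bulk;
		get_pool += mp->stats[lcore].get_common_pool_bulk;
	}
	printf("put_common_pool_bulk/put_bulk = %.3f, "
	       "get_common_pool_bulk/get_success_bulk = %.3f\n",
	       put ? (double)put_pool / put : 0.0,
	       get ? (double)get_pool / get : 0.0);
}
#endif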

> 
> Just my initial thoughts,
> 
> /Bruce


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] mempool: modify flush threshold
  2022-01-08 11:00   ` Morten Brørup
@ 2022-01-10  9:40     ` Bruce Richardson
  2022-01-24 15:56       ` Olivier Matz
  0 siblings, 1 reply; 7+ messages in thread
From: Bruce Richardson @ 2022-01-10  9:40 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Olivier Matz, Andrew Rybchenko, dev

On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Friday, 7 January 2022 16.12
> > 
> > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > Hi mempool maintainers and DPDK team.
> > >
> > > Does anyone know the reason or history why
> > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is
> > counterintuitive.
> > >
> > > The mempool cache flush threshold was introduced in DPDK version 1.3;
> > it was not in DPDK version 1.2. The copyright notice for rte_mempool.c
> > says year 2012.
> > >
> > >
> > > Here is my analysis:
> > >
> > > With the multiplier of 1.5, a mempool cache is allowed to be filled
> > up to 50 % above than its target size before its excess entries are
> > flushed to the mempool (thereby reducing the cache length to the target
> > size).
> > >
> > > In the opposite direction, a mempool cache is allowed to be drained
> > completely, i.e. up to 100 % below its target size.
> > >
> > > My instinct tells me that it would be more natural to let a mempool
> > cache go the same amount above and below its target size, i.e. using a
> > flush multiplier of 2 instead of 1.5.
> > >
> > > Also, the cache should be allowed to fill up to and including the
> > flush threshold, so it is flushed when the threshold is exceeded,
> > instead of when it is reached.
> > >
> > > Here is a simplified example:
> > >
> > > Imagine a cache target size of 32, corresponding to a typical packet
> > burst. With a flush threshold of 2 (and len > threshold instead of len
> > >= threshold), the cache could hold 1 +/-1 packet bursts. With the
> > current multiplier it can only hold [0 .. 1.5[ packet bursts, not
> > really providing a lot of elasticity.
> > >
> > Hi Morten,
> > 
> > Interesting to see this being looked at again. The original idea of
> > adding
> > in some extra room above the requested value was to avoid the worst-
> > case
> > scenario of a pool oscillating between full and empty repeatedly due to
> > the
> > addition/removal of perhaps a single packet. As for why 1.5 was chosen
> > as
> > the value, I don't recall any particular reason for it myself. The main
> > objective was to have a separate flush and size values so that we could
> > go
> > a bit above full, and when flushing, not emptying the entire cache down
> > to
> > zero.
> 
> Thanks for providing the historical background for this feature, Bruce.
> 
> > 
> > In terms of the behavioural points you make above, I wonder if symmetry
> > is
> > actually necessary or desirable in this case. After all, the ideal case
> > is
> > probably to keep the mempool neither full nor empty, so that both
> > allocations or frees can be done without having to go to the underlying
> > shared data structure. To accomodate this, the mempool will only flush
> > when
> > the number of elements is greater than size * 1.5, and then it only
> > flushes
> > elements down to size, ensuring that allocations can still take place.
> > On allocation, new buffers are taken when we don't have enough in the
> > cache
> > to fullfil the request, and then the cache is filled up to size, not to
> > the
> > flush threshold.
> 
> I agree with the ideal case.
> 
> However, it looks like the addition of the flush threshold also changed the "size" parameter to effectively become "desired length" instead. This interpretation is also supported by the flush algorithm, which doesn't flush below the "size", but to the "size". So based on interpretation, I was wondering why it is not symmetrical around the "desired length", but allowed to go 100 % below and only 50 % above.
> 
> > 
> > Now, for the scenario you describe - where the mempool cache size is
> > set to
> > be the same as the burst size, this scheme probably does break down, in
> > that we don't really have any burst elasticity. However, I would query
> > if
> > that is a configuration that is used, since to the user it should
> > appear
> > correctly to provide no elasticity. Looking at testpmd, and our example
> > apps, the standard there is a burst size of 32 and mempool cache of
> > ~256.
> > In OVS code, netdev-dpdk.c seems to initialize the mempool  with cache
> > size
> > of RTE_MEMPOOL_CACHE_MAX_SIZE (through define MP_CACHE_SZ). In all
> > these
> > cases, I think the 1.5 threshold should work just fine for us.
> 
> My example was for demonstration only, and hopefully not being used by any applications.
> 
> The simplified example was intended to demonstrate the theoretical effect of the unbalance in using the 1.5 threshold. It will be similar with a cache size of 256 objects: You will be allowed to go 8 bursts (calculation: 256 objects / 32 objects per burst) below the cache "size" without retrieving objects from the backing pool, but only 4 bursts above the cache "size" before flushing objects to the backing pool.
> 
> > That said,
> > if you want to bump it up to 2x, I can't say I'd object strongly as it
> > should be harmless, I think.
> 
> From an academic viewpoint, 2x seems much better to me than 1.5x. And based on your background information from the introduction of the flush threshold, there was no academic reason why 1.5x was chosen - this is certainly valuable feedback.
> 
> I will consider providing a patch. Worst case it probably doesn't do any harm. And if anyone is really interested, they can look at the mempool debug stats to see the difference it makes for their specific application, e.g. the put_common_pool_bulk/put_bulk and get_common_pool_bulk/get_success_bulk ratios should decrease.

Sure, thanks.

/Bruce

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] mempool: modify flush threshold
  2022-01-10  9:40     ` Bruce Richardson
@ 2022-01-24 15:56       ` Olivier Matz
  2022-01-28  8:40         ` Morten Brørup
  2022-01-28  8:52         ` Morten Brørup
  0 siblings, 2 replies; 7+ messages in thread
From: Olivier Matz @ 2022-01-24 15:56 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Morten Brørup, Andrew Rybchenko, dev

On Mon, Jan 10, 2022 at 09:40:48AM +0000, Bruce Richardson wrote:
> On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Friday, 7 January 2022 16.12
> > > 
> > > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > > Hi mempool maintainers and DPDK team.
> > > >
> > > > Does anyone know the reason or history why
> > > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is
> > > counterintuitive.
> > > >
> > > > The mempool cache flush threshold was introduced in DPDK version 1.3;
> > > it was not in DPDK version 1.2. The copyright notice for rte_mempool.c
> > > says year 2012.
> > > >
> > > >
> > > > Here is my analysis:
> > > >
> > > > With the multiplier of 1.5, a mempool cache is allowed to be filled
> > > up to 50 % above than its target size before its excess entries are
> > > flushed to the mempool (thereby reducing the cache length to the target
> > > size).
> > > >
> > > > In the opposite direction, a mempool cache is allowed to be drained
> > > completely, i.e. up to 100 % below its target size.
> > > >
> > > > My instinct tells me that it would be more natural to let a mempool
> > > cache go the same amount above and below its target size, i.e. using a
> > > flush multiplier of 2 instead of 1.5.
> > > >
> > > > Also, the cache should be allowed to fill up to and including the
> > > flush threshold, so it is flushed when the threshold is exceeded,
> > > instead of when it is reached.
> > > >
> > > > Here is a simplified example:
> > > >
> > > > Imagine a cache target size of 32, corresponding to a typical packet
> > > burst. With a flush threshold of 2 (and len > threshold instead of len
> > > >= threshold), the cache could hold 1 +/-1 packet bursts. With the
> > > current multiplier it can only hold [0 .. 1.5[ packet bursts, not
> > > really providing a lot of elasticity.
> > > >
> > > Hi Morten,
> > > 
> > > Interesting to see this being looked at again. The original idea of
> > > adding
> > > in some extra room above the requested value was to avoid the worst-
> > > case
> > > scenario of a pool oscillating between full and empty repeatedly due to
> > > the
> > > addition/removal of perhaps a single packet. As for why 1.5 was chosen
> > > as
> > > the value, I don't recall any particular reason for it myself. The main
> > > objective was to have a separate flush and size values so that we could
> > > go
> > > a bit above full, and when flushing, not emptying the entire cache down
> > > to
> > > zero.
> > 
> > Thanks for providing the historical background for this feature, Bruce.
> > 
> > > 
> > > In terms of the behavioural points you make above, I wonder if symmetry
> > > is
> > > actually necessary or desirable in this case. After all, the ideal case
> > > is
> > > probably to keep the mempool neither full nor empty, so that both
> > > allocations or frees can be done without having to go to the underlying
> > > shared data structure. To accomodate this, the mempool will only flush
> > > when
> > > the number of elements is greater than size * 1.5, and then it only
> > > flushes
> > > elements down to size, ensuring that allocations can still take place.
> > > On allocation, new buffers are taken when we don't have enough in the
> > > cache
> > > to fullfil the request, and then the cache is filled up to size, not to
> > > the
> > > flush threshold.
> > 
> > I agree with the ideal case.
> > 
> > However, it looks like the addition of the flush threshold also changed the "size" parameter to effectively become "desired length" instead. This interpretation is also supported by the flush algorithm, which doesn't flush below the "size", but to the "size". So based on interpretation, I was wondering why it is not symmetrical around the "desired length", but allowed to go 100 % below and only 50 % above.
> > 
> > > 
> > > Now, for the scenario you describe - where the mempool cache size is
> > > set to
> > > be the same as the burst size, this scheme probably does break down, in
> > > that we don't really have any burst elasticity. However, I would query
> > > if
> > > that is a configuration that is used, since to the user it should
> > > appear
> > > correctly to provide no elasticity. Looking at testpmd, and our example
> > > apps, the standard there is a burst size of 32 and mempool cache of
> > > ~256.
> > > In OVS code, netdev-dpdk.c seems to initialize the mempool  with cache
> > > size
> > > of RTE_MEMPOOL_CACHE_MAX_SIZE (through define MP_CACHE_SZ). In all
> > > these
> > > cases, I think the 1.5 threshold should work just fine for us.
> > 
> > My example was for demonstration only, and hopefully not being used by any applications.
> > 
> > The simplified example was intended to demonstrate the theoretical effect of the unbalance in using the 1.5 threshold. It will be similar with a cache size of 256 objects: You will be allowed to go 8 bursts (calculation: 256 objects / 32 objects per burst) below the cache "size" without retrieving objects from the backing pool, but only 4 bursts above the cache "size" before flushing objects to the backing pool.
> > 
> > > That said,
> > > if you want to bump it up to 2x, I can't say I'd object strongly as it
> > > should be harmless, I think.
> > 
> > From an academic viewpoint, 2x seems much better to me than 1.5x. And based on your background information from the introduction of the flush threshold, there was no academic reason why 1.5x was chosen - this is certainly valuable feedback.
> > 

Just to say I agree with Morten's analysis. Having a threshold at 2x size would
make more sense to me.

Without the corner cases (big requests, or common pool empty), I would
have suggested this, which looks more symmetric than what we have today:

get()
  fill request with as many objs as possible from cache
  if cache is empty, request "size" more objects from common pool
  fill the rest of the request from cache

put()
  fill cache with provided objects up to 2x size
  if cache exceeds threshold, fill common pool with "size" objects
  fill the cache with the remaining objs from request
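
In C, and still ignoring those corner cases, a sketch could look like this
(the struct and the "backend_*" helpers are just stand-ins for the common
pool, not the real mempool ops API):

#include <string.h>

struct cache { unsigned int size, len; void *objs[1024]; };

/* Stand-ins for the common pool (not the real driver ops API). */
int  backend_dequeue(void **objs, unsigned int n);
void backend_enqueue(void * const *objs, unsigned int n);

static int sym_get(struct cache *c, void **obj, unsigned int n)
{
	unsigned int from_cache = n < c->len ? n : c->len;

	/* Fill the request with as many objs as possible from the cache. */
	c->len -= from_cache;
	memcpy(obj, &c->objs[c->len], from_cache * sizeof(void *));

	if (from_cache < n) {
		/* Cache is empty: request "size" more from the common pool,
		 * then fill the rest of the request from the cache. */
		if (backend_dequeue(c->objs, c->size) < 0)
			return -1;
		c->len = c->size - (n - from_cache);
		memcpy(&obj[from_cache], &c->objs[c->len],
		       (n - from_cache) * sizeof(void *));
	}
	return 0;
}

static void sym_put(struct cache *c, void * const *obj, unsigned int n)
{
	unsigned int room = 2 * c->size - c->len;
	unsigned int to_cache = n < room ? n : room;

	/* Fill the cache with provided objects, up to 2x size. */
	memcpy(&c->objs[c->len], obj, to_cache * sizeof(void *));
	c->len += to_cache;

	if (c->len >= 2 * c->size) {	/* threshold reached: 2x size */
		/* Return "size" objects to the common pool... */
		backend_enqueue(&c->objs[c->len - c->size], c->size);
		c->len -= c->size;
		/* ...and store the remaining objs from the request. */
		memcpy(&c->objs[c->len], &obj[to_cache],
		       (n - to_cache) * sizeof(void *));
		c->len += n - to_cache;
	}
}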

But this is quite a sensitive area, and modifying these functions should
be done with care, as it impacts mbuf allocation.

> > I will consider providing a patch. Worst case it probably doesn't do any harm. And if anyone is really interested, they can look at the mempool debug stats to see the difference it makes for their specific application, e.g. the put_common_pool_bulk/put_bulk and get_common_pool_bulk/get_success_bulk ratios should decrease.
> 
> Sure, thanks.
> 
> /Bruce

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [RFC] mempool: modify flush threshold
  2022-01-24 15:56       ` Olivier Matz
@ 2022-01-28  8:40         ` Morten Brørup
  2022-01-28  8:52         ` Morten Brørup
  1 sibling, 0 replies; 7+ messages in thread
From: Morten Brørup @ 2022-01-28  8:40 UTC (permalink / raw)
  To: Olivier Matz
  Cc: Bruce Richardson, Andrew Rybchenko, Honnappa Nagarahalli, dev

+Honnappa, you have shown a keen interest in this topic.

> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, 24 January 2022 16.57
> 
> On Mon, Jan 10, 2022 at 09:40:48AM +0000, Bruce Richardson wrote:
> > On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Friday, 7 January 2022 16.12
> > > >
> > > > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > > > Hi mempool maintainers and DPDK team.
> > > > >
> > > > > Does anyone know the reason or history why
> > > > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is
> > > > counterintuitive.
> > > > >
> > > > > The mempool cache flush threshold was introduced in DPDK
> version 1.3;
> > > > it was not in DPDK version 1.2. The copyright notice for
> rte_mempool.c
> > > > says year 2012.
> > > > >
> > > > >
> > > > > Here is my analysis:
> > > > >
> > > > > With the multiplier of 1.5, a mempool cache is allowed to be
> filled
> > > > up to 50 % above than its target size before its excess entries
> are
> > > > flushed to the mempool (thereby reducing the cache length to the
> target
> > > > size).
> > > > >
> > > > > In the opposite direction, a mempool cache is allowed to be
> drained
> > > > completely, i.e. up to 100 % below its target size.
> > > > >
> > > > > My instinct tells me that it would be more natural to let a
> mempool
> > > > cache go the same amount above and below its target size, i.e.
> using a
> > > > flush multiplier of 2 instead of 1.5.
> > > > >
> > > > > Also, the cache should be allowed to fill up to and including
> the
> > > > flush threshold, so it is flushed when the threshold is exceeded,
> > > > instead of when it is reached.
> > > > >
> > > > > Here is a simplified example:
> > > > >
> > > > > Imagine a cache target size of 32, corresponding to a typical
> packet
> > > > burst. With a flush threshold of 2 (and len > threshold instead
> of len
> > > > >= threshold), the cache could hold 1 +/-1 packet bursts. With
> the
> > > > current multiplier it can only hold [0 .. 1.5[ packet bursts, not
> > > > really providing a lot of elasticity.
> > > > >
> > > > Hi Morten,
> > > >
> > > > Interesting to see this being looked at again. The original idea
> of
> > > > adding
> > > > in some extra room above the requested value was to avoid the
> worst-
> > > > case
> > > > scenario of a pool oscillating between full and empty repeatedly
> due to
> > > > the
> > > > addition/removal of perhaps a single packet. As for why 1.5 was
> chosen
> > > > as
> > > > the value, I don't recall any particular reason for it myself.
> The main
> > > > objective was to have a separate flush and size values so that we
> could
> > > > go
> > > > a bit above full, and when flushing, not emptying the entire
> cache down
> > > > to
> > > > zero.
> > >
> > > Thanks for providing the historical background for this feature,
> Bruce.
> > >
> > > >
> > > > In terms of the behavioural points you make above, I wonder if
> symmetry
> > > > is
> > > > actually necessary or desirable in this case. After all, the
> ideal case
> > > > is
> > > > probably to keep the mempool neither full nor empty, so that both
> > > > allocations or frees can be done without having to go to the
> underlying
> > > > shared data structure. To accomodate this, the mempool will only
> flush
> > > > when
> > > > the number of elements is greater than size * 1.5, and then it
> only
> > > > flushes
> > > > elements down to size, ensuring that allocations can still take
> place.
> > > > On allocation, new buffers are taken when we don't have enough in
> the
> > > > cache
> > > > to fullfil the request, and then the cache is filled up to size,
> not to
> > > > the
> > > > flush threshold.
> > >
> > > I agree with the ideal case.
> > >
> > > However, it looks like the addition of the flush threshold also
> changed the "size" parameter to effectively become "desired length"
> instead. This interpretation is also supported by the flush algorithm,
> which doesn't flush below the "size", but to the "size". So based on
> interpretation, I was wondering why it is not symmetrical around the
> "desired length", but allowed to go 100 % below and only 50 % above.
> > >
> > > >
> > > > Now, for the scenario you describe - where the mempool cache size
> is
> > > > set to
> > > > be the same as the burst size, this scheme probably does break
> down, in
> > > > that we don't really have any burst elasticity. However, I would
> query
> > > > if
> > > > that is a configuration that is used, since to the user it should
> > > > appear
> > > > correctly to provide no elasticity. Looking at testpmd, and our
> example
> > > > apps, the standard there is a burst size of 32 and mempool cache
> of
> > > > ~256.
> > > > In OVS code, netdev-dpdk.c seems to initialize the mempool  with
> cache
> > > > size
> > > > of RTE_MEMPOOL_CACHE_MAX_SIZE (through define MP_CACHE_SZ). In
> all
> > > > these
> > > > cases, I think the 1.5 threshold should work just fine for us.
> > >
> > > My example was for demonstration only, and hopefully not being used
> by any applications.
> > >
> > > The simplified example was intended to demonstrate the theoretical
> effect of the unbalance in using the 1.5 threshold. It will be similar
> with a cache size of 256 objects: You will be allowed to go 8 bursts
> (calculation: 256 objects / 32 objects per burst) below the cache
> "size" without retrieving objects from the backing pool, but only 4
> bursts above the cache "size" before flushing objects to the backing
> pool.
> > >
> > > > That said,
> > > > if you want to bump it up to 2x, I can't say I'd object strongly
> as it
> > > > should be harmless, I think.
> > >
> > > From an academic viewpoint, 2x seems much better to me than 1.5x.
> And based on your background information from the introduction of the
> flush threshold, there was no academic reason why 1.5x was chosen -
> this is certainly valuable feedback.
> > >
> 
> Just to say I agree with Morten's analysis. Having a threshold at 2x
> size would
> make more sense to me.
> 
> Without the corner cases (big requests, or common pool empty), I would
> have suggested this, which looks more symetric than what we have today:
> 
> get()
>   fill request with as many objs as possible from cache
>   if cache is empty, request "size" more objects from common pool
>   fill the rest of the request from cache

I agree, and this is doable without a performance penalty.

> 
> put()
>   fill cache with provided objects up to 2x size
>   if cache exceeds threshold, fill common pool with "size" objects
>   fill the cache with the remaining objs from request

I tried implementing this already, but it cannot be done without a performance cost, one way or the other:

1. If we flush the top of the cache (which is a stack), we are flushing the hot objects, leaving cold objects at the new top of the cache.

2. If we flush the bottom of the cache, we need to move the remaining objects in the cache down. E.g. flushing objects at index [0..256[ in the cache means that the objects at index [256..len[ in the cache must be moved down to index [0..len-256[.
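
For reference, option 2 would look something like this (illustrative stand-ins only, not the actual cache code):

#include <string.h>

struct cache { unsigned int size, len; void *objs[1024]; };
void backend_enqueue(void * const *objs, unsigned int n);

/* Flushing the cold objects at the bottom of the stack keeps the hot ones,
 * but the survivors then have to be shifted down. */
static void flush_bottom(struct cache *c, unsigned int nb_flush)
{
	backend_enqueue(&c->objs[0], nb_flush);

	/* Move objs[nb_flush..len[ down to objs[0..len-nb_flush[ - this
	 * memmove() is the extra cost that flushing the top avoids. */
	memmove(&c->objs[0], &c->objs[nb_flush],
		(c->len - nb_flush) * sizeof(void *));
	c->len -= nb_flush;
}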

So I came to the conclusion that flushing the cache needs to flush it completely. I also consider this the best choice for real applications:

Either, an lcore thread in the application is in a state of balance, where it uses the mempool cache within its flush/refill boundaries - which makes the flush method less important.

Or, an lcore thread in the application is out of balance (either permanently or temporarily), and mostly gets or puts objects from/to the mempool. If it mostly puts objects, not flushing all of the objects will cause more frequent flushing. E.g.:

Cache size=256, flushthresh=512 (2x, not 1.5x), initial len=256; application burst len=32.

If flushing leaves "size" objects in the cache, the cache is flushed every 8th burst. And with flushthresh=384 (1.5x), flushing occurs every 4th burst, which is what the mempool does today!

If the cache is flushed completely, the cache is only flushed at every 16th burst.
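
Spelling out that arithmetic:

/* Bursts between flushes = (flush threshold - level after flush) / burst: */
static unsigned int bursts_between_flushes(unsigned int flushthresh,
		unsigned int level_after_flush, unsigned int burst)
{
	return (flushthresh - level_after_flush) / burst;
}
/* bursts_between_flushes(384, 256, 32) ==  4  (1.5x, flush down to size: today)
 * bursts_between_flushes(512, 256, 32) ==  8  (2x,   flush down to size)
 * bursts_between_flushes(512,   0, 32) == 16  (2x,   flush completely)       */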


And when/if the application thread breaks its pattern of continuously putting objects, and suddenly starts to get objects instead, it will either get objects already in the cache, or the get() function will refill the cache.

The concept of not flushing the cache completely is based on the assumption that it is more likely for an application's lcore thread to get() after flushing than to put() after flushing. I strongly disagree with this assumption! If an application thread is continuously putting so much that it overflows the cache, it is much more likely to keep putting than it is to start getting. This is CPU branch prediction 101.

> 
> But this is a quite sensitive area, and modifying these functions
> should
> be done with care, as it impacts mbuf allocation.

Certainly!

The current mempool caching algorithm is defective, and I'm trying to repair it. Honnappa already provided very valuable feedback regarding CPUs with small L1 cache, which must be taken carefully into consideration.

For now, there seems to be consensus regarding changing the flush threshold multiplier from 1.5 to 2.

> 
> > > I will consider providing a patch. Worst case it probably doesn't
> do any harm. And if anyone is really interested, they can look at the
> mempool debug stats to see the difference it makes for their specific
> application, e.g. the put_common_pool_bulk/put_bulk and
> get_common_pool_bulk/get_success_bulk ratios should decrease.
> >
> > Sure, thanks.
> >
> > /Bruce


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [RFC] mempool: modify flush threshold
  2022-01-24 15:56       ` Olivier Matz
  2022-01-28  8:40         ` Morten Brørup
@ 2022-01-28  8:52         ` Morten Brørup
  1 sibling, 0 replies; 7+ messages in thread
From: Morten Brørup @ 2022-01-28  8:52 UTC (permalink / raw)
  To: Olivier Matz, Jerin Jacob
  Cc: Bruce Richardson, Andrew Rybchenko, Honnappa Nagarahalli, dev

+Jerin, you had feedback regarding the mempool.

> From: Morten Brørup
> Sent: Friday, 28 January 2022 09.41
> 
> +Honnappa, you have shown a keen interest in this topic.
> 
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Monday, 24 January 2022 16.57
> >
> > On Mon, Jan 10, 2022 at 09:40:48AM +0000, Bruce Richardson wrote:
> > > On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > > Sent: Friday, 7 January 2022 16.12
> > > > >
> > > > > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > > > > Hi mempool maintainers and DPDK team.
> > > > > >
> > > > > > Does anyone know the reason or history why
> > > > > CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it
> is
> > > > > counterintuitive.
> > > > > >
> > > > > > The mempool cache flush threshold was introduced in DPDK
> > version 1.3;
> > > > > it was not in DPDK version 1.2. The copyright notice for
> > rte_mempool.c
> > > > > says year 2012.
> > > > > >
> > > > > >
> > > > > > Here is my analysis:
> > > > > >
> > > > > > With the multiplier of 1.5, a mempool cache is allowed to be
> > filled
> > > > > up to 50 % above than its target size before its excess entries
> > are
> > > > > flushed to the mempool (thereby reducing the cache length to
> the
> > target
> > > > > size).
> > > > > >
> > > > > > In the opposite direction, a mempool cache is allowed to be
> > drained
> > > > > completely, i.e. up to 100 % below its target size.
> > > > > >
> > > > > > My instinct tells me that it would be more natural to let a
> > mempool
> > > > > cache go the same amount above and below its target size, i.e.
> > using a
> > > > > flush multiplier of 2 instead of 1.5.
> > > > > >
> > > > > > Also, the cache should be allowed to fill up to and including
> > the
> > > > > flush threshold, so it is flushed when the threshold is
> exceeded,
> > > > > instead of when it is reached.
> > > > > >
> > > > > > Here is a simplified example:
> > > > > >
> > > > > > Imagine a cache target size of 32, corresponding to a typical
> > packet
> > > > > burst. With a flush threshold of 2 (and len > threshold instead
> > of len
> > > > > >= threshold), the cache could hold 1 +/-1 packet bursts. With
> > the
> > > > > current multiplier it can only hold [0 .. 1.5[ packet bursts,
> not
> > > > > really providing a lot of elasticity.
> > > > > >
> > > > > Hi Morten,
> > > > >
> > > > > Interesting to see this being looked at again. The original
> idea
> > of
> > > > > adding
> > > > > in some extra room above the requested value was to avoid the
> > worst-
> > > > > case
> > > > > scenario of a pool oscillating between full and empty
> repeatedly
> > due to
> > > > > the
> > > > > addition/removal of perhaps a single packet. As for why 1.5 was
> > chosen
> > > > > as
> > > > > the value, I don't recall any particular reason for it myself.
> > The main
> > > > > objective was to have a separate flush and size values so that
> we
> > could
> > > > > go
> > > > > a bit above full, and when flushing, not emptying the entire
> > cache down
> > > > > to
> > > > > zero.
> > > >
> > > > Thanks for providing the historical background for this feature,
> > Bruce.
> > > >
> > > > >
> > > > > In terms of the behavioural points you make above, I wonder if
> > symmetry
> > > > > is
> > > > > actually necessary or desirable in this case. After all, the
> > ideal case
> > > > > is
> > > > > probably to keep the mempool neither full nor empty, so that
> both
> > > > > allocations or frees can be done without having to go to the
> > underlying
> > > > > shared data structure. To accomodate this, the mempool will
> only
> > flush
> > > > > when
> > > > > the number of elements is greater than size * 1.5, and then it
> > only
> > > > > flushes
> > > > > elements down to size, ensuring that allocations can still take
> > place.
> > > > > On allocation, new buffers are taken when we don't have enough
> in
> > the
> > > > > cache
> > > > > to fullfil the request, and then the cache is filled up to
> size,
> > not to
> > > > > the
> > > > > flush threshold.
> > > >
> > > > I agree with the ideal case.
> > > >
> > > > However, it looks like the addition of the flush threshold also
> > changed the "size" parameter to effectively become "desired length"
> > instead. This interpretation is also supported by the flush
> algorithm,
> > which doesn't flush below the "size", but to the "size". So based on
> > interpretation, I was wondering why it is not symmetrical around the
> > "desired length", but allowed to go 100 % below and only 50 % above.
> > > >
> > > > >
> > > > > Now, for the scenario you describe - where the mempool cache
> size
> > is
> > > > > set to
> > > > > be the same as the burst size, this scheme probably does break
> > down, in
> > > > > that we don't really have any burst elasticity. However, I
> would
> > query
> > > > > if
> > > > > that is a configuration that is used, since to the user it
> should
> > > > > appear
> > > > > correctly to provide no elasticity. Looking at testpmd, and our
> > example
> > > > > apps, the standard there is a burst size of 32 and mempool
> cache
> > of
> > > > > ~256.
> > > > > In OVS code, netdev-dpdk.c seems to initialize the mempool
> with
> > cache
> > > > > size
> > > > > of RTE_MEMPOOL_CACHE_MAX_SIZE (through define MP_CACHE_SZ). In
> > all
> > > > > these
> > > > > cases, I think the 1.5 threshold should work just fine for us.
> > > >
> > > > My example was for demonstration only, and hopefully not being
> used
> > by any applications.
> > > >
> > > > The simplified example was intended to demonstrate the
> theoretical
> > effect of the unbalance in using the 1.5 threshold. It will be
> similar
> > with a cache size of 256 objects: You will be allowed to go 8 bursts
> > (calculation: 256 objects / 32 objects per burst) below the cache
> > "size" without retrieving objects from the backing pool, but only 4
> > bursts above the cache "size" before flushing objects to the backing
> > pool.
> > > >
> > > > > That said,
> > > > > if you want to bump it up to 2x, I can't say I'd object
> strongly
> > as it
> > > > > should be harmless, I think.
> > > >
> > > > From an academic viewpoint, 2x seems much better to me than 1.5x.
> > And based on your background information from the introduction of the
> > flush threshold, there was no academic reason why 1.5x was chosen -
> > this is certainly valuable feedback.
> > > >
> >
> > Just to say I agree with Morten's analysis. Having a threshold at 2x
> > size would
> > make more sense to me.
> >
> > Without the corner cases (big requests, or common pool empty), I
> would
> > have suggested this, which looks more symetric than what we have
> today:
> >
> > get()
> >   fill request with as many objs as possible from cache
> >   if cache is empty, request "size" more objects from common pool
> >   fill the rest of the request from cache
> 
> I agree, and this is doable without a performance penalty.
> 
> >
> > put()
> >   fill cache with provided objects up to 2x size
> >   if cache exceeds threshold, fill common pool with "size" objects
> >   fill the cache with the remaining objs from request
> 
> I tried implementing this already, but it cannot be done without a
> performance cost, one way or the other:
> 
> 1. If we flush the top of the cache (which is a stack), we are flushing
> the hot objects, leaving cold objects at the new top of the cache.
> 
> 2. If we flush the bottom of the cache, we need to move the remaining
> objects in the cache down. E.g. flushing objects at index [0..256[ in
> the cache means that the objects at index [256..len[ in the cache must
> be moved down to index[0..len-256[.
> 
> So I came to the conclusion that flushing the cache needs to flush it
> completely. I also consider this the best choice for real applications:
> 
> Either, an lcore thread in the application is in a state of balance,
> where it uses the mempool cache within its flush/refill boundaries -
> which makes the flush method less important.
> 
> Or, an lcore thread in the application is out of balance (either
> permanently or temporarily), and mostly gets or puts objects from/to
> the mempool. If it mostly puts objects, not flushing all of the objects
> will cause more frequent flushing. E.g.:
> 
> Cache size=256, flushthresh=512 (2x, not 1.5x), initial len=256;
> application burst len=32.
> 
> If flushing leaves "size" objects in the cache, the cache is flushed
> every 8th burst. And with flushthresh=384 (1.5x), flushing occurs every
> 4th burst, which is what the mempool does today!
> 
> If the cache is flushed completely, the cache is only flushed at every
> 16th burst.
> 
> 
> And when/if the application thread breaks its pattern of continuously
> putting objects, and suddenly starts to get objects instead, it will
> either get objects already in the cache, or the get() function will
> refill the cache.
> 
> The concept of not flushing the cache completely is based on the
> assumption that it is more likely for an application's lcore thread to
> get() after flushing than to put() after flushing. I strongly disagree
> with this assumption! If an application thread is continuously putting
> so much that it overflows the cache, it is much more likely to keep
> putting than it is to start getting. This is CPU branch prediction 101.
> 
> >
> > But this is a quite sensitive area, and modifying these functions
> > should
> > be done with care, as it impacts mbuf allocation.
> 
> Certainly!
> 
> The current mempool caching algorithm is defective, and I'm trying to
> repair it. Honnappa already provided very valuable feedback regarding
> CPUs with small L1 cache, which must be taken carefully into
> consideration.

I'm sorry... The L1 cache feedback was from Jerin, not Honnappa.

> 
> For now, there seems to be consensus regarding changing the flush
> threshold multiplier from 1.5 to 2.
> 
> >
> > > > I will consider providing a patch. Worst case it probably doesn't
> > do any harm. And if anyone is really interested, they can look at the
> > mempool debug stats to see the difference it makes for their specific
> > application, e.g. the put_common_pool_bulk/put_bulk and
> > get_common_pool_bulk/get_success_bulk ratios should decrease.
> > >
> > > Sure, thanks.
> > >
> > > /Bruce


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread

Thread overview: 7+ messages
2021-12-28 14:28 [RFC] mempool: modify flush threshold Morten Brørup
2022-01-07 15:12 ` Bruce Richardson
2022-01-08 11:00   ` Morten Brørup
2022-01-10  9:40     ` Bruce Richardson
2022-01-24 15:56       ` Olivier Matz
2022-01-28  8:40         ` Morten Brørup
2022-01-28  8:52         ` Morten Brørup
