DPDK patches and discussions
* FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
@ 2022-11-08 11:25 Morten Brørup
  2022-11-08 13:32 ` Thomas Monjalon
  0 siblings, 1 reply; 10+ messages in thread
From: Morten Brørup @ 2022-11-08 11:25 UTC (permalink / raw)
  To: thomas, david.marchand; +Cc: dev

From: Morten Brørup 
Sent: Tuesday, 8 November 2022 12.22

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Tuesday, 8 November 2022 10.20
> 
> > When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the
> > performance of mempools with caches is improved as follows.
> >
> > When accessing objects in the mempool, either the put_bulk and
> put_objs or
> > the get_success_bulk and get_success_objs statistics counters are
> likely
> > to be incremented.
> >
> > By adding an alternative set of these counters to the mempool cache
> > structure, accessing the dedicated statistics structure is avoided in
> the
> > likely cases where these counters are incremented.
> >
> > The trick here is that the cache line holding the mempool cache
> structure
> > is accessed anyway, in order to access the 'len' or 'flushthresh'
> fields.
> > Updating some statistics counters in the same cache line has lower
> > performance cost than accessing the statistics counters in the
> dedicated
> > statistics structure, which resides in another cache line.
> >
> > mempool_perf_autotest with this patch shows the following
> improvements in
> > rate_persec.
> >
> > The cost of enabling mempool stats (without debug) after this patch:
> > -6.8 % and -6.7 %, respectively without and with cache.
> >
> > v4:
> > * Fix checkpatch warnings:
> >   A couple of typos in the patch description.
> >   The macro to add to a mempool cache stat variable should not use
> >   do {} while (0). Personally, I would tend to disagree with this,
> but
> >   whatever keeps the CI happy.
> > v3:
> > * Don't update the description of the RTE_MEMPOOL_STAT_ADD macro.
> >   This change belongs in the first patch of the series.
> > v2:
> > * Move the statistics counters into a stats structure.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---

[...]

> > +/**
> > + * @internal When stats is enabled, store some statistics.
> > + *
> > + * @param cache
> > + *   Pointer to the memory pool cache.
> > + * @param name
> > + *   Name of the statistics field to increment in the memory pool
> cache.
> > + * @param n
> > + *   Number to add to the statistics.
> > + */
> > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
> 
> As Andrew already pointed, it needs to be: ((cache)->stats.name += (n))
> Apart from that, LGTM.
> Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

@Thomas, this series should be ready to apply... it now has been:
Reviewed-by: (mempool maintainer) Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

Please fix the RTE_MEMPOOL_CACHE_STAT_ADD macro while merging, to satisfy checkpatch. ;-)

It should be:

+#ifdef RTE_LIBRTE_MEMPOOL_STATS
+#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) ((cache)->stats.name += (n))
+#else
+#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
+#endif
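
To illustrate why the extra parentheses matter, here is a purely hypothetical expansion example (not part of the patch); without the outer parentheses, the macro can bind unexpectedly when used as a sub-expression:

/* Hypothetical illustration only - 'cache' and 'total' are made up. */
#define STAT_ADD_BAD(c, name, n)  (c)->stats.name += n
#define STAT_ADD_OK(c, name, n)   ((c)->stats.name += (n))

total = STAT_ADD_BAD(cache, put_objs, 4) + 1;
/* expands to: total = (cache)->stats.put_objs += 4 + 1;  -> counter grows by 5 */

total = STAT_ADD_OK(cache, put_objs, 4) + 1;
/* expands to: total = ((cache)->stats.put_objs += (4)) + 1;  -> counter grows by 4 */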

@Thomas/@David: I changed the state of this patch series to Awaiting Upstream in patchwork. Is that helpful, or should I change them to some other state?



* Re: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-08 11:25 FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats Morten Brørup
@ 2022-11-08 13:32 ` Thomas Monjalon
  2022-11-08 14:30   ` Morten Brørup
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Monjalon @ 2022-11-08 13:32 UTC (permalink / raw)
  To: Morten Brørup; +Cc: david.marchand, dev, andrew.rybchenko, olivier.matz

08/11/2022 12:25, Morten Brørup:
> From: Morten Brørup 
> Sent: Tuesday, 8 November 2022 12.22
> 
> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Tuesday, 8 November 2022 10.20
> > 
> > > When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the
> > > performance of mempools with caches is improved as follows.
> > >
> > > When accessing objects in the mempool, either the put_bulk and
> > put_objs or
> > > the get_success_bulk and get_success_objs statistics counters are
> > likely
> > > to be incremented.
> > >
> > > By adding an alternative set of these counters to the mempool cache
> > > structure, accessing the dedicated statistics structure is avoided in
> > the
> > > likely cases where these counters are incremented.
> > >
> > > The trick here is that the cache line holding the mempool cache
> > structure
> > > is accessed anyway, in order to access the 'len' or 'flushthresh'
> > fields.
> > > Updating some statistics counters in the same cache line has lower
> > > performance cost than accessing the statistics counters in the
> > dedicated
> > > statistics structure, which resides in another cache line.
> > >
> > > mempool_perf_autotest with this patch shows the following
> > improvements in
> > > rate_persec.
> > >
> > > The cost of enabling mempool stats (without debug) after this patch:
> > > -6.8 % and -6.7 %, respectively without and with cache.
> > >
> > > v4:
> > > * Fix checkpatch warnings:
> > >   A couple of typos in the patch description.
> > >   The macro to add to a mempool cache stat variable should not use
> > >   do {} while (0). Personally, I would tend to disagree with this,
> > but
> > >   whatever keeps the CI happy.
> > > v3:
> > > * Don't update the description of the RTE_MEMPOOL_STAT_ADD macro.
> > >   This change belongs in the first patch of the series.
> > > v2:
> > > * Move the statistics counters into a stats structure.
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > ---
> 
> [...]
> 
> > > +/**
> > > + * @internal When stats is enabled, store some statistics.
> > > + *
> > > + * @param cache
> > > + *   Pointer to the memory pool cache.
> > > + * @param name
> > > + *   Name of the statistics field to increment in the memory pool
> > cache.
> > > + * @param n
> > > + *   Number to add to the statistics.
> > > + */
> > > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
> > 
> > As Andrew already pointed, it needs to be: ((cache)->stats.name += (n))
> > Apart from that, LGTM.
> > Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> 
> @Thomas, this series should be ready to apply... it now has been:
> Reviewed-by: (mempool maintainer) Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

Being acked does not mean it is good to apply in -rc3.
Please tell what is the benefit for 22.11 (before/after and condition).
Note there is a real risk doing such change that late.

> Please fix the RTE_MEMPOOL_CACHE_STAT_ADD macro while merging, to satisfy checkpatch. ;-)
> 
> It should be:
> 
> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) ((cache)->stats.name += (n))
> +#else
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
> +#endif

Would be easier if you fix it.

> @Thomas/@David: I changed the state of this patch series to Awaiting Upstream in patchwork. Is that helpful, or should I change them to some other state?

You should keep it as "New".




* RE: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-08 13:32 ` Thomas Monjalon
@ 2022-11-08 14:30   ` Morten Brørup
  2022-11-08 15:51     ` Thomas Monjalon
  0 siblings, 1 reply; 10+ messages in thread
From: Morten Brørup @ 2022-11-08 14:30 UTC (permalink / raw)
  To: Thomas Monjalon, andrew.rybchenko, olivier.matz
  Cc: david.marchand, dev, hofors, dev, Konstantin Ananyev,
	mattias.ronnblom, stephen, jerinj, bruce.richardson

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Tuesday, 8 November 2022 14.32
> 
> 08/11/2022 12:25, Morten Brørup:
> > From: Morten Brørup
> > Sent: Tuesday, 8 November 2022 12.22
> >
> > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > > Sent: Tuesday, 8 November 2022 10.20
> > >
> > > > When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined),
> the
> > > > performance of mempools with caches is improved as follows.
> > > >
> > > > When accessing objects in the mempool, either the put_bulk and
> > > put_objs or
> > > > the get_success_bulk and get_success_objs statistics counters are
> > > likely
> > > > to be incremented.
> > > >
> > > > By adding an alternative set of these counters to the mempool
> cache
> > > > structure, accessing the dedicated statistics structure is
> avoided in
> > > the
> > > > likely cases where these counters are incremented.
> > > >
> > > > The trick here is that the cache line holding the mempool cache
> > > structure
> > > > is accessed anyway, in order to access the 'len' or 'flushthresh'
> > > fields.
> > > > Updating some statistics counters in the same cache line has
> lower
> > > > performance cost than accessing the statistics counters in the
> > > dedicated
> > > > statistics structure, which resides in another cache line.
> > > >
> > > > mempool_perf_autotest with this patch shows the following
> > > improvements in
> > > > rate_persec.
> > > >
> > > > The cost of enabling mempool stats (without debug) after this
> patch:
> > > > -6.8 % and -6.7 %, respectively without and with cache.
> > > >
> > > > v4:
> > > > * Fix checkpatch warnings:
> > > >   A couple of typos in the patch description.
> > > >   The macro to add to a mempool cache stat variable should not
> use
> > > >   do {} while (0). Personally, I would tend to disagree with
> this,
> > > but
> > > >   whatever keeps the CI happy.
> > > > v3:
> > > > * Don't update the description of the RTE_MEMPOOL_STAT_ADD macro.
> > > >   This change belongs in the first patch of the series.
> > > > v2:
> > > > * Move the statistics counters into a stats structure.
> > > >
> > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > ---
> >
> > [...]
> >
> > > > +/**
> > > > + * @internal When stats is enabled, store some statistics.
> > > > + *
> > > > + * @param cache
> > > > + *   Pointer to the memory pool cache.
> > > > + * @param name
> > > > + *   Name of the statistics field to increment in the memory
> pool
> > > cache.
> > > > + * @param n
> > > > + *   Number to add to the statistics.
> > > > + */
> > > > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > > > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)-
> >stats.name += n
> > >
> > > As Andrew already pointed, it needs to be: ((cache)->stats.name +=
> (n))
> > > Apart from that, LGTM.
> > > Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> >
> > @Thomas, this series should be ready to apply... it now has been:
> > Reviewed-by: (mempool maintainer) Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> > Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> 
> Being acked does not mean it is good to apply in -rc3.

I understand that the RFC/v1 of this series was formally too late to make it in 22.11, so I will not complain loudly if you choose to omit it for 22.11.

With two independent reviews, including from a mempool maintainer, I still have some hope. Also considering the risk assessment below. ;-)

> Please tell what is the benefit for 22.11 (before/after and condition).

Short version: With this series, mempool statistics can be used in production. Without it, the performance cost (mempool_perf_autotest: -74 %) is prohibitive!

Long version:

The patch series provides significantly higher performance for mempool statistics, which are readable through rte_mempool_dump(FILE *f, struct rte_mempool *mp).

Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG at build time to get mempool statistics. RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before and after each mempool object, which are all verified on get/put from the mempool. According to mempool_perf_autotest, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for mempools with cache (i.e. mbuf pools). Prohibitive for use in production!

With this series, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_STATS) in mempool_perf_autotest is only 6.7 %, so mempool statistics can be used in production.
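
As a minimal usage sketch (only the rte_mempool_dump() call is real API; the helper around it is illustrative), reading the counters then boils down to:

#include <stdio.h>
#include <rte_mempool.h>

/* Illustrative only: with RTE_LIBRTE_MEMPOOL_STATS set at build time, the
 * regular dump below also prints the statistics counters. */
static void
dump_pool_stats(struct rte_mempool *mp)
{
	rte_mempool_dump(stdout, mp);
}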

> Note there is a real risk doing such change that late.

Risk assessment:

The patch series has zero effect unless either RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS are set when building. They are not set in the default build.

> 
> > Please fix the RTE_MEMPOOL_CACHE_STAT_ADD macro while merging, to
> satisfy checkpatch. ;-)
> >
> > It should be:
> >
> > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) ((cache)-
> >stats.name += (n))
> > +#else
> > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
> > +#endif
> 
> Would be easier if you fix it.

I will send a v5 of the series.

> 
> > @Thomas/@David: I changed the state of this patch series to Awaiting
> Upstream in patchwork. Is that helpful, or should I change them to some
> other state?
> 
> You should keep it as "New".

OK. Thank you.



* Re: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-08 14:30   ` Morten Brørup
@ 2022-11-08 15:51     ` Thomas Monjalon
  2022-11-08 15:59       ` Bruce Richardson
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Monjalon @ 2022-11-08 15:51 UTC (permalink / raw)
  To: Morten Brørup
  Cc: andrew.rybchenko, olivier.matz, david.marchand, dev, hofors, dev,
	Konstantin Ananyev, mattias.ronnblom, stephen, jerinj,
	bruce.richardson

08/11/2022 15:30, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 08/11/2022 12:25, Morten Brørup:
> > > From: Morten Brørup
> > > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > > > Sent: Tuesday, 8 November 2022 10.20
> > > > > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > > > > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)-
> > >stats.name += n
> > > >
> > > > As Andrew already pointed, it needs to be: ((cache)->stats.name +=
> > (n))
> > > > Apart from that, LGTM.
> > > > Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > >
> > > @Thomas, this series should be ready to apply... it now has been:
> > > Reviewed-by: (mempool maintainer) Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>
> > > Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > > Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > 
> > Being acked does not mean it is good to apply in -rc3.
> 
> I understand that the RFC/v1 of this series was formally too late to make it in 22.11, so I will not complain loudly if you choose to omit it for 22.11.
> 
> With two independent reviews, including from a mempool maintainer, I still have some hope. Also considering the risk assessment below. ;-)
> 
> > Please tell what is the benefit for 22.11 (before/after and condition).
> 
> Short version: With this series, mempool statistics can be used in production. Without it, the performance cost (mempool_perf_autotest: -74 %) is prohibitive!
> 
> Long version:
> 
> The patch series provides significantly higher performance for mempool statistics, which are readable through rte_mempool_dump(FILE *f, struct rte_mempool *mp).
> 
> Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG at build time to get mempool statistics. RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before and after each mempool object, which are all verified on get/put from the mempool. According to mempool_perf_autotest, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for mempools with cache (i.e. mbuf pools). Prohibitive for use in production!
> 
> With this series, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_STATS) in mempool_perf_autotest is only 6.7 %, so mempool statistics can be used in production.
> 
> > Note there is a real risk doing such change that late.
> 
> Risk assessment:
> 
> The patch series has zero effect unless either RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS are set when building. They are not set in the default build.

If these build flags are not set, there is no risk and no benefit.
But if they are set, there is a risk of regression,
for the benefit of an increased performance of a debug feature.
I would say it is better to avoid any functional regression in a debug feature
at this stage.
Any other opinion?





* Re: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-08 15:51     ` Thomas Monjalon
@ 2022-11-08 15:59       ` Bruce Richardson
  2022-11-08 17:38         ` Konstantin Ananyev
  0 siblings, 1 reply; 10+ messages in thread
From: Bruce Richardson @ 2022-11-08 15:59 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Morten Brørup, andrew.rybchenko, olivier.matz,
	david.marchand, dev, hofors, Konstantin Ananyev,
	mattias.ronnblom, stephen, jerinj

On Tue, Nov 08, 2022 at 04:51:11PM +0100, Thomas Monjalon wrote:
> 08/11/2022 15:30, Morten Brørup:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > 08/11/2022 12:25, Morten Brørup:
> > > > From: Morten Brørup
> > > > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > > > > Sent: Tuesday, 8 November 2022 10.20
> > > > > > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > > > > > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)-
> > > >stats.name += n
> > > > >
> > > > > As Andrew already pointed, it needs to be: ((cache)->stats.name +=
> > > (n))
> > > > > Apart from that, LGTM.
> > > > > Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > > >
> > > > @Thomas, this series should be ready to apply... it now has been:
> > > > Reviewed-by: (mempool maintainer) Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>
> > > > Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > > 
> > > Being acked does not mean it is good to apply in -rc3.
> > 
> > I understand that the RFC/v1 of this series was formally too late to make it in 22.11, so I will not complain loudly if you choose to omit it for 22.11.
> > 
> > With two independent reviews, including from a mempool maintainer, I still have some hope. Also considering the risk assessment below. ;-)
> > 
> > > Please tell what is the benefit for 22.11 (before/after and condition).
> > 
> > Short version: With this series, mempool statistics can be used in production. Without it, the performance cost (mempool_perf_autotest: -74 %) is prohibitive!
> > 
> > Long version:
> > 
> > The patch series provides significantly higher performance for mempool statistics, which are readable through rte_mempool_dump(FILE *f, struct rte_mempool *mp).
> > 
> > Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG at build time to get mempool statistics. RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before and after each mempool object, which are all verified on get/put from the mempool. According to mempool_perf_autotest, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for mempools with cache (i.e. mbuf pools). Prohibitive for use in production!
> > 
> > With this series, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_STATS) in mempool_perf_autotest is only 6.7 %, so mempool statistics can be used in production.
> > 
> > > Note there is a real risk doing such change that late.
> > 
> > Risk assessment:
> > 
> > The patch series has zero effect unless either RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS are set when building. They are not set in the default build.
> 
> If theses build flags are not set, there is no risk and no benefit.
> But if they are set, there is a risk of regression,
> for the benefit of an increased performance of a debug feature.
> I would say it is better to avoid any functional regression in a debug feature
> at this stage.
> Any other opinion?
> 
While I agree that we should avoid any functional regression, I wonder how
widely used the debug feature is, and how big the risk of a regression is?
Even if there is one, having a regression in a debug feature is a lot less
serious than having one in something which goes into production.

/Bruce


* RE: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-08 15:59       ` Bruce Richardson
@ 2022-11-08 17:38         ` Konstantin Ananyev
  2022-11-09  5:03           ` Morten Brørup
  0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Ananyev @ 2022-11-08 17:38 UTC (permalink / raw)
  To: Bruce Richardson, Thomas Monjalon
  Cc: Morten Brørup, andrew.rybchenko, olivier.matz,
	david.marchand, dev, hofors, mattias.ronnblom, stephen, jerinj


> 
> On Tue, Nov 08, 2022 at 04:51:11PM +0100, Thomas Monjalon wrote:
> > 08/11/2022 15:30, Morten Brørup:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > 08/11/2022 12:25, Morten Brørup:
> > > > > From: Morten Brørup
> > > > > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > > > > > Sent: Tuesday, 8 November 2022 10.20
> > > > > > > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > > > > > > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)-
> > > > >stats.name += n
> > > > > >
> > > > > > As Andrew already pointed, it needs to be: ((cache)->stats.name +=
> > > > (n))
> > > > > > Apart from that, LGTM.
> > > > > > Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > > > >
> > > > > @Thomas, this series should be ready to apply... it now has been:
> > > > > Reviewed-by: (mempool maintainer) Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>
> > > > > Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > > >
> > > > Being acked does not mean it is good to apply in -rc3.
> > >
> > > I understand that the RFC/v1 of this series was formally too late to make it in 22.11, so I will not complain loudly if you choose to
> omit it for 22.11.
> > >
> > > With two independent reviews, including from a mempool maintainer, I still have some hope. Also considering the risk assessment
> below. ;-)
> > >
> > > > Please tell what is the benefit for 22.11 (before/after and condition).
> > >
> > > Short version: With this series, mempool statistics can be used in production. Without it, the performance cost
> (mempool_perf_autotest: -74 %) is prohibitive!
> > >
> > > Long version:
> > >
> > > The patch series provides significantly higher performance for mempool statistics, which are readable through
> rte_mempool_dump(FILE *f, struct rte_mempool *mp).
> > >
> > > Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG at build time to get mempool statistics.
> RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before and after each mempool object, which are all verified on
> get/put from the mempool. According to mempool_perf_autotest, the performance cost of mempool statistics (by setting
> RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for mempools with cache (i.e. mbuf pools). Prohibitive for use in
> production!
> > >
> > > With this series, the performance cost of mempool statistics (by setting RTE_LIBRTE_MEMPOOL_STATS) in
> mempool_perf_autotest is only 6.7 %, so mempool statistics can be used in production.
> > >
> > > > Note there is a real risk doing such change that late.
> > >
> > > Risk assessment:
> > >
> > > The patch series has zero effect unless either RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS are set when
> building. They are not set in the default build.
> >
> > If theses build flags are not set, there is no risk and no benefit.
> > But if they are set, there is a risk of regression,
> > for the benefit of an increased performance of a debug feature.
> > I would say it is better to avoid any functional regression in a debug feature
> > at this stage.
> > Any other opinion?
> >
> While I agree that we should avoid any functional regression, I wonder how
> widely used the debug feature is, and how big the risk of a regression is?
> Even if there is one, having a regression in a debug feature is a lot less
> serious than having one in something which goes into production.
> 

Unless it introduces an ABI breakage (as I understand it doesn't), I'll wait till 23.03.
Just in case.
BTW, as a side thought - if the impact is really that small now, would it make sense to make
it a run-time option, instead of a compile-time one?
Konstantin


* RE: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-08 17:38         ` Konstantin Ananyev
@ 2022-11-09  5:03           ` Morten Brørup
  2022-11-09  8:21             ` Mattias Rönnblom
  0 siblings, 1 reply; 10+ messages in thread
From: Morten Brørup @ 2022-11-09  5:03 UTC (permalink / raw)
  To: Konstantin Ananyev, Bruce Richardson, Thomas Monjalon
  Cc: andrew.rybchenko, olivier.matz, david.marchand, dev, hofors,
	mattias.ronnblom, stephen, jerinj

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Tuesday, 8 November 2022 18.38
> >
> > On Tue, Nov 08, 2022 at 04:51:11PM +0100, Thomas Monjalon wrote:
> > > 08/11/2022 15:30, Morten Brørup:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > 08/11/2022 12:25, Morten Brørup:
> > > > > > From: Morten Brørup
> > > > > > > From: Konstantin Ananyev
> [mailto:konstantin.ananyev@huawei.com]
> > > > > > > Sent: Tuesday, 8 November 2022 10.20
> > > > > > > > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > > > > > > > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n)
> (cache)-
> > > > > >stats.name += n
> > > > > > >
> > > > > > > As Andrew already pointed, it needs to be: ((cache)-
> >stats.name +=
> > > > > (n))
> > > > > > > Apart from that, LGTM.
> > > > > > > Series-Acked-by: Konstantin Ananyev
> <konstantin.ananyev@huawei.com>
> > > > > >
> > > > > > @Thomas, this series should be ready to apply... it now has
> been:
> > > > > > Reviewed-by: (mempool maintainer) Andrew Rybchenko
> > > > > <andrew.rybchenko@oktetlabs.ru>
> > > > > > Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > > > >
> > > > > Being acked does not mean it is good to apply in -rc3.
> > > >
> > > > I understand that the RFC/v1 of this series was formally too late
> to make it in 22.11, so I will not complain loudly if you choose to
> > omit it for 22.11.
> > > >
> > > > With two independent reviews, including from a mempool
> maintainer, I still have some hope. Also considering the risk
> assessment
> > below. ;-)
> > > >
> > > > > Please tell what is the benefit for 22.11 (before/after and
> condition).
> > > >
> > > > Short version: With this series, mempool statistics can be used
> in production. Without it, the performance cost
> > (mempool_perf_autotest: -74 %) is prohibitive!
> > > >
> > > > Long version:
> > > >
> > > > The patch series provides significantly higher performance for
> mempool statistics, which are readable through
> > rte_mempool_dump(FILE *f, struct rte_mempool *mp).
> > > >
> > > > Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG at
> build time to get mempool statistics.
> > RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before and
> after each mempool object, which are all verified on
> > get/put from the mempool. According to mempool_perf_autotest, the
> performance cost of mempool statistics (by setting
> > RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for
> mempools with cache (i.e. mbuf pools). Prohibitive for use in
> > production!
> > > >
> > > > With this series, the performance cost of mempool statistics (by
> setting RTE_LIBRTE_MEMPOOL_STATS) in
> > mempool_perf_autotest is only 6.7 %, so mempool statistics can be
> used in production.
> > > >
> > > > > Note there is a real risk doing such change that late.
> > > >
> > > > Risk assessment:
> > > >
> > > > The patch series has zero effect unless either
> RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS are set when
> > building. They are not set in the default build.
> > >
> > > If theses build flags are not set, there is no risk and no benefit.
> > > But if they are set, there is a risk of regression,
> > > for the benefit of an increased performance of a debug feature.
> > > I would say it is better to avoid any functional regression in a
> debug feature
> > > at this stage.
> > > Any other opinion?
> > >
> > While I agree that we should avoid any functional regression, I
> wonder how
> > widely used the debug feature is, and how big the risk of a
> regression is?
> > Even if there is one, having a regression in a debug feature is a lot
> less
> > serious than having one in something which goes into production.
> >
> 
> Unless it introduces an ABI breakage (as I understand it doesn't), I'll
> wait till 23.03.
> Just in case.

If built (both before and after this series) without RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS, which is introduced by the series), there is no ABI breakage.

If built (both before and after this series) with RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS), the ABI differs between before and after this series: The stats array disappears from struct rte_mempool, and the output from rte_mempool_dump() does not include the statistics.

If built (both before and after this series) with RTE_LIBRTE_MEMPOOL_DEBUG (and with RTE_LIBRTE_MEMPOOL_STATS), the ABI also differs between before and after this series: The size of the stats array in struct rte_mempool grows by one element.

> BTW, as a side thought - if the impact is really that small now, would
> it make sense to make
> it run-time option, instead of compile-time one?

The mempool get/put functions are very lean when built without STATS or DEBUG. With a runtime option, the resulting code would be slightly longer, and only one additional conditional would be hit in the common case (i.e. when the objects don't miss the mempool cache). So with stats disabled (at runtime), it would only add a very small performance cost. However, checking the value of the enabled/disabled variable can cause a CPU cache miss, which has a performance cost. And the enabled/disabled variable should definitely be global - if it is per mempool, it will cause many CPU cache misses (because the common case doesn't touch the mempool structure, only the mempool cache structure).

Also, checking the runtime option should have unlikely(), so the performance cost of the stats (when enabled at runtime) is also higher than with a build time option. (Yes, dynamic branch prediction will alleviate most of this, but it will consume entries in the branch predictor table - these are inlined functions. Just like we always try to avoid cache misses in DPDK, we should also try to conserve branch predictor table entries. I hate the argument that branch prediction fixes conditionals, especially if they are weird or could have been avoided.)
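
For reference, a rough sketch of what such a run-time switch might look like - the flag name is an assumption, not an existing symbol, and the 'stats' member is the one added by this series:

#include <stdbool.h>
#include <rte_branch_prediction.h>

/* Hypothetical only - not part of this series. One global flag, checked
 * with unlikely() as described above. */
extern bool rte_mempool_stats_enabled;

#define RTE_MEMPOOL_CACHE_STAT_ADD_RT(cache, name, n) do {		\
	if (unlikely(rte_mempool_stats_enabled))			\
		(cache)->stats.name += (n);				\
} while (0)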

In the cost/benefit analysis, we need to consider that these statistics are not fill/emptiness level status or similar, but only debug counters (number of get/put transactions and objects), so we need to ask ourselves this question: How many users are interested in these statistics for production and are unable to build their application with RTE_LIBRTE_MEMPOOL_STATS?

For example, we (SmartShare Systems) are only interested in them for application profiling purposes... trying to improve the performance by striving for a higher number of objects per burst in every pipeline stage.

> Konstantin



* Re: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-09  5:03           ` Morten Brørup
@ 2022-11-09  8:21             ` Mattias Rönnblom
  2022-11-09 10:19               ` Konstantin Ananyev
  0 siblings, 1 reply; 10+ messages in thread
From: Mattias Rönnblom @ 2022-11-09  8:21 UTC (permalink / raw)
  To: Morten Brørup, Konstantin Ananyev, Bruce Richardson,
	Thomas Monjalon
  Cc: andrew.rybchenko, olivier.matz, david.marchand, dev, hofors,
	stephen, jerinj

On 2022-11-09 06:03, Morten Brørup wrote:
>> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
>> Sent: Tuesday, 8 November 2022 18.38
>>>
>>> On Tue, Nov 08, 2022 at 04:51:11PM +0100, Thomas Monjalon wrote:
>>>> 08/11/2022 15:30, Morten Brørup:
>>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
>>>>>> 08/11/2022 12:25, Morten Brørup:
>>>>>>> From: Morten Brørup
>>>>>>>> From: Konstantin Ananyev
>> [mailto:konstantin.ananyev@huawei.com]
>>>>>>>> Sent: Tuesday, 8 November 2022 10.20
>>>>>>>>> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
>>>>>>>>> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n)
>> (cache)-
>>>>>>>stats.name += n
>>>>>>>>
>>>>>>>> As Andrew already pointed, it needs to be: ((cache)-
>>>stats.name +=
>>>>>> (n))
>>>>>>>> Apart from that, LGTM.
>>>>>>>> Series-Acked-by: Konstantin Ananyev
>> <konstantin.ananyev@huawei.com>
>>>>>>>
>>>>>>> @Thomas, this series should be ready to apply... it now has
>> been:
>>>>>>> Reviewed-by: (mempool maintainer) Andrew Rybchenko
>>>>>> <andrew.rybchenko@oktetlabs.ru>
>>>>>>> Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>>>> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>>>>>>
>>>>>> Being acked does not mean it is good to apply in -rc3.
>>>>>
>>>>> I understand that the RFC/v1 of this series was formally too late
>> to make it in 22.11, so I will not complain loudly if you choose to
>>> omit it for 22.11.
>>>>>
>>>>> With two independent reviews, including from a mempool
>> maintainer, I still have some hope. Also considering the risk
>> assessment
>>> below. ;-)
>>>>>
>>>>>> Please tell what is the benefit for 22.11 (before/after and
>> condition).
>>>>>
>>>>> Short version: With this series, mempool statistics can be used
>> in production. Without it, the performance cost
>>> (mempool_perf_autotest: -74 %) is prohibitive!
>>>>>
>>>>> Long version:
>>>>>
>>>>> The patch series provides significantly higher performance for
>> mempool statistics, which are readable through
>>> rte_mempool_dump(FILE *f, struct rte_mempool *mp).
>>>>>
>>>>> Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG at
>> build time to get mempool statistics.
>>> RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before and
>> after each mempool object, which are all verified on
>>> get/put from the mempool. According to mempool_perf_autotest, the
>> performance cost of mempool statistics (by setting
>>> RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for
>> mempools with cache (i.e. mbuf pools). Prohibitive for use in
>>> production!
>>>>>
>>>>> With this series, the performance cost of mempool statistics (by
>> setting RTE_LIBRTE_MEMPOOL_STATS) in
>>> mempool_perf_autotest is only 6.7 %, so mempool statistics can be
>> used in production.
>>>>>
>>>>>> Note there is a real risk doing such change that late.
>>>>>
>>>>> Risk assessment:
>>>>>
>>>>> The patch series has zero effect unless either
>> RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS are set when
>>> building. They are not set in the default build.
>>>>
>>>> If theses build flags are not set, there is no risk and no benefit.
>>>> But if they are set, there is a risk of regression,
>>>> for the benefit of an increased performance of a debug feature.
>>>> I would say it is better to avoid any functional regression in a
>> debug feature
>>>> at this stage.
>>>> Any other opinion?
>>>>
>>> While I agree that we should avoid any functional regression, I
>> wonder how
>>> widely used the debug feature is, and how big the risk of a
>> regression is?
>>> Even if there is one, having a regression in a debug feature is a lot
>> less
>>> serious than having one in something which goes into production.
>>>
>>
>> Unless it introduces an ABI breakage (as I understand it doesn't), I'll
>> wait till 23.03.
>> Just in case.
> 
> If built (both before and after this series) without RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS, which is introduced by the series), there is no ABI breakage.
> 
> If built (both before and after this series) with RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS), the ABI differs between before and after this series: The stats array disappears from struct rte_mempool, and the output from rte_mempool_dump() does not include the statistics.
> 
> If built (both before and after this series) with RTE_LIBRTE_MEMPOOL_DEBUG (and with RTE_LIBRTE_MEMPOOL_STATS), the ABI also differs between before and after this series: The size of the stats array in struct rte_mempool grows by one element.
> 
>> BTW, as a side thought - if the impact is really that small now, would
>> it make sense to make
>> it run-time option, instead of compile-time one?
> 
> The mempool get/put functions are very lean when built without STATS or DEBUG. With a runtime option, the resulting code would be slightly longer, and only one additional conditional would be hit in the common case (i.e. when the objects don't miss the mempool cache). So with stats disabled (at runtime), it would only add a very small performance cost. However, checking the value of the enabled/disabled variable can cause a CPU cache miss, which has a performance cost. And the enabled/disabled variable should definitely be global - if it is per mempool, it will cause many CPU cache misses (because the common case doesn't touch the mempool structure, only the mempool cache structure).
> 

It's not totally obvious that a conditional is better than just always
performing these simple arithmetic operations, even if you don't know if
you need the result (i.e., if stats is enabled or not), especially since
they operate on a cache line that is very likely already owned by the
core running the code (since the 'len' field is frequently used).
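
In other words, a hypothetical variant of the macro along those lines (the name is illustrative, not from the series):

/* Hypothetical: no compile-time or run-time guard at all; the addition
 * always happens on the cache-resident counters, which are argued above
 * to sit on an already-owned cache line anyway. */
#define RTE_MEMPOOL_CACHE_STAT_ADD_UNCOND(cache, name, n) \
	((cache)->stats.name += (n))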

> Also, checking the runtime option should have unlikely(), so the performance cost of the stats (when enabled at runtime) is also higher than with a build time option. (Yes, dynamic branch prediction will alleviate most of this, but it will consume entries in the branch predictor table - these are inlined functions. Just like we always try to avoid cache misses in DPDK, we should also try to conserve branch predictor table entries. I hate the argument that branch prediction fixes conditionals, especially if they are weird or could have been avoided.)
> 
> In the cost/benefit analysis, we need to consider that these statistics are not fill/emptiness level status or similar, but only debug counters (number of get/put transactions and objects), so we need to ask ourselves this question: How many users are interested in these statistics for production and are unable to build their application with RTE_LIBRTE_MEMPOOL_STATS?
> 
> For example, we (SmartShare Systems) are only interested in them for application profiling purposes... trying to improve the performance by striving for a higher number of objects per burst in every pipeline stage.
> 
>> Konstantin



* RE: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-09  8:21             ` Mattias Rönnblom
@ 2022-11-09 10:19               ` Konstantin Ananyev
  2022-11-09 11:42                 ` Morten Brørup
  0 siblings, 1 reply; 10+ messages in thread
From: Konstantin Ananyev @ 2022-11-09 10:19 UTC (permalink / raw)
  To: Mattias Rönnblom, Morten Brørup, Bruce Richardson,
	Thomas Monjalon
  Cc: andrew.rybchenko, olivier.matz, david.marchand, dev, hofors,
	stephen, jerinj


> On 2022-11-09 06:03, Morten Brørup wrote:
> >> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> >> Sent: Tuesday, 8 November 2022 18.38
> >>>
> >>> On Tue, Nov 08, 2022 at 04:51:11PM +0100, Thomas Monjalon wrote:
> >>>> 08/11/2022 15:30, Morten Brørup:
> >>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >>>>>> 08/11/2022 12:25, Morten Brørup:
> >>>>>>> From: Morten Brørup
> >>>>>>>> From: Konstantin Ananyev
> >> [mailto:konstantin.ananyev@huawei.com]
> >>>>>>>> Sent: Tuesday, 8 November 2022 10.20
> >>>>>>>>> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> >>>>>>>>> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n)
> >> (cache)-
> >>>>>>>stats.name += n
> >>>>>>>>
> >>>>>>>> As Andrew already pointed, it needs to be: ((cache)-
> >>>stats.name +=
> >>>>>> (n))
> >>>>>>>> Apart from that, LGTM.
> >>>>>>>> Series-Acked-by: Konstantin Ananyev
> >> <konstantin.ananyev@huawei.com>
> >>>>>>>
> >>>>>>> @Thomas, this series should be ready to apply... it now has
> >> been:
> >>>>>>> Reviewed-by: (mempool maintainer) Andrew Rybchenko
> >>>>>> <andrew.rybchenko@oktetlabs.ru>
> >>>>>>> Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>>>> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> >>>>>>
> >>>>>> Being acked does not mean it is good to apply in -rc3.
> >>>>>
> >>>>> I understand that the RFC/v1 of this series was formally too late
> >> to make it in 22.11, so I will not complain loudly if you choose to
> >>> omit it for 22.11.
> >>>>>
> >>>>> With two independent reviews, including from a mempool
> >> maintainer, I still have some hope. Also considering the risk
> >> assessment
> >>> below. ;-)
> >>>>>
> >>>>>> Please tell what is the benefit for 22.11 (before/after and
> >> condition).
> >>>>>
> >>>>> Short version: With this series, mempool statistics can be used
> >> in production. Without it, the performance cost
> >>> (mempool_perf_autotest: -74 %) is prohibitive!
> >>>>>
> >>>>> Long version:
> >>>>>
> >>>>> The patch series provides significantly higher performance for
> >> mempool statistics, which are readable through
> >>> rte_mempool_dump(FILE *f, struct rte_mempool *mp).
> >>>>>
> >>>>> Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG at
> >> build time to get mempool statistics.
> >>> RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before and
> >> after each mempool object, which are all verified on
> >>> get/put from the mempool. According to mempool_perf_autotest, the
> >> performance cost of mempool statistics (by setting
> >>> RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for
> >> mempools with cache (i.e. mbuf pools). Prohibitive for use in
> >>> production!
> >>>>>
> >>>>> With this series, the performance cost of mempool statistics (by
> >> setting RTE_LIBRTE_MEMPOOL_STATS) in
> >>> mempool_perf_autotest is only 6.7 %, so mempool statistics can be
> >> used in production.
> >>>>>
> >>>>>> Note there is a real risk doing such change that late.
> >>>>>
> >>>>> Risk assessment:
> >>>>>
> >>>>> The patch series has zero effect unless either
> >> RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS are set when
> >>> building. They are not set in the default build.
> >>>>
> >>>> If theses build flags are not set, there is no risk and no benefit.
> >>>> But if they are set, there is a risk of regression,
> >>>> for the benefit of an increased performance of a debug feature.
> >>>> I would say it is better to avoid any functional regression in a
> >> debug feature
> >>>> at this stage.
> >>>> Any other opinion?
> >>>>
> >>> While I agree that we should avoid any functional regression, I
> >> wonder how
> >>> widely used the debug feature is, and how big the risk of a
> >> regression is?
> >>> Even if there is one, having a regression in a debug feature is a lot
> >> less
> >>> serious than having one in something which goes into production.
> >>>
> >>
> >> Unless it introduces an ABI breakage (as I understand it doesn't), I'll
> >> wait till 23.03.
> >> Just in case.
> >
> > If built (both before and after this series) without RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS,
> which is introduced by the series), there is no ABI breakage.
> >
> > If built (both before and after this series) with RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS), the
> ABI differs between before and after this series: The stats array disappears from struct rte_mempool, and the output from
> rte_mempool_dump() does not include the statistics.
> >

Could we perhaps always enable RTE_LIBRTE_MEMPOOL_STATS when RTE_LIBRTE_MEMPOOL_DEBUG is on?
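
One hypothetical way to express that coupling, e.g. in the mempool header or build config:

/* Hypothetical sketch: make DEBUG imply STATS, so the dump output does
 * not change for existing RTE_LIBRTE_MEMPOOL_DEBUG users. */
#if defined(RTE_LIBRTE_MEMPOOL_DEBUG) && !defined(RTE_LIBRTE_MEMPOOL_STATS)
#define RTE_LIBRTE_MEMPOOL_STATS 1
#endif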

> > If built (both before and after this series) with RTE_LIBRTE_MEMPOOL_DEBUG (and with RTE_LIBRTE_MEMPOOL_STATS), the ABI
> also differs between before and after this series: The size of the stats array in struct rte_mempool grows by one element.

Ah yes, missed that one.
So the question is then - does it count as a formal ABI breakage or not?
If yes, then it is probably better to go ahead with these changes for 22.11
(it sounds too prohibitive to wait for a year here).
Or at least take in the part that introduces the ABI breakage.
If not, it is probably not a big deal to wait till 23.03.

> >> BTW, as a side thought - if the impact is really that small now, would
> >> it make sense to make
> >> it run-time option, instead of compile-time one?
> >
> > The mempool get/put functions are very lean when built without STATS or DEBUG. With a runtime option, the resulting code would
> be slightly longer, and only one additional conditional would be hit in the common case (i.e. when the objects don't miss the mempool
> cache). So with stats disabled (at runtime), it would only add a very small performance cost. However, checking the value of the
> enabled/disabled variable can cause a CPU cache miss, which has a performance cost. And the enabled/disabled variable should
> definitely be global - if it is per mempool, it will cause many CPU cache misses (because the common case doesn't touch the mempool
> structure, only the mempool cache structure).
> >

Yes, either a global one, or put it into both structs: rte_mempool_cache and rte_mempool.

> It's not totally obvious that a conditional is better than just always
> performing these simple arithmetic operations, even if you don't know if
> you need the result (i.e., if stats is enabled or not), especially since
> they operate on a cache line that is very likely already owned by the
> core running the core (since the 'len' fields is frequently used).

Yep, that's another option - always update the cache part.

> > Also, checking the runtime option should have unlikely(), so the performance cost of the stats (when enabled at runtime) is also
> higher than with a build time option. (Yes, dynamic branch prediction will alleviate most of this, but it will consume entries in the
> branch predictor table - these are inlined functions. Just like we always try to avoid cache misses in DPDK, we should also try to
> conserve branch predictor table entries. I hate the argument that branch prediction fixes conditionals, especially if they are weird or
> could have been avoided.)
> >
> > In the cost/benefit analysis, we need to consider that these statistics are not fill/emptiness level status or similar, but only debug
> counters (number of get/put transactions and objects), so we need to ask ourselves this question: How many users are interested in
> these statistics for production and are unable to build their application with RTE_LIBRTE_MEMPOOL_STATS?

Obviously, I don't have such stats.
From my perspective - I am ok to spend a few extra cycles to avoid building a separate binary.
Again, I guess that with a global switch the impact will be negligible.
But anyway, it will require even more changes and another ABI breakage (as stats should always be included),
so it is definitely out of scope for this release.

> > For example, we (SmartShare Systems) are only interested in them for application profiling purposes... trying to improve the
> performance by striving for a higher number of objects per burst in every pipeline stage.
> >
> >> Konstantin



* RE: FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats
  2022-11-09 10:19               ` Konstantin Ananyev
@ 2022-11-09 11:42                 ` Morten Brørup
  0 siblings, 0 replies; 10+ messages in thread
From: Morten Brørup @ 2022-11-09 11:42 UTC (permalink / raw)
  To: Konstantin Ananyev, Mattias Rönnblom, Bruce Richardson,
	Thomas Monjalon
  Cc: andrew.rybchenko, olivier.matz, david.marchand, dev, hofors,
	stephen, jerinj

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Wednesday, 9 November 2022 11.20
> 
> > On 2022-11-09 06:03, Morten Brørup wrote:
> > >> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > >> Sent: Tuesday, 8 November 2022 18.38
> > >>>
> > >>> On Tue, Nov 08, 2022 at 04:51:11PM +0100, Thomas Monjalon wrote:
> > >>>> 08/11/2022 15:30, Morten Brørup:
> > >>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > >>>>>> 08/11/2022 12:25, Morten Brørup:
> > >>>>>>> From: Morten Brørup
> > >>>>>>>> From: Konstantin Ananyev
> > >> [mailto:konstantin.ananyev@huawei.com]
> > >>>>>>>> Sent: Tuesday, 8 November 2022 10.20
> > >>>>>>>>> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > >>>>>>>>> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n)
> > >> (cache)-
> > >>>>>>>stats.name += n
> > >>>>>>>>
> > >>>>>>>> As Andrew already pointed, it needs to be: ((cache)-
> > >>>stats.name +=
> > >>>>>> (n))
> > >>>>>>>> Apart from that, LGTM.
> > >>>>>>>> Series-Acked-by: Konstantin Ananyev
> > >> <konstantin.ananyev@huawei.com>
> > >>>>>>>
> > >>>>>>> @Thomas, this series should be ready to apply... it now has
> > >> been:
> > >>>>>>> Reviewed-by: (mempool maintainer) Andrew Rybchenko
> > >>>>>> <andrew.rybchenko@oktetlabs.ru>
> > >>>>>>> Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > >>>>>>> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > >>>>>>
> > >>>>>> Being acked does not mean it is good to apply in -rc3.
> > >>>>>
> > >>>>> I understand that the RFC/v1 of this series was formally too
> late
> > >> to make it in 22.11, so I will not complain loudly if you choose
> to
> > >>> omit it for 22.11.
> > >>>>>
> > >>>>> With two independent reviews, including from a mempool
> > >> maintainer, I still have some hope. Also considering the risk
> > >> assessment
> > >>> below. ;-)
> > >>>>>
> > >>>>>> Please tell what is the benefit for 22.11 (before/after and
> > >> condition).
> > >>>>>
> > >>>>> Short version: With this series, mempool statistics can be used
> > >> in production. Without it, the performance cost
> > >>> (mempool_perf_autotest: -74 %) is prohibitive!
> > >>>>>
> > >>>>> Long version:
> > >>>>>
> > >>>>> The patch series provides significantly higher performance for
> > >> mempool statistics, which are readable through
> > >>> rte_mempool_dump(FILE *f, struct rte_mempool *mp).
> > >>>>>
> > >>>>> Without this series, you have to set RTE_LIBRTE_MEMPOOL_DEBUG
> at
> > >> build time to get mempool statistics.
> > >>> RTE_LIBRTE_MEMPOOL_DEBUG also enables protective cookies before
> and
> > >> after each mempool object, which are all verified on
> > >>> get/put from the mempool. According to mempool_perf_autotest, the
> > >> performance cost of mempool statistics (by setting
> > >>> RTE_LIBRTE_MEMPOOL_DEBUG) is a 74 % decrease in rate_persec for
> > >> mempools with cache (i.e. mbuf pools). Prohibitive for use in
> > >>> production!
> > >>>>>
> > >>>>> With this series, the performance cost of mempool statistics
> (by
> > >> setting RTE_LIBRTE_MEMPOOL_STATS) in
> > >>> mempool_perf_autotest is only 6.7 %, so mempool statistics can be
> > >> used in production.
> > >>>>>
> > >>>>>> Note there is a real risk doing such change that late.
> > >>>>>
> > >>>>> Risk assessment:
> > >>>>>
> > >>>>> The patch series has zero effect unless either
> > >> RTE_LIBRTE_MEMPOOL_DEBUG or RTE_LIBRTE_MEMPOOL_STATS are set when
> > >>> building. They are not set in the default build.
> > >>>>
> > >>>> If theses build flags are not set, there is no risk and no
> benefit.
> > >>>> But if they are set, there is a risk of regression,
> > >>>> for the benefit of an increased performance of a debug feature.
> > >>>> I would say it is better to avoid any functional regression in a
> > >> debug feature
> > >>>> at this stage.
> > >>>> Any other opinion?
> > >>>>
> > >>> While I agree that we should avoid any functional regression, I
> > >> wonder how
> > >>> widely used the debug feature is, and how big the risk of a
> > >> regression is?
> > >>> Even if there is one, having a regression in a debug feature is a
> lot
> > >> less
> > >>> serious than having one in something which goes into production.
> > >>>
> > >>
> > >> Unless it introduces an ABI breakage (as I understand it doesn't),
> I'll
> > >> wait till 23.03.
> > >> Just in case.
> > >
> > > If built (both before and after this series) without
> RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS,
> > which is introduced by the series), there is no ABI breakage.
> > >
> > > If built (both before and after this series) with
> RTE_LIBRTE_MEMPOOL_DEBUG (and without RTE_LIBRTE_MEMPOOL_STATS), the
> > ABI differs between before and after this series: The stats array
> disappears from struct rte_mempool, and the output from
> > rte_mempool_dump() does not include the statistics.
> > >
> 
> Can we probably always enable RTE_LIBRTE_MEMPOOL_STATS when
> RTE_LIBRTE_MEMPOOL_DEBUG is on?

That would fix the rte_mempool_dump() API breakage, yes.

But since it's only a partial fix, I don't think we should do it.

> 
> > > If built (both before and after this series) with
> RTE_LIBRTE_MEMPOOL_DEBUG (and with RTE_LIBRTE_MEMPOOL_STATS), the ABI
> > also differs between before and after this series: The size of the
> stats array in struct rte_mempool grows by one element.
> 
> Ah yes, missed that one.
> So the question is then - does it count as formal ABI breakage or not?

Yes, this is a key question!

However, performance improvements are not accepted as LTS patches, so not including it in -rc3 will make it useless until 23.11. (At least for users deploying only LTS releases.)

> If yes, then probably better to go ahead with these changes for 22.11
> (it sounds too prohibitive to wait for an year here).
> Or at least take in the part that introduce the ABI breakage.

Without the 3rd patch in the series, the performance is not fully optimized. (Remember: Only relevant when built with RTE_LIBRTE_MEMPOOL_STATS.)

> If not, probably not bit deal to wait till 23.03.
> 
> > >> BTW, as a side thought - if the impact is really that small now,
> would
> > >> it make sense to make
> > >> it run-time option, instead of compile-time one?
> > >
> > > The mempool get/put functions are very lean when built without
> STATS or DEBUG. With a runtime option, the resulting code would
> > be slightly longer, and only one additional conditional would be hit
> in the common case (i.e. when the objects don't miss the mempool
> > cache). So with stats disabled (at runtime), it would only add a very
> small performance cost. However, checking the value of the
> > enabled/disabled variable can cause a CPU cache miss, which has a
> performance cost. And the enabled/disabled variable should
> > definitely be global - if it is per mempool, it will cause many CPU
> cache misses (because the common case doesn't touch the mempool
> > structure, only the mempool cache structure).
> > >
> 
> Yes, either a global one, or put it into both structs:
> rte_mempool_cache and rte_mempool.

That is doable. There is room for it in rte_mempool_cache, and it can be a flag in rte_mempool.

> 
> > It's not totally obvious that a conditional is better than just
> always
> > performing these simple arithmetic operations, even if you don't know
> if
> > you need the result (i.e., if stats is enabled or not), especially
> since
> > they operate on a cache line that is very likely already owned by the
> > core running the core (since the 'len' fields is frequently used).
> 
> Yep, that's another option - always update the cache part.

Yes, always updating the rte_mempool_cache stats to avoid the conditional in the likely code path seems like a viable concept.

(And for now, we can ignore that 64 bit stats are somewhat more costly on 32 bit architectures if tearing must be avoided. I say that we can ignore it for now, because this kind of tearing is ignored for 64 bit stats everywhere in DPDK, so it should be ignorable here too.)

> 
> > > Also, checking the runtime option should have unlikely(), so the
> performance cost of the stats (when enabled at runtime) is also
> > higher than with a build time option. (Yes, dynamic branch prediction
> will alleviate most of this, but it will consume entries in the
> > branch predictor table - these are inlined functions. Just like we
> always try to avoid cache misses in DPDK, we should also try to
> > conserve branch predictor table entries. I hate the argument that
> branch prediction fixes conditionals, especially if they are weird or
> > could have been avoided.)
> > >
> > > In the cost/benefit analysis, we need to consider that these
> statistics are not fill/emptiness level status or similar, but only
> debug
> > counters (number of get/put transactions and objects), so we need to
> ask ourselves this question: How many users are interested in
> > these statistics for production and are unable to build their
> application with RTE_LIBRTE_MEMPOOL_STATS?
> 
> Obviously, I don't have such stats.
> From my perspective - I am ok to spend few extra cycles to avoid
> building separate binary.
> Again, I guess that  with global switch the impact will be negligible.
> But anyway, it will require even more changes and another ABI breakage
> (as stats should always be included),
> so it definitely out of scope for this release.

Agree. This series can be tweaked further, as the discussion clearly shows.

However, time is about to run out. If we want to include it in 22.11, we should take it into -rc3 without any of the discussed modifications (except fixing the macro to satisfy checkpatch).

Furthermore, the discussed modifications - various ways of handling a runtime option - will introduce some performance degradation with the default build configuration. Regardless of how small we think that degradation is, I would hesitate to introduce a performance degradation in -rc3.

The current patch series has zero performance effect with the default build configuration.

> 
> > > For example, we (SmartShare Systems) are only interested in them
> for application profiling purposes... trying to improve the
> > performance by striving for a higher number of objects per burst in
> every pipeline stage.
> > >
> > >> Konstantin



Thread overview: 10+ messages
2022-11-08 11:25 FW: [PATCH v4 3/3] mempool: use cache for frequently updated stats Morten Brørup
2022-11-08 13:32 ` Thomas Monjalon
2022-11-08 14:30   ` Morten Brørup
2022-11-08 15:51     ` Thomas Monjalon
2022-11-08 15:59       ` Bruce Richardson
2022-11-08 17:38         ` Konstantin Ananyev
2022-11-09  5:03           ` Morten Brørup
2022-11-09  8:21             ` Mattias Rönnblom
2022-11-09 10:19               ` Konstantin Ananyev
2022-11-09 11:42                 ` Morten Brørup
