From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
"Konstantin Ananyev" <konstantin.v.ananyev@yandex.ru>,
"olivier.matz@6wind.com" <olivier.matz@6wind.com>,
"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
"honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
"kamalakshitha.aligeri@arm.com" <kamalakshitha.aligeri@arm.com>,
"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "nd@arm.com" <nd@arm.com>
Subject: RE: [PATCH v5] mempool cache: add zero-copy get and put functions
Date: Mon, 23 Jan 2023 11:53:32 +0000 [thread overview]
Message-ID: <81e1ec960f6d4a1c80b236d9314cb549@huawei.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D876A3@smartserver.smartshare.dk>
> > Few nits, see below.
> > Also I still think we do need a test case for _zc_get_ before
> > accepting it in the mainline.
>
> Poking at my bad conscience... :-)
>
> It's on my todo-list. Apparently not high enough. ;-)
>
> > With that in place:
> > Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> >
> > > Zero-copy access to mempool caches is beneficial for PMD performance,
> > > and must be provided by the mempool library to fix [Bug 1052] without
> > > a performance regression.
> > >
> > > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> > >
> > > Bugzilla ID: 1052
> > >
> > > v5:
> > > * Bugfix: Compare zero-copy get request to the cache size instead of
> > >   the flush threshold; otherwise refill could overflow the memory
> > >   allocated for the cache. (Andrew)
> > > * Split the zero-copy put function into an internal function doing
> > >   the work, and a public function with trace.
> > > * Avoid code duplication by rewriting rte_mempool_do_generic_put() to
> > >   use the internal zero-copy put function. (Andrew)
> > > * Corrected the return type of rte_mempool_cache_zc_put_bulk() from
> > >   void * to void **; it returns a pointer to an array of objects.
> > > * Fix coding style: Add missing curly brackets. (Andrew)
> > > v4:
> > > * Fix checkpatch warnings.
> > > v3:
> > > * Bugfix: Respect the cache size; compare to the flush threshold
> > >   instead of RTE_MEMPOOL_CACHE_MAX_SIZE.
> > > * Added 'rewind' function for incomplete 'put' operations. (Konstantin)
> > > * Replace RTE_ASSERTs with runtime checks of the request size.
> > >   Instead of failing, return NULL if the request is too big. (Konstantin)
> > > * Modified comparison to prevent overflow if n is really huge and len
> > >   is non-zero.
> > > * Updated the comments in the code.
> > > v2:
> > > * Fix checkpatch warnings.
> > > * Fix missing registration of trace points.
> > > * The functions are inline, so they don't go into the map file.
> > > v1 changes from the RFC:
> > > * Removed run-time parameter checks. (Honnappa)
> > >   This is a hot fast path function, so correct application behaviour
> > >   is required, i.e. function parameters must be valid.
> > > * Added RTE_ASSERT for parameters instead.
> > >   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > > * Chose the simple get function; i.e. do not move the existing
> > >   objects in the cache to the top of the new stack, just leave them
> > >   at the bottom.
> > > * Renamed the functions. Other suggestions are welcome, of course. ;-)
> > > * Updated the function descriptions.
> > > * Added the functions to trace_fp and version.map.
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > ---
> > >  lib/mempool/mempool_trace_points.c | 9 ++
> > >  lib/mempool/rte_mempool.h | 237 +++++++++++++++++++++++++----
> > >  lib/mempool/rte_mempool_trace_fp.h | 23 +++
> > >  lib/mempool/version.map | 5 +
> > >  4 files changed, 245 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
> > > index 4ad76deb34..83d353a764 100644
> > > --- a/lib/mempool/mempool_trace_points.c
> > > +++ b/lib/mempool/mempool_trace_points.c
> > > @@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> > >
> > >  RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
> > >  	lib.mempool.set.ops.byname)
> > > +
> > > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> > > + lib.mempool.cache.zc.put.bulk)
> > > +
> > > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> > > + lib.mempool.cache.zc.put.rewind)
> > > +
> > > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> > > + lib.mempool.cache.zc.get.bulk)
> > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > > index 9f530db24b..5efd3c2b5b 100644
> > > --- a/lib/mempool/rte_mempool.h
> > > +++ b/lib/mempool/rte_mempool.h
> > > @@ -47,6 +47,7 @@
> > > #include <rte_ring.h>
> > > #include <rte_memcpy.h>
> > > #include <rte_common.h>
> > > +#include <rte_errno.h>
> > >
> > > #include "rte_mempool_trace_fp.h"
> > >
> > > @@ -1346,6 +1347,197 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
> > > cache->len = 0;
> > > }
> > >
> > > +/**
> > > + * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
> > > + *
> > > + * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
> > > + *
> > > + * @param cache
> > > + * A pointer to the mempool cache.
> > > + * @param mp
> > > + * A pointer to the mempool.
> > > + * @param n
> > > + * The number of objects to be put in the mempool cache.
> > > + * @return
> > > + * The pointer to where to put the objects in the mempool cache.
> > > + * NULL if the request itself is too big for the cache, i.e.
> > > + * exceeds the cache flush threshold.
> > > + */
> > > +static __rte_always_inline void **
> > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > + struct rte_mempool *mp,
> > > + unsigned int n)
> > > +{
> > > + void **cache_objs;
> > > +
> > > + RTE_ASSERT(cache != NULL);
> > > + RTE_ASSERT(mp != NULL);
> > > +
> > > + if (n <= cache->flushthresh - cache->len) {
> > > + /*
> > > + * The objects can be added to the cache without crossing the
> > > + * flush threshold.
> > > + */
> > > + cache_objs = &cache->objs[cache->len];
> > > + cache->len += n;
> > > + } else if (likely(n <= cache->flushthresh)) {
> > > + /*
> > > + * The request itself fits into the cache.
> > > + * But first, the cache must be flushed to the backend, so
> > > + * adding the objects does not cross the flush threshold.
> > > + */
> > > + cache_objs = &cache->objs[0];
> > > + rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > + cache->len = n;
> > > + } else {
> > > + /* The request itself is too big for the cache. */
> > > + return NULL;
> > > + }
> > > +
> > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > +
> > > + return cache_objs;
> > > +}
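[Editor's note: the three-way branch above can be modeled standalone, outside DPDK, to see how the flush threshold governs where the returned slot points. The following is an illustrative sketch only; the model_* names are hypothetical, not DPDK API, and the flushed counter merely stands in for rte_mempool_ops_enqueue_bulk().]

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX 64 /* large enough for the flush threshold used below */

/* Hypothetical stand-in for struct rte_mempool_cache. */
struct model_cache {
	unsigned int len;         /* objects currently in the cache */
	unsigned int flushthresh; /* flush threshold */
	unsigned int flushed;     /* objects pushed to the backend (test bookkeeping) */
	void *objs[CACHE_MAX];
};

/*
 * Mirrors the three-way branch of __rte_mempool_cache_zc_put_bulk():
 * return a slot inside the cache where the caller may write n objects,
 * flushing first if needed, or NULL if n exceeds the flush threshold.
 */
static void **
model_zc_put_bulk(struct model_cache *cache, unsigned int n)
{
	void **cache_objs;

	if (n <= cache->flushthresh - cache->len) {
		/* Fits without crossing the flush threshold. */
		cache_objs = &cache->objs[cache->len];
		cache->len += n;
	} else if (n <= cache->flushthresh) {
		/* Fits, but only after flushing the cache to the backend. */
		cache_objs = &cache->objs[0];
		cache->flushed += cache->len; /* enqueue_bulk() stand-in */
		cache->len = n;
	} else {
		/* The request itself is too big for the cache. */
		return NULL;
	}
	return cache_objs;
}
```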
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> > > + *
> > > + * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
> > > + *
> > > + * @param cache
> > > + * A pointer to the mempool cache.
> > > + * @param mp
> > > + * A pointer to the mempool.
> > > + * @param n
> > > + * The number of objects to be put in the mempool cache.
> > > + * @return
> > > + * The pointer to where to put the objects in the mempool cache.
> > > + * NULL if the request itself is too big for the cache, i.e.
> > > + * exceeds the cache flush threshold.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void **
> > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > + struct rte_mempool *mp,
> > > + unsigned int n)
> > > +{
> > > + RTE_ASSERT(cache != NULL);
> > > + RTE_ASSERT(mp != NULL);
> > > +
> > > + rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > + return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > +}
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> > > + *
> > > + * Zero-copy un-put objects in a user-owned mempool cache.
> > > + *
> > > + * @param cache
> > > + * A pointer to the mempool cache.
> > > + * @param n
> > > + * The number of objects not put in the mempool cache after calling
> > > + * rte_mempool_cache_zc_put_bulk().
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void
> > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > + unsigned int n)
> > > +{
> > > + RTE_ASSERT(cache != NULL);
> > > + RTE_ASSERT(n <= cache->len);
> > > +
> > > + rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > +
> > > + cache->len -= n;
> > > +
> > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
> > > +}
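[Editor's note: the intended calling pattern — reserve space with a zero-copy put, write fewer objects than reserved, then hand the remainder back — can be sketched the same way. Illustrative model only; the model_* names are hypothetical, not DPDK API.]

```c
#include <assert.h>

#define CACHE_MAX 64

/* Hypothetical stand-in for struct rte_mempool_cache. */
struct model_cache {
	unsigned int len; /* objects currently reserved/held in the cache */
	void *objs[CACHE_MAX];
};

/* Mirrors rte_mempool_cache_zc_put_rewind(): give back the last n
 * objects of a preceding zero-copy put reservation. */
static void
model_zc_put_rewind(struct model_cache *cache, unsigned int n)
{
	assert(n <= cache->len); /* RTE_ASSERT in the real function */
	cache->len -= n;
}
```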
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> > > +
> > > + * Zero-copy get objects from a user-owned mempool cache backed by the specified mempool.
> > > + *
> > > + * @param cache
> > > + * A pointer to the mempool cache.
> > > + * @param mp
> > > + * A pointer to the mempool.
> > > + * @param n
> > > + * The number of objects to prefetch into the mempool cache.
> >
> > Why not 'get' instead of 'prefetch'?
>
> This was my thinking:
>
> The function "prefetches" the objects into the cache. It is the application itself that "gets" the objects from the cache after having
> called the function.
> You might also notice that the n parameter for the zc_put() function is described as "to be put" (future), not "to put" (now) in the
> cache.
>
> On the other hand, I chose "Zero-copy get" for the function headline to keep it simple.
>
> If you think "get" is a more correct description of the n parameter, I can change it.
>
> Alternatively, I can use the same style as zc_put(), i.e. "to be gotten from the mempool cache" - but that would require input from a
> native English speaker, because Danish and English grammar are very different, and I am highly uncertain about my English grammar
> here! I originally considered this phrase, but concluded that the "prefetch" description was easier to understand - especially for
> non-native English readers.
For me 'prefetch' seems a bit unclear in that situation...
Probably: "the number of objects that the user plans to extract from the cache"?
But again, I am not a native English speaker either, so maybe someone can suggest a better option.
>
> >
> >
> > > + * @return
> > > + * The pointer to the objects in the mempool cache.
> > > + * NULL on error; i.e. the cache + the pool does not contain 'n' objects.
> > > + * With rte_errno set to the error code of the mempool dequeue function,
> > > + * or EINVAL if the request itself is too big for the cache, i.e.
> > > + * exceeds the cache flush threshold.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void *
> > > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > > + struct rte_mempool *mp,
> > > + unsigned int n)
> > > +{
> > > + unsigned int len, size;
> > > +
> > > + RTE_ASSERT(cache != NULL);
> > > + RTE_ASSERT(mp != NULL);
> > > +
> > > + rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > > +
> > > + len = cache->len;
> > > + size = cache->size;
> > > +
> > > + if (n <= len) {
> > > + /* The request can be satisfied from the cache as is. */
> > > + len -= n;
> > > + } else if (likely(n <= size)) {
> > > + /*
> > > + * The request itself can be satisfied from the cache.
> > > + * But first, the cache must be filled from the backend;
> > > + * fetch size + requested - len objects.
> > > + */
> > > + int ret;
> > > +
> > > + ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
> > > + if (unlikely(ret < 0)) {
> > > + /*
> > > + * We are buffer constrained.
> > > + * Do not fill the cache, just satisfy the request.
> > > + */
> > > + ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
> > > + if (unlikely(ret < 0)) {
> > > + /* Unable to satisfy the request. */
> > > +
> > > + RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> > > + RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> > > +
> > > + rte_errno = -ret;
> > > + return NULL;
> > > + }
> > > +
> > > + len = 0;
> > > + } else {
> > > + len = size;
> > > + }
> > > + } else {
> > > + /* The request itself is too big for the cache. */
> > > + rte_errno = EINVAL;
> > > + return NULL;
> > > + }
> > > +
> > > + cache->len = len;
> > > +
> > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > > +
> > > + return &cache->objs[len];
> > > +}
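[Editor's note: the control flow above — cache hit, refill, buffer-constrained fallback, oversized request — can also be modeled standalone. Illustrative sketch only: backend_dequeue() is a hypothetical stand-in for rte_mempool_ops_dequeue_bulk(), and the model_* names are not DPDK API.]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CACHE_MAX 64 /* room for size + n objects during a refill */

/* Hypothetical stand-in for struct rte_mempool_cache. */
struct model_cache {
	unsigned int len, size;
	void *objs[CACHE_MAX];
};

/* Fake backend: succeeds while it still holds enough objects. */
static unsigned int backend_avail;

static int
backend_dequeue(void **dst, unsigned int n)
{
	(void)dst; /* a real backend would write n object pointers here */
	if (n > backend_avail)
		return -ENOBUFS;
	backend_avail -= n;
	return 0;
}

/* Mirrors the control flow of rte_mempool_cache_zc_get_bulk(). */
static void *
model_zc_get_bulk(struct model_cache *cache, unsigned int n, int *err)
{
	unsigned int len = cache->len, size = cache->size;

	*err = 0;
	if (n <= len) {
		/* The request can be satisfied from the cache as is. */
		len -= n;
	} else if (n <= size) {
		/* Refill the cache: fetch size + n - len objects. */
		if (backend_dequeue(&cache->objs[len], size + n - len) < 0) {
			/* Buffer constrained: satisfy only the request. */
			if (backend_dequeue(&cache->objs[len], n - len) < 0) {
				*err = ENOBUFS; /* rte_errno = -ret in real code */
				return NULL;
			}
			len = 0;
		} else {
			len = size;
		}
	} else {
		/* The request itself is too big for the cache. */
		*err = EINVAL;
		return NULL;
	}
	cache->len = len;
	return &cache->objs[len]; /* top n objects belong to the caller */
}
```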
> > > +
> > > /**
> > >  * @internal Put several objects back in the mempool; used internally.
> > >  * @param mp
> > > @@ -1364,32 +1556,25 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > > {
> > > void **cache_objs;
> > >
> > > - /* No cache provided */
> > > - if (unlikely(cache == NULL))
> > > - goto driver_enqueue;
> > > + /* No cache provided? */
> > > + if (unlikely(cache == NULL)) {
> > > + /* Increment stats now, adding in mempool always succeeds. */
> > > + RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > + RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > >
> > > - /* increment stat now, adding in mempool always success */
> > > - RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > - RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > + goto driver_enqueue;
> > > + }
> > >
> > > - /* The request itself is too big for the cache */
> > > - if (unlikely(n > cache->flushthresh))
> > > - goto driver_enqueue_stats_incremented;
> > > + /* Prepare to add the objects to the cache. */
> > > + cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > >
> > > - /*
> > > - * The cache follows the following algorithm:
> > > - * 1. If the objects cannot be added to the cache without crossing
> > > - * the flush threshold, flush the cache to the backend.
> > > - * 2. Add the objects to the cache.
> > > - */
> > > + /* The request itself is too big for the cache? */
> > > + if (unlikely(cache_objs == NULL)) {
> > > + /* Increment stats now, adding in mempool always succeeds. */
> > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> >
> > Shouldn't it be RTE_MEMPOOL_STAT_ADD() here?
>
> I can see why you are wondering, but the answer is no. The statistics in the mempool cache are not related to the cache itself; they
> are related to the mempool. They are kept there to provide faster per-lcore update access [1].
>
> [1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L94
But the condition above:
if (unlikely(cache_objs == NULL))
means that we can't put these objects into the cache and have to put the
objects straight into the pool (skipping the cache completely), right?
If so, then why update the cache stats instead of the pool stats?
> >
> > >
> > > - if (cache->len + n <= cache->flushthresh) {
> > > - cache_objs = &cache->objs[cache->len];
> > > - cache->len += n;
> > > - } else {
> > > - cache_objs = &cache->objs[0];
> > > - rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > - cache->len = n;
> > > + goto driver_enqueue;
> > > }
> > >
> > > /* Add the objects to the cache. */
> > > @@ -1399,13 +1584,7 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> > >
> > > driver_enqueue:
> > >
> > > - /* increment stat now, adding in mempool always success */
> > > - RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > - RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > -
> > > -driver_enqueue_stats_incremented:
> > > -
> > > - /* push objects to the backend */
> > > + /* Push the objects to the backend. */
> > > rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> > > }
> > >
> > > diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
> > > index ed060e887c..14666457f7 100644
> > > --- a/lib/mempool/rte_mempool_trace_fp.h
> > > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> > > rte_trace_point_emit_ptr(mempool);
> > > )
> > >
> > > +RTE_TRACE_POINT_FP(
> > > + rte_mempool_trace_cache_zc_put_bulk,
> > > + RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
> > > + rte_trace_point_emit_ptr(cache);
> > > + rte_trace_point_emit_ptr(mempool);
> > > + rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > +RTE_TRACE_POINT_FP(
> > > + rte_mempool_trace_cache_zc_put_rewind,
> > > + RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > > + rte_trace_point_emit_ptr(cache);
> > > + rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > +RTE_TRACE_POINT_FP(
> > > + rte_mempool_trace_cache_zc_get_bulk,
> > > + RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
> > > + rte_trace_point_emit_ptr(cache);
> > > + rte_trace_point_emit_ptr(mempool);
> > > + rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > #ifdef __cplusplus
> > > }
> > > #endif
> > > diff --git a/lib/mempool/version.map b/lib/mempool/version.map
> > > index b67d7aace7..1383ae6db2 100644
> > > --- a/lib/mempool/version.map
> > > +++ b/lib/mempool/version.map
> > > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> > > __rte_mempool_trace_ops_alloc;
> > > __rte_mempool_trace_ops_free;
> > > __rte_mempool_trace_set_ops_byname;
> > > +
> > > + # added in 23.03
> > > + __rte_mempool_trace_cache_zc_put_bulk;
> > > + __rte_mempool_trace_cache_zc_put_rewind;
> > > + __rte_mempool_trace_cache_zc_get_bulk;
> > > };
> > >
> > > INTERNAL {
> >
Thread overview: 36+ messages
2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
2022-11-07 9:19 ` Bruce Richardson
2022-11-07 14:32 ` Morten Brørup
2022-11-15 16:18 ` [PATCH] mempool cache: add zero-copy get and put functions Morten Brørup
2022-11-16 18:04 ` [PATCH v2] " Morten Brørup
2022-11-29 20:54 ` Kamalakshitha Aligeri
2022-11-30 10:21 ` Morten Brørup
2022-12-22 15:57 ` Konstantin Ananyev
2022-12-22 17:55 ` Morten Brørup
2022-12-23 16:58 ` Konstantin Ananyev
2022-12-24 12:17 ` Morten Brørup
2022-12-24 11:49 ` [PATCH v3] " Morten Brørup
2022-12-24 11:55 ` [PATCH v4] " Morten Brørup
2022-12-27 9:24 ` Andrew Rybchenko
2022-12-27 10:31 ` Morten Brørup
2022-12-27 15:17 ` [PATCH v5] " Morten Brørup
2023-01-22 20:34 ` Konstantin Ananyev
2023-01-22 21:17 ` Morten Brørup
2023-01-23 11:53 ` Konstantin Ananyev [this message]
2023-01-23 12:23 ` Morten Brørup
2023-01-23 12:52 ` Konstantin Ananyev
2023-01-23 14:30 ` Bruce Richardson
2023-01-24 1:53 ` Kamalakshitha Aligeri
2023-02-09 14:39 ` [PATCH v6] " Morten Brørup
2023-02-09 14:52 ` [PATCH v7] " Morten Brørup
2023-02-09 14:58 ` [PATCH v8] " Morten Brørup
2023-02-10 8:35 ` fengchengwen
2023-02-12 19:56 ` Honnappa Nagarahalli
2023-02-12 23:15 ` Morten Brørup
2023-02-13 4:29 ` Honnappa Nagarahalli
2023-02-13 9:30 ` Morten Brørup
2023-02-13 9:37 ` Olivier Matz
2023-02-13 10:25 ` Morten Brørup
2023-02-14 14:16 ` Andrew Rybchenko
2023-02-13 12:24 ` [PATCH v9] " Morten Brørup
2023-02-13 14:33 ` Kamalakshitha Aligeri