From: Olivier Matz <olivier.matz@6wind.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: "andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
Kamalakshitha Aligeri <Kamalakshitha.Aligeri@arm.com>,
"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
"konstantin.ananyev@huawei.com" <konstantin.ananyev@huawei.com>,
"dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
"david.marchand@redhat.com" <david.marchand@redhat.com>,
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Subject: Re: [PATCH v8] mempool cache: add zero-copy get and put functions
Date: Mon, 13 Feb 2023 10:37:01 +0100 [thread overview]
Message-ID: <Y+oEvWDEwWcR7wLp@platinum> (raw)
In-Reply-To: <DBAPR08MB581485125EF8FA24E2B0176A98DD9@DBAPR08MB5814.eurprd08.prod.outlook.com>
Hello,
Thank you for this work, and sorry for the late feedback too.
On Mon, Feb 13, 2023 at 04:29:51AM +0000, Honnappa Nagarahalli wrote:
> <snip>
>
> > > > +/**
> > > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > > rte_mempool_do_generic_put().
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + * A pointer to the mempool cache.
> > > > + * @param mp
> > > > + * A pointer to the mempool.
> > > > + * @param n
> > > > + * The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + * The pointer to where to put the objects in the mempool cache.
> > > > + * NULL if the request itself is too big for the cache, i.e.
> > > > + * exceeds the cache flush threshold.
> > > > + */
> > > > +static __rte_always_inline void **
> > > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > + struct rte_mempool *mp,
> > > > + unsigned int n)
> > > > +{
> > > > + void **cache_objs;
> > > > +
> > > > + RTE_ASSERT(cache != NULL);
> > > > + RTE_ASSERT(mp != NULL);
> > > > +
> > > > + if (n <= cache->flushthresh - cache->len) {
The previous code was doing this test instead:
if (cache->len + n <= cache->flushthresh)
I know there is an invariant asserting that cache->len <= cache->threshold,
so there is no real issue, but I'll tend to say that it is a good practise
to avoid substractions on unsigned values to avoid the risk of wrapping.
I also think the previous test was a bit more readable.
> > > > + /*
> > > > + * The objects can be added to the cache without crossing
> > > the
> > > > + * flush threshold.
> > > > + */
> > > > + cache_objs = &cache->objs[cache->len];
> > > > + cache->len += n;
> > > > + } else if (likely(n <= cache->flushthresh)) {
> > > > + /*
> > > > + * The request itself fits into the cache.
> > > > + * But first, the cache must be flushed to the backend, so
> > > > + * adding the objects does not cross the flush threshold.
> > > > + */
> > > > + cache_objs = &cache->objs[0];
> > > > + rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > > >len);
> > > This is a flush of the cache. It is probably worth having a counter
> > > for this.
> >
> > We somewhat already do. The put_common_pool_bulk counter is
> > incremented in rte_mempool_ops_enqueue_bulk() [1].
> >
> > [1]:
> > https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.
> > h#L824
> >
> > This counter doesn't exactly count the number of times the cache was
> > flushed, because it also counts bulk put transactions not going via the cache.
> >
> > Thinking further about it, I agree that specific counters for cache flush and
> > cache refill could be useful, and should be added. However, being this late, I
> > would prefer postponing them for a separate patch.
> Agree, can be in a separate patch, they never existed.
>
> >
> > >
> > > > + cache->len = n;
> > > > + } else {
> > > > + /* The request itself is too big for the cache. */
> > > This is possibly an error condition. Do we need to set rte_errno?
> >
> > I considered this when I wrote the function, and concluded that this function
> > only returns NULL as normal behavior, never because of failure. E.g. if a cache
> > can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is correct
> > behavior of the PMD to call this function and learn that the cache is too small
> > for direct access; it is not a failure in the PMD nor in the cache.
> This condition happens when there is a mismatch between the cache configuration and the behavior of the PMD. From this perspective I think this is an error. This could go unnoticed, I do not think this misconfiguration is reported anywhere.
>
> >
> > If it could return NULL due to a variety of reasons, I would probably agree that
> > we *need* to set rte_errno, so the application could determine the reason.
> >
> > But since it can only return NULL due to one reason (which involves correct
> > use of the function), I convinced myself that setting rte_errno would not
> > convey any additional information than the NULL return value itself, and be a
> > waste of performance in the fast path. If you disagree, then it should be set to
> > EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a request
> > too big for the cache.
> >
> > > Do we need a counter here to capture that?
> >
> > Good question. I don't know. It would indicate that a cache is smaller than
> > expected by the users trying to access the cache directly.
> >
> > And if we add such a counter, we should probably add a similar counter for
> > the cache get function too.
> Agree
>
> >
> > But again, being this late, I would postpone such counters for a separate
> > patch. And they should trigger more discussions about required/useful
> > counters.
> Agree, should be postponed to another patch.
>
> >
> > For reference, the rte_mempool_debug_stats is cache aligned and currently
> > holds 12 64-bit counters, so we can add 4 more - which is exactly the number
> > discussed here - without changing its size. So this is not a barrier to adding
> > those counters.
> >
> > Furthermore, I suppose that we only want to increase the counter when the
> > called through the mempool cache API, not when called indirectly through
> > the mempool API. This would mean that the ordinary mempool functions
> > cannot call the mempool cache functions, or the wrong counters would
> > increase. So adding such counters is not completely trivial.
> >
> > >
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +
> > > > + return cache_objs;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > > mempool.
I think we should document the differences and advantage of using this
function over the standard version, explaining which copy is avoided,
why it is faster, ...
Also, we should say that once this function is called, the user has
to copy the objects to the cache.
> > > > + *
> > > > + * @param cache
> > > > + * A pointer to the mempool cache.
> > > > + * @param mp
> > > > + * A pointer to the mempool.
> > > > + * @param n
> > > > + * The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + * The pointer to where to put the objects in the mempool cache.
> > > > + * NULL if the request itself is too big for the cache, i.e.
> > > > + * exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void **
> > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > + struct rte_mempool *mp,
> > > > + unsigned int n)
> > > > +{
> > > > + RTE_ASSERT(cache != NULL);
> > > > + RTE_ASSERT(mp != NULL);
> > > > +
> > > > + rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > + return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy un-put objects in a mempool cache.
> > > > + *
> > > > + * @param cache
> > > > + * A pointer to the mempool cache.
> > > > + * @param n
> > > > + * The number of objects not put in the mempool cache after
> > > calling
> > > > + * rte_mempool_cache_zc_put_bulk().
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void
> > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > + unsigned int n)
> Earlier there was a discussion on the API name.
> IMO, we should keep the API names similar to those in ring library. This would provide consistency across the libraries.
> There were some concerns expressed in PMD having to call 2 APIs. I do not think changing to 2 APIs will have any perf impact.
I'm not really convinced by the API names too. Again, sorry, I know this
comment arrives after the battle.
Your proposal is:
/* Zero-copy put objects in a mempool cache backed by the specified mempool. */
rte_mempool_cache_zc_put_bulk(cache, mp, n)
/* Zero-copy get objects from a mempool cache backed by the specified mempool. */
rte_mempool_cache_zc_get_bulk(cache, mp, n)
Here are some observations:
- This was said in the discussion previously, but the functions do not
really get or put objects in the cache. Instead, they prepare the
cache (filling it or flushing it if needed) and update its length so
that the user can do the effective copy.
- The "_cache" is superfluous for me: these functions do not deal more
with the cache than the non zero-copy version
- The order of the parameters is (cache, mp, n) while the other functions
that take a mempool and a cache as parameters have the mp first (see
_generic versions).
- The "_bulk" is indeed present on other functions, but not all (the generic
version does not have it), I'm not sure it is absolutely required
What do you think about these API below?
rte_mempool_prepare_zc_put(mp, n, cache)
rte_mempool_prepare_zc_get(mp, n, cache)
>
> Also, what is the use case for the 'rewind' API?
+1
I have the same feeling that rewind() is not required now. It can be
added later if we find a use-case.
In case we want to keep it, I think we need to better specify in the API
comments in which unique conditions the function can be called
(i.e. after a call to rte_mempool_prepare_zc_put() with the same number
of objects, given no other operations were done on the mempool in
between). A call outside of these conditions has an undefined behavior.
>
> > > > +{
> > > > + RTE_ASSERT(cache != NULL);
> > > > + RTE_ASSERT(n <= cache->len);
> > > > +
> > > > + rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > > +
> > > > + cache->len -= n;
> > > > +
> > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy get objects from a mempool cache backed by the
> > > specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + * A pointer to the mempool cache.
> > > > + * @param mp
> > > > + * A pointer to the mempool.
> > > > + * @param n
> > > > + * The number of objects to be made available for extraction from
> > > the
> > > > mempool cache.
> > > > + * @return
> > > > + * The pointer to the objects in the mempool cache.
> > > > + * NULL on error; i.e. the cache + the pool does not contain 'n'
> > > objects.
> > > > + * With rte_errno set to the error code of the mempool dequeue
> > > function,
> > > > + * or EINVAL if the request itself is too big for the cache, i.e.
> > > > + * exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void *
> > > > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > > > + struct rte_mempool *mp,
> > > > + unsigned int n)
> > > > +{
> > > > + unsigned int len, size;
> > > > +
> > > > + RTE_ASSERT(cache != NULL);
> > > > + RTE_ASSERT(mp != NULL);
> > > > +
> > > > + rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > > > +
> > > > + len = cache->len;
> > > > + size = cache->size;
> > > > +
> > > > + if (n <= len) {
> > > > + /* The request can be satisfied from the cache as is. */
> > > > + len -= n;
> > > > + } else if (likely(n <= size)) {
> > > > + /*
> > > > + * The request itself can be satisfied from the cache.
> > > > + * But first, the cache must be filled from the backend;
> > > > + * fetch size + requested - len objects.
> > > > + */
> > > > + int ret;
> > > > +
> > > > + ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > > >objs[len], size + n - len);
> > > > + if (unlikely(ret < 0)) {
> > > > + /*
> > > > + * We are buffer constrained.
> > > > + * Do not fill the cache, just satisfy the request.
> > > > + */
> > > > + ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > > >objs[len], n - len);
> > > > + if (unlikely(ret < 0)) {
> > > > + /* Unable to satisfy the request. */
> > > > +
> > > > + RTE_MEMPOOL_STAT_ADD(mp,
> > > > get_fail_bulk, 1);
> > > > + RTE_MEMPOOL_STAT_ADD(mp,
> > > > get_fail_objs, n);
> > > > +
> > > > + rte_errno = -ret;
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + len = 0;
> > > > + } else {
> > > > + len = size;
> > > > + }
> > > > + } else {
> > > > + /* The request itself is too big for the cache. */
> > > > + rte_errno = EINVAL;
> > > > + return NULL;
> > > > + }
> > > > +
> > > > + cache->len = len;
> > > > +
> > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > > > +
> > > > + return &cache->objs[len];
> > > > +}
> > > > +
> > > > /**
> > > > * @internal Put several objects back in the mempool; used
> > > internally.
> > > > * @param mp
> > > > @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> > > > rte_mempool *mp, void * const *obj_table, {
> > > > void **cache_objs;
> > > >
> > > > - /* No cache provided */
> > > > - if (unlikely(cache == NULL))
> > > > - goto driver_enqueue;
> > > > + /* No cache provided? */
> > > > + if (unlikely(cache == NULL)) {
> > > > + /* Increment stats now, adding in mempool always succeeds.
> > > > */
> > > > + RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > + RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > >
> > > > - /* increment stat now, adding in mempool always success */
> > > > - RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > - RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > + goto driver_enqueue;
> > > > + }
> > > >
> > > > - /* The request itself is too big for the cache */
> > > > - if (unlikely(n > cache->flushthresh))
> > > > - goto driver_enqueue_stats_incremented;
> > > > + /* Prepare to add the objects to the cache. */
> > > > + cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > >
> > > > - /*
> > > > - * The cache follows the following algorithm:
> > > > - * 1. If the objects cannot be added to the cache without
> > > crossing
> > > > - * the flush threshold, flush the cache to the backend.
> > > > - * 2. Add the objects to the cache.
> > > > - */
> > > > + /* The request itself is too big for the cache? */
> > > > + if (unlikely(cache_objs == NULL)) {
> > > > + /* Increment stats now, adding in mempool always succeeds.
> > > > */
> > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > + RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > >
> > > > - if (cache->len + n <= cache->flushthresh) {
> > > > - cache_objs = &cache->objs[cache->len];
> > > > - cache->len += n;
> > > > - } else {
> > > > - cache_objs = &cache->objs[0];
> > > > - rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > > >len);
> > > > - cache->len = n;
> > > > + goto driver_enqueue;
> > > > }
> > > >
> > > > /* Add the objects to the cache. */ @@ -1399,13 +1585,7 @@
> > > > rte_mempool_do_generic_put(struct rte_mempool *mp, void * const
> > > > *obj_table,
> > > >
> > > > driver_enqueue:
> > > >
> > > > - /* increment stat now, adding in mempool always success */
> > > > - RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > - RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > > -
> > > > -driver_enqueue_stats_incremented:
> > > > -
> > > > - /* push objects to the backend */
> > > > + /* Push the objects to the backend. */
> > > > rte_mempool_ops_enqueue_bulk(mp, obj_table, n); }
> > > >
> > > > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> > > > b/lib/mempool/rte_mempool_trace_fp.h
> > > > index ed060e887c..14666457f7 100644
> > > > --- a/lib/mempool/rte_mempool_trace_fp.h
> > > > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > > > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> > > > rte_trace_point_emit_ptr(mempool);
> > > > )
> > > >
> > > > +RTE_TRACE_POINT_FP(
> > > > + rte_mempool_trace_cache_zc_put_bulk,
> > > > + RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > > nb_objs),
> > > > + rte_trace_point_emit_ptr(cache);
> > > > + rte_trace_point_emit_ptr(mempool);
> > > > + rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > > +RTE_TRACE_POINT_FP(
> > > > + rte_mempool_trace_cache_zc_put_rewind,
> > > > + RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > > > + rte_trace_point_emit_ptr(cache);
> > > > + rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > > +RTE_TRACE_POINT_FP(
> > > > + rte_mempool_trace_cache_zc_get_bulk,
> > > > + RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > > nb_objs),
> > > > + rte_trace_point_emit_ptr(cache);
> > > > + rte_trace_point_emit_ptr(mempool);
> > > > + rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > > #ifdef __cplusplus
> > > > }
> > > > #endif
> > > > diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> > > > b67d7aace7..1383ae6db2 100644
> > > > --- a/lib/mempool/version.map
> > > > +++ b/lib/mempool/version.map
> > > > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> > > > __rte_mempool_trace_ops_alloc;
> > > > __rte_mempool_trace_ops_free;
> > > > __rte_mempool_trace_set_ops_byname;
> > > > +
> > > > + # added in 23.03
> > > > + __rte_mempool_trace_cache_zc_put_bulk;
> > > > + __rte_mempool_trace_cache_zc_put_rewind;
> > > > + __rte_mempool_trace_cache_zc_get_bulk;
> > > > };
> > > >
> > > > INTERNAL {
> > > > --
> > > > 2.17.1
>
next prev parent reply other threads:[~2023-02-13 9:37 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
2022-11-07 9:19 ` Bruce Richardson
2022-11-07 14:32 ` Morten Brørup
2022-11-15 16:18 ` [PATCH] mempool cache: add zero-copy get and put functions Morten Brørup
2022-11-16 18:04 ` [PATCH v2] " Morten Brørup
2022-11-29 20:54 ` Kamalakshitha Aligeri
2022-11-30 10:21 ` Morten Brørup
2022-12-22 15:57 ` Konstantin Ananyev
2022-12-22 17:55 ` Morten Brørup
2022-12-23 16:58 ` Konstantin Ananyev
2022-12-24 12:17 ` Morten Brørup
2022-12-24 11:49 ` [PATCH v3] " Morten Brørup
2022-12-24 11:55 ` [PATCH v4] " Morten Brørup
2022-12-27 9:24 ` Andrew Rybchenko
2022-12-27 10:31 ` Morten Brørup
2022-12-27 15:17 ` [PATCH v5] " Morten Brørup
2023-01-22 20:34 ` Konstantin Ananyev
2023-01-22 21:17 ` Morten Brørup
2023-01-23 11:53 ` Konstantin Ananyev
2023-01-23 12:23 ` Morten Brørup
2023-01-23 12:52 ` Konstantin Ananyev
2023-01-23 14:30 ` Bruce Richardson
2023-01-24 1:53 ` Kamalakshitha Aligeri
2023-02-09 14:39 ` [PATCH v6] " Morten Brørup
2023-02-09 14:52 ` [PATCH v7] " Morten Brørup
2023-02-09 14:58 ` [PATCH v8] " Morten Brørup
2023-02-10 8:35 ` fengchengwen
2023-02-12 19:56 ` Honnappa Nagarahalli
2023-02-12 23:15 ` Morten Brørup
2023-02-13 4:29 ` Honnappa Nagarahalli
2023-02-13 9:30 ` Morten Brørup
2023-02-13 9:37 ` Olivier Matz [this message]
2023-02-13 10:25 ` Morten Brørup
2023-02-14 14:16 ` Andrew Rybchenko
2023-02-13 12:24 ` [PATCH v9] " Morten Brørup
2023-02-13 14:33 ` Kamalakshitha Aligeri
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y+oEvWDEwWcR7wLp@platinum \
--to=olivier.matz@6wind.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=Kamalakshitha.Aligeri@arm.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=bruce.richardson@intel.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=konstantin.ananyev@huawei.com \
--cc=mb@smartsharesystems.com \
--cc=nd@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).