DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>,
	<olivier.matz@6wind.com>, <andrew.rybchenko@oktetlabs.ru>,
	"Kamalakshitha Aligeri" <Kamalakshitha.Aligeri@arm.com>,
	<bruce.richardson@intel.com>, <konstantin.ananyev@huawei.com>,
	<dev@dpdk.org>
Cc: "nd" <nd@arm.com>, <david.marchand@redhat.com>, "nd" <nd@arm.com>
Subject: RE: [PATCH v8] mempool cache: add zero-copy get and put functions
Date: Mon, 13 Feb 2023 10:30:34 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87733@smartserver.smartshare.dk>
In-Reply-To: <DBAPR08MB581485125EF8FA24E2B0176A98DD9@DBAPR08MB5814.eurprd08.prod.outlook.com>

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Monday, 13 February 2023 05.30
> 
> <snip>
> 
> > > > +/**
> > > > + * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the specified mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +static __rte_always_inline void **
> > > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	void **cache_objs;
> > > > +
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	if (n <= cache->flushthresh - cache->len) {
> > > > +		/*
> > > > +		 * The objects can be added to the cache without crossing the
> > > > +		 * flush threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[cache->len];
> > > > +		cache->len += n;
> > > > +	} else if (likely(n <= cache->flushthresh)) {
> > > > +		/*
> > > > +		 * The request itself fits into the cache.
> > > > +		 * But first, the cache must be flushed to the backend, so
> > > > +		 * adding the objects does not cross the flush threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[0];
> > > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > This is a flush of the cache. It is probably worth having a counter
> > > for this.
> >
> > We somewhat already do. The put_common_pool_bulk counter is
> > incremented in rte_mempool_ops_enqueue_bulk() [1].
> >
> > [1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L824
> >
> > This counter doesn't exactly count the number of times the cache was
> > flushed, because it also counts bulk put transactions not going via
> > the cache.
> >
> > Thinking further about it, I agree that specific counters for cache flush
> > and cache refill could be useful, and should be added. However, being this
> > late, I would prefer postponing them for a separate patch.
> Agree, can be in a separate patch, they never existed.

OK. We have agreed to postpone the counter discussion to a separate patch. So let's resume the discussion there.

Such a patch will not make it for RC1, so I have put it on my TODO list for later.

> 
> >
> > >
> > > > +		cache->len = n;
> > > > +	} else {
> > > > +		/* The request itself is too big for the cache. */
> > > This is possibly an error condition. Do we need to set rte_errno?
> >
> > I considered this when I wrote the function, and concluded that this
> > function only returns NULL as normal behavior, never because of failure.
> > E.g. if a cache can only hold 4 mbufs, and the PMD tries to store 32 mbufs,
> > it is correct behavior of the PMD to call this function and learn that the
> > cache is too small for direct access; it is not a failure in the PMD nor in
> > the cache.
> This condition happens when there is a mismatch between the cache
> configuration and the behavior of the PMD. From this perspective I
> think this is an error. It could go unnoticed; I do not think this
> misconfiguration is reported anywhere.

No strong objection from me.

In v9, I will also set rte_errno=EINVAL here.
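
For reference, that branch in v9 would then look roughly like this (a sketch,
not the final code):

	} else {
		/* The request itself is too big for the cache. */
		rte_errno = EINVAL;
		return NULL;
	}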

> 
> >
> > If it could return NULL due to a variety of reasons, I would probably agree
> > that we *need* to set rte_errno, so the application could determine the
> > reason.
> >
> > But since it can only return NULL due to one reason (which involves correct
> > use of the function), I convinced myself that setting rte_errno would not
> > convey any additional information beyond the NULL return value itself, and
> > would be a waste of performance in the fast path. If you disagree, then it
> > should be set to EINVAL, like when rte_mempool_cache_zc_get_bulk() is
> > called with a request too big for the cache.
> >
> > > Do we need a counter here to capture that?
> >
> > Good question. I don't know. It would indicate that a cache is smaller
> > than expected by the users trying to access the cache directly.
> >
> > And if we add such a counter, we should probably add a similar counter
> > for the cache get function too.
> Agree
> 
> >
> > But again, being this late, I would postpone such counters for a separate
> > patch. And they should trigger more discussions about required/useful
> > counters.
> Agree, should be postponed to another patch.

Agreed. Let's move the counter discussion to the counters patch, when provided.
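
For the record, the kind of counters I have in mind for that patch would be
something like the following; the field names are invented for illustration
and are not part of any patch:

	/* Hypothetical additions to struct rte_mempool_debug_stats: */
	uint64_t cache_flush_bulk;     /* cache flushed to the mempool backend */
	uint64_t cache_refill_bulk;    /* cache refilled from the mempool backend */
	uint64_t cache_zc_put_too_big; /* zero-copy put request exceeded the cache */
	uint64_t cache_zc_get_too_big; /* zero-copy get request exceeded the cache */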

> 
> >
> > For reference, the rte_mempool_debug_stats structure is cache aligned and
> > currently holds 12 64-bit counters, so we can add 4 more - which is exactly
> > the number discussed here - without changing its size. So this is not a
> > barrier to adding those counters.
> >
> > Furthermore, I suppose that we only want to increase the counter when
> > called through the mempool cache API, not when called indirectly through
> > the mempool API. This would mean that the ordinary mempool functions
> > cannot call the mempool cache functions, or the wrong counters would
> > increase. So adding such counters is not completely trivial.
> >
> > >
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +
> > > > +	return cache_objs;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the specified mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void **
> > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> > > > + *
> > > > + * Zero-copy un-put objects in a mempool cache.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param n
> > > > + *   The number of objects not put in the mempool cache after calling
> > > > + *   rte_mempool_cache_zc_put_bulk().
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void
> > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > +		unsigned int n)
> Earlier there was a discussion on the API name.

The discussion was here:

http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D875E8@smartserver.smartshare.dk/

> IMO, we should keep the API names similar to those in the ring library.
> This would provide consistency across the libraries.
> There were some concerns expressed about the PMD having to call 2 APIs. I
> do not think changing to 2 APIs will have any perf impact.

There is also the difference that the ring library implements locking with these APIs, whereas the mempool cache API does not need locking. The ring's _start() function is called to enter the critical section, and the _finish() function to leave the critical section.

I am usually in favor of consistency, but I would argue this: For functions (in the mempool cache) that are lockless, it might be confusing if they have names (from the ring library) that imply locking. DPDK is already bad at documenting the thread safety of various functions; let's not make it worse by using confusing function names.

This is a democracy, so I am open to changing the API for consistency, if the community insists. Anyone interested, please indicate which API you prefer for the zero-copy mempool cache.

A. The unique API provided in this patch:
rte_mempool_cache_zc_get_bulk(), rte_mempool_cache_zc_put_bulk(), and the optional rte_mempool_cache_zc_put_rewind(); or

B. An API with _start and _finish, like in the ring library:
rte_mempool_cache_zc_get_bulk_start() and _finish(), and rte_mempool_cache_zc_put_bulk_start() and _finish().
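
To make the difference concrete, the two call patterns would look roughly
like this (the _start()/_finish() names under B are hypothetical):

	/* A: one call; the put is committed when the pointer is returned. */
	void **objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	if (objs != NULL)
		memcpy(objs, mbufs, sizeof(void *) * n);

	/* B: ring-style paired calls; _finish() commits the objects. */
	void **objs = rte_mempool_cache_zc_put_bulk_start(cache, mp, n);
	if (objs != NULL) {
		memcpy(objs, mbufs, sizeof(void *) * n);
		rte_mempool_cache_zc_put_bulk_finish(cache, n);
	}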

I am in favor of A: Keep the unique names as provided in the patch.

Konstantin accepted A, so unless he says otherwise, I'll treat that as a vote for A.

> 
> Also, what is the use case for the 'rewind' API?

It is for use cases where the application/PMD is opportunistic and prepares to zero-copy a full burst, but discovers along the way that there was not a full burst to zero-copy.

E.g. a PMD freeing mbufs after transmit. It hopes to "fast free" the entire burst, and prepares for zero-copying the full burst to the mempool; but while zero-copying the mbufs to the mempool, a few of them turn out not to meet the criteria for "fast free", so they cannot be zero-copied, and the PMD frees those few mbufs normally instead. The PMD must then call rewind() to adjust the zero-copy burst size accordingly when done.
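
To illustrate, a rough sketch of such a transmit path; "txq", "nb_free" and
can_fast_free() are invented for this example and do not exist in any PMD:

	void **cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, nb_free);
	if (cache_objs != NULL) {
		unsigned int stored = 0;
		for (unsigned int i = 0; i < nb_free; i++) {
			struct rte_mbuf *m = txq->free_mbufs[i];
			if (can_fast_free(m))
				/* Zero-copy: store the pointer directly in the cache. */
				cache_objs[stored++] = m;
			else
				/* Does not meet the "fast free" criteria; free it normally. */
				rte_pktmbuf_free(m);
		}
		/* Fewer than nb_free objects were stored; un-put the difference. */
		rte_mempool_cache_zc_put_rewind(cache, nb_free - stored);
	}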


Thread overview: 36+ messages
2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
2022-11-07  9:19 ` Bruce Richardson
2022-11-07 14:32   ` Morten Brørup
2022-11-15 16:18 ` [PATCH] mempool cache: add zero-copy get and put functions Morten Brørup
2022-11-16 18:04 ` [PATCH v2] " Morten Brørup
2022-11-29 20:54   ` Kamalakshitha Aligeri
2022-11-30 10:21     ` Morten Brørup
2022-12-22 15:57   ` Konstantin Ananyev
2022-12-22 17:55     ` Morten Brørup
2022-12-23 16:58       ` Konstantin Ananyev
2022-12-24 12:17         ` Morten Brørup
2022-12-24 11:49 ` [PATCH v3] " Morten Brørup
2022-12-24 11:55 ` [PATCH v4] " Morten Brørup
2022-12-27  9:24   ` Andrew Rybchenko
2022-12-27 10:31     ` Morten Brørup
2022-12-27 15:17 ` [PATCH v5] " Morten Brørup
2023-01-22 20:34   ` Konstantin Ananyev
2023-01-22 21:17     ` Morten Brørup
2023-01-23 11:53       ` Konstantin Ananyev
2023-01-23 12:23         ` Morten Brørup
2023-01-23 12:52           ` Konstantin Ananyev
2023-01-23 14:30           ` Bruce Richardson
2023-01-24  1:53             ` Kamalakshitha Aligeri
2023-02-09 14:39 ` [PATCH v6] " Morten Brørup
2023-02-09 14:52 ` [PATCH v7] " Morten Brørup
2023-02-09 14:58 ` [PATCH v8] " Morten Brørup
2023-02-10  8:35   ` fengchengwen
2023-02-12 19:56   ` Honnappa Nagarahalli
2023-02-12 23:15     ` Morten Brørup
2023-02-13  4:29       ` Honnappa Nagarahalli
2023-02-13  9:30         ` Morten Brørup [this message]
2023-02-13  9:37         ` Olivier Matz
2023-02-13 10:25           ` Morten Brørup
2023-02-14 14:16             ` Andrew Rybchenko
2023-02-13 12:24 ` [PATCH v9] " Morten Brørup
2023-02-13 14:33   ` Kamalakshitha Aligeri
