From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>,
	<dev@dpdk.org>, <olivier.matz@6wind.com>,
	<andrew.rybchenko@oktetlabs.ru>,
	"Kamalakshitha Aligeri" <Kamalakshitha.Aligeri@arm.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>
Cc: "nd" <nd@arm.com>
Subject: RE: [RFC] mempool: zero-copy cache put bulk
Date: Wed, 9 Nov 2022 21:36:57 +0100	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D874A9@smartserver.smartshare.dk> (raw)
In-Reply-To: <DBAPR08MB581457B13C4A13A3465BC839983E9@DBAPR08MB5814.eurprd08.prod.outlook.com>

+To: Bruce also showed interest in this topic, and might have more insights.

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Wednesday, 9 November 2022 18.58
> 
> <snip>
> 
> >
> > > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > > Sent: Sunday, 6 November 2022 00.11
> > >
> > > + Akshitha, she is working on similar patch
> > >
> > > Few comments inline
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: Saturday, November 5, 2022 8:40 AM
> > > >
> > > > Zero-copy access to the mempool cache is beneficial for PMD
> > > performance,
> > > > and must be provided by the mempool library to fix [Bug 1052]
> > > > without
> > > a
> > > > performance regression.
> > > >
> > > > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> > > >
> > > >
> > > > This RFC offers a conceptual zero-copy put function, where the
> > > > application promises to store some objects, and in return gets an
> > > > address at which to store them.
> > > >
> > > > I would like some early feedback.
> > > >
> > > > Notes:
> > > > * Allowing the 'cache' parameter to be NULL, and getting it from
> > > > the mempool instead, was inspired by rte_mempool_cache_flush().
> > > I am not sure why the 'cache' parameter is required for this API.
> > > This API should take the mempool as the parameter.
> > >
> > > We have based our API on 'rte_mempool_do_generic_put' and removed
> > the
> > > 'cache' parameter.
> >
> > I thoroughly considered omitting the 'cache' parameter, but included
> > it for two reasons:
> >
> > 1. The function is a "mempool cache" function (i.e. primarily working
> > on the mempool cache), not a "mempool" function.
> >
> > So it is appropriate to have a pointer directly to the structure it is
> > working on. Following this through, I also made 'cache' the first
> > parameter and 'mp' the second, like in rte_mempool_cache_flush().
> I am wondering if the PMD should be aware of the cache or not. For ex:
> in the case of pipeline mode, the RX and TX side of the PMD are running
> on different cores.

In that example, the PMD can store two cache pointers, one for each of the RX and TX side.

And if the PMD is unaware of the cache pointer, it can look it up at runtime using rte_lcore_id(), like it does in the current Intel PMDs.
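To illustrate, a minimal sketch of the two options (the struct and function names are hypothetical, not from any existing PMD):

#include <rte_lcore.h>
#include <rte_mempool.h>

/* Hypothetical PMD queue structure; the cache pointer is resolved once
 * at queue setup time, so the fast path avoids the per-packet lookup.
 * This assumes queue setup runs on the lcore that will use the queue. */
struct pmd_queue {
	struct rte_mempool *mp;
	struct rte_mempool_cache *cache;
};

static void
pmd_queue_setup(struct pmd_queue *q, struct rte_mempool *mp)
{
	q->mp = mp;
	/* NULL if the mempool was created without per-lcore caches. */
	q->cache = rte_mempool_default_cache(mp, rte_lcore_id());
}

/* Alternative: look the cache up at runtime, as the current Intel PMDs do. */
static struct rte_mempool_cache *
pmd_queue_cache(struct pmd_queue *q)
{
	return rte_mempool_default_cache(q->mp, rte_lcore_id());
}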

> However, since the rte_mempool_cache_flush API is provided, maybe that
> decision is already made? Interestingly, rte_mempool_cache_flush is
> called by just a single PMD.

I intentionally aligned this RFC with rte_mempool_cache_flush() to maintain consistency.

However, the API is not set in stone. It should always be acceptable to consider improved alternatives.
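For reference, the function this RFC is aligned with is declared in rte_mempool.h as follows (declaration shown without its body):

/*
 * Existing DPDK API: flush a user-owned mempool cache to the backend.
 * If 'cache' is NULL, the default cache of the calling lcore is used.
 */
static __rte_always_inline void
rte_mempool_cache_flush(struct rte_mempool_cache *cache,
		struct rte_mempool *mp);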

> 
> So, the question is: should we allow zero-copy only for the per-core
> cache, or for other cases as well?

I suppose that the mempool library was designed to have a mempool associated with exactly one mempool cache per core. (Alternatively, the mempool can be configured with no mempool caches at all.)

We should probably stay loyal to that design concept, and only allow zero-copy for per-core cache.

If you can come up with an example of the opposite, I would like to explore that option too... I can't think of a good example myself, and perhaps I'm overlooking a relevant use case.

> 
> >
> > 2. In most cases, the function only accesses the mempool structure in
> > order to get the cache pointer. Skipping this step improves performance.
> >
> > And since the cache is created along with the mempool itself (and thus
> > never changes for a mempool), it would be safe for the PMD to store the
> > 'cache' pointer along with the 'mp' pointer in the PMD's queue structure.
> Agreed
> 
> >
> > E.g. in the i40e PMD, the i40e_rx_queue structure could include a
> > "struct rte_mempool_cache *cache" field, which could be used in
> > i40e_rxq_rearm() [1] instead of "cache = rte_mempool_default_cache(rxq->mp,
> > rte_lcore_id())".
> >
> > [1] https://elixir.bootlin.com/dpdk/v22.11-
> > rc2/source/drivers/net/i40e/i40e_rxtx_vec_avx512.c#L31
> >
> > > This new API, on success, returns the pointer to memory where the
> > > objects are copied. On failure it returns NULL, and the caller has to
> > > call 'rte_mempool_ops_enqueue_bulk'. Alternatively, the new API could
> > > do this as well, and the PMD does not need to do anything if it gets
> > > a NULL pointer.
> >
> > Yes, we agree about these two details:
> >
> > 1. The function should return a pointer, not an integer.
> > It would be a waste to use another CPU register to convey a
> > success/error integer value, when the success/failure information is
> > just as easily conveyed by the pointer return value (non-NULL/NULL),
> > and rte_errno for various error values in the unlikely cases.
> >
> > 2. The function should leave it up to the PMD what to do if direct
> > access to the cache is unavailable.
> Just wondering about the advantage of this. I do not think PMDs have
> much of a choice other than calling 'rte_mempool_ops_enqueue_bulk'.

I agree, but that was not my point. Let me try to rephrase:

The PMD is more likely to know how to efficiently build the array of mbufs to pass to rte_mempool_ops_enqueue_bulk() than the mempool library - many PMDs already implement a variety of vector instruction variants to do exactly that. So we should not try to be clever and add a fallback path - this job belongs in the PMD.

The PMD might not even have the array of mbufs lined up when calling rte_mempool_cache_put_bulk_promise(). The PMD could have an array of internal structures, where the mbuf pointer is an element in that structure.
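To make that concrete, here is a rough sketch of how a PMD could use the promise function proposed below (the 'pmd_tx_entry' structure and the function names are hypothetical):

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Hypothetical internal TX structure; the mbuf pointer is just one field,
 * so no contiguous array of mbuf pointers exists up front. */
struct pmd_tx_entry {
	struct rte_mbuf *mbuf;
	uint16_t flags; /* other per-descriptor state */
};

static void
pmd_free_tx_mbufs(struct rte_mempool_cache *cache, struct rte_mempool *mp,
		struct pmd_tx_entry *txep, unsigned int n)
{
	void **cache_objs = rte_mempool_cache_put_bulk_promise(cache, mp, n);
	unsigned int i;

	if (cache_objs != NULL) {
		/* Zero-copy: write the pointers straight into the cache. */
		for (i = 0; i < n; i++)
			cache_objs[i] = txep[i].mbuf;
	} else {
		/* The fallback is the PMD's job: gather the pointers into a
		 * local array and enqueue them to the backend directly. */
		void *objs[n]; /* VLA for brevity; a real PMD would bound n */

		for (i = 0; i < n; i++)
			objs[i] = txep[i].mbuf;
		rte_mempool_ops_enqueue_bulk(mp, objs, n);
	}
}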

> 
> >
> > >
> > > We should think about providing a similar API on the RX side to keep
> > > it symmetric.
> >
> > I sent an RFC for that too:
> > http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87488@
> > smartserver.smartshare.dk/T/#u
> >
> >
> > >
> > > > * Asserting that the 'mp' parameter is not NULL is not done by
> > > > other functions, so I omitted it here too.
> > > >
> > > > NB: Please ignore formatting. Also, this code has not even been
> > > > compile tested.
> > > We are a little bit ahead: we have tested the changes with the i40e
> > > PF PMD, written unit test cases, and are going through internal
> > > review; we will send out the RFC on Monday.
> >
> > Sounds good. Looking forward to review.
> >
> > >
> > > >
> > > > /**
> > > >  * Promise to put objects in a mempool via zero-copy access to a
> > > >  * user-owned mempool cache.
> > > >  *
> > > >  * @param cache
> > > >  *   A pointer to the mempool cache.
> > > >  * @param mp
> > > >  *   A pointer to the mempool.
> > > >  * @param n
> > > >  *   The number of objects to be put in the mempool cache.
> > > >  * @return
> > > >  *   The pointer to where to put the objects in the mempool cache.
> > > >  *   NULL on error, with rte_errno set appropriately.
> > > >  */
> > > > static __rte_always_inline void *
> > > > rte_mempool_cache_put_bulk_promise(struct rte_mempool_cache *cache,
> > > >         struct rte_mempool *mp,
> > > >         unsigned int n)
> > > > {
> > > >     void **cache_objs;
> > > >
> > > >     if (cache == NULL)
> > > >         cache = rte_mempool_default_cache(mp, rte_lcore_id());
> Any reason we need this? If we are expecting the PMD to store the
> pointer to cache and a NULL is passed, it would mean it is a mempool
> with no per-core cache?
> We could also leave the NULL check to the PMD.

Personally, I would strongly prefer requiring the cache pointer to be valid, and use RTE_ASSERT() here, instead of allowing a NULL pointer as a special case to look it up inside the function. I consider this special NULL case useless bloat, which does not belong in a fast path library function.

But I copied this approach from rte_mempool_cache_flush().

We could expose an "unsafe" function where passing a NULL cache pointer is not allowed, and a "safe" function (fixing the cache pointer if NULL) for consistency.

If the rte_mempool_cache_flush() function is popular, we could also expose an "unsafe" variant of it, where passing NULL pointers is disallowed.

I wonder if there are any examples of such safe/unsafe variants in DPDK? It would be nice with a common naming convention for such function variants.
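As a rough sketch of what such a pair could look like (the names and the safe/unsafe split are hypothetical, and the trace/stats calls from the RFC below are omitted for brevity):

#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_errno.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

/* "Unsafe" variant: 'cache' must be valid, checked only by RTE_ASSERT(),
 * which compiles away unless RTE_ENABLE_ASSERT is defined. */
static __rte_always_inline void *
rte_mempool_cache_put_bulk_promise_unsafe(struct rte_mempool_cache *cache,
		struct rte_mempool *mp, unsigned int n)
{
	void **cache_objs;

	RTE_ASSERT(cache != NULL);

	/* The request itself is too big for the cache. */
	if (unlikely(n > cache->flushthresh)) {
		rte_errno = EINVAL;
		return NULL;
	}

	if (cache->len + n <= cache->flushthresh) {
		cache_objs = &cache->objs[cache->len];
		cache->len += n;
	} else {
		/* Flush the cache to the backend first. */
		cache_objs = &cache->objs[0];
		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
		cache->len = n;
	}

	return cache_objs;
}

/* "Safe" variant: looks up the default cache if NULL is passed, like
 * rte_mempool_cache_flush() does today. */
static __rte_always_inline void *
rte_mempool_cache_put_bulk_promise_safe(struct rte_mempool_cache *cache,
		struct rte_mempool *mp, unsigned int n)
{
	if (cache == NULL)
		cache = rte_mempool_default_cache(mp, rte_lcore_id());
	if (cache == NULL) {
		rte_errno = EINVAL;
		return NULL;
	}
	return rte_mempool_cache_put_bulk_promise_unsafe(cache, mp, n);
}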

> 
> > > >     if (cache == NULL) {
> > > >         rte_errno = EINVAL;
> > > >         return NULL;
> > > >     }
> > > >
> > > >     rte_mempool_trace_cache_put_bulk_promise(cache, mp, n);
> > > >
> > > >     /* The request itself is too big for the cache */
> > > >     if (unlikely(n > cache->flushthresh)) {
> > > >         rte_errno = EINVAL;
> > > >         return NULL;
> > > >     }
> > > >
> > > >     /*
> > > >      * The cache follows the following algorithm:
> > > >      *   1. If the objects cannot be added to the cache without
> > > >      *      crossing the flush threshold, flush the cache to the
> > > >      *      backend.
> > > >      *   2. Add the objects to the cache.
> > > >      */
> > > >
> > > >     if (cache->len + n <= cache->flushthresh) {
> > > >         cache_objs = &cache->objs[cache->len];
> > > >         cache->len += n;
> > > >     } else {
> > > >         cache_objs = &cache->objs[0];
> > > >         rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > >         cache->len = n;
> > > >     }
> > > >
> > > >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> These are new stats. Do these break ABI compatibility (though they are
> under the DEBUG flag)?

They are not mempool cache stats; they are only kept in the cache structure to provide alternative (i.e. faster) update access to some of the existing mempool stats (specifically, the most frequently updated ones). The patch is [1], and it is part of a series currently being discussed as to whether it should go into 22.11-rc3 or not [2].

[1]: https://patchwork.dpdk.org/project/dpdk/patch/20221109181852.109856-3-mb@smartsharesystems.com/
[2]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D874A6@smartserver.smartshare.dk/T/#m41bf4e8bd886db49f11c8dbd63741b353277082f

> 
> > > >
> > > >     return cache_objs;
> > > > }
> > > >
> > > >
> > > > Med venlig hilsen / Kind regards,
> > > > -Morten Brørup
> > > >
> > >
> 


Thread overview: 9+ messages
2022-11-05 13:40 Morten Brørup
2022-11-05 23:11 ` Honnappa Nagarahalli
2022-11-06  6:57   ` Morten Brørup
2022-11-09 17:57     ` Honnappa Nagarahalli
2022-11-09 20:36       ` Morten Brørup [this message]
2022-11-09 22:45         ` Honnappa Nagarahalli
2022-11-10 10:15           ` Morten Brørup
2022-11-10 11:00             ` Bruce Richardson
2022-11-11  4:24               ` Honnappa Nagarahalli
