From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"olivier.matz@6wind.com" <olivier.matz@6wind.com>,
	"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
	"Kamalakshitha Aligeri" <Kamalakshitha.Aligeri@arm.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>
Cc: nd <nd@arm.com>, nd <nd@arm.com>
Subject: RE: [RFC] mempool: zero-copy cache put bulk
Date: Wed, 9 Nov 2022 22:45:39 +0000	[thread overview]
Message-ID: <DBAPR08MB5814C6F76518FF160DFA7771983E9@DBAPR08MB5814.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D874A9@smartserver.smartshare.dk>

<snip>

> 
> +To: Bruce also showed interest in this topic, and might have more insights.
> 
> > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > Sent: Wednesday, 9 November 2022 18.58
> >
> > <snip>
> >
> > >
> > > > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > > > Sent: Sunday, 6 November 2022 00.11
> > > >
> > > > + Akshitha, she is working on similar patch
> > > >
> > > > Few comments inline
> > > >
> > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > Sent: Saturday, November 5, 2022 8:40 AM
> > > > >
> > > > > Zero-copy access to the mempool cache is beneficial for PMD
> > > > > performance, and must be provided by the mempool library to fix
> > > > > [Bug 1052] without a performance regression.
> > > > >
> > > > > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> > > > >
> > > > >
> > > > > This RFC offers a conceptual zero-copy put function, where the
> > > > > application promises to store some objects, and in return gets
> > > > > an address where to store them.
> > > > >
> > > > > I would like some early feedback.
> > > > >
> > > > > Notes:
> > > > > * Allowing the 'cache' parameter to be NULL, and getting it from
> > > > > the mempool instead, was inspired by rte_mempool_cache_flush().
> > > > I am not sure why the 'cache' parameter is required for this API.
> > > > This API should take the mempool as the parameter.
> > > >
> > > > We have based our API on 'rte_mempool_do_generic_put' and removed
> > > > the 'cache' parameter.
> > >
> > > I thoroughly considered omitting the 'cache' parameter, but included
> > > it for two reasons:
> > >
> > > 1. The function is a "mempool cache" function (i.e. primarily working
> > > on the mempool cache), not a "mempool" function.
> > >
> > > So it is appropriate to have a pointer directly to the structure it
> > > is working on. Following this through, I also made 'cache' the first
> > > parameter and 'mp' the second, like in rte_mempool_cache_flush().
> > I am wondering if the PMD should be aware of the cache or not. For ex:
> > in the case of pipeline mode, the RX and TX side of the PMD are
> > running on different cores.
> 
> In that example, the PMD can store two cache pointers, one for each of the
> RX and TX side.
I did not understand this. If the RX core and the TX core have their own per-core caches, the logic would not work. For example, the RX core's cache would not get filled.

In the case of pipeline mode, there will not be a per-core cache; the buffers would be allocated from and freed to a global ring or a global lockless stack.
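
To make the pipeline case concrete, below is a minimal sketch of the TX-side
free path I have in mind (untested; the function name and the mbuf array are
made up for illustration). The freeing core bypasses any per-core cache and
returns the buffers straight to the mempool backend:

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Hypothetical TX-core free path in pipeline mode: there is no per-core
 * cache to fill, so the mbufs go directly back to the mempool backend
 * (the global ring or lockless stack).
 */
static void
tx_core_free_bufs(struct rte_mempool *mp, struct rte_mbuf **mbufs,
		unsigned int n)
{
	/* A NULL cache makes rte_mempool_generic_put() skip the per-core
	 * cache and enqueue straight to the backend driver.
	 */
	rte_mempool_generic_put(mp, (void **)mbufs, n, NULL);
}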

> 
> And if the PMD is unaware of the cache pointer, it can look it up at runtime
> using rte_lcore_id(), like it does in the current Intel PMDs.
> 
> > However, since the rte_mempool_cache_flush API is provided, maybe
> > that decision is already done? Interestingly, rte_mempool_cache_flush
> > is called by just a single PMD.
> 
> I intentionally aligned this RFC with rte_mempool_cache_flush() to maintain
> consistency.
> 
> However, the API is not set in stone. It should always be acceptable to
> consider improved alternatives.
> 
> >
> > So, the question is, should we allow zero-copy only for per-core cache
> > or for other cases as well.
> 
> I suppose that the mempool library was designed to have a mempool
> associated with exactly one mempool cache per core. (Alternatively, the
> mempool can be configured with no mempool caches at all.)
> 
> We should probably stay loyal to that design concept, and only allow zero-
> copy for per-core cache.
> 
> If you can come up with an example of the opposite, I would like to explore
> that option too... I can't think of a good example myself, and perhaps I'm
> overlooking a relevant use case.
The use case I am talking about is the pipeline mode, as I mentioned above. Let me know if you agree.

> 
> >
> > >
> > > 2. In most cases, the function only accesses the mempool structure
> > > in order to get the cache pointer. Skipping this step improves
> > > performance.
> > >
> > > And since the cache is created along with the mempool itself (and
> > > thus never changes for a mempool), it would be safe for the PMD to
> > > store the 'cache' pointer along with the 'mp' pointer in the PMD's
> > > queue structure.
> > Agreed
> >
> > >
> > > E.g. in the i40e PMD, the i40e_rx_queue structure could include a
> > > "struct rte_mempool_cache *cache" field, which could be used in
> > > i40e_rxq_rearm() [1] instead of "cache =
> > > rte_mempool_default_cache(rxq->mp, rte_lcore_id())".
> > >
> > > [1] https://elixir.bootlin.com/dpdk/v22.11-rc2/source/drivers/net/i40e/i40e_rxtx_vec_avx512.c#L31
> > >
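For illustration, that could look roughly like this (a sketch only; the
structure and function names below are made up and are not the actual i40e
code):

#include <rte_lcore.h>
#include <rte_mempool.h>

/* Illustrative RX queue structure that keeps the mempool cache pointer,
 * resolved once instead of on every rearm call.
 */
struct pmd_rx_queue {
	struct rte_mempool *mp;
	struct rte_mempool_cache *cache;
	/* ... other queue fields ... */
};

/* Assumes this runs on the lcore that will poll the queue; otherwise the
 * cache pointer could be resolved lazily on the first datapath call.
 */
static void
pmd_rxq_set_cache(struct pmd_rx_queue *rxq, struct rte_mempool *mp)
{
	rxq->mp = mp;
	/* The cache is created together with the mempool and never moves,
	 * so keeping the pointer for the queue's lifetime is safe.
	 */
	rxq->cache = rte_mempool_default_cache(mp, rte_lcore_id());
}
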
> > > > This new API, on success, returns the pointer to memory where the
> > > > objects are copied. On failure it returns NULL and the caller has
> > > > to call 'rte_mempool_ops_enqueue_bulk'. Alternatively, the new API
> > > > could do this as well and PMD does not need to do anything if it
> > > > gets a NULL pointer.
> > >
> > > Yes, we agree about these two details:
> > >
> > > 1. The function should return a pointer, not an integer.
> > > It would be a waste to use another CPU register to convey a
> > > success/error integer value, when the success/failure information is
> > > just as easily conveyed by the pointer return value (non-NULL/NULL),
> > > and rte_errno for various error values in the unlikely cases.
> > >
> > > 2. The function should leave it up to the PMD what to do if direct
> > > access to the cache is unavailable.
> > Just wondering about the advantage of this. I do not think PMD's have
> > much of a choice other than calling 'rte_mempool_ops_enqueue_bulk'
> 
> I agree, but that was not my point. Let me try to rephrase:
> 
> The PMD is more likely to know how to efficiently build the array of mbufs to
> pass to rte_mempool_ops_enqueue_bulk() than the mempool library - many
> PMDs already implement a variety of vector instruction variants to do exactly
> that. So we should not try to be clever and add a fallback path - this job
> belongs in the PMD.
> 
> The PMD might not even have the array of mbufs lined up when calling
> rte_mempool_cache_put_bulk_promise(). The PMD could have an array of
> internal structures, where the mbuf pointer is an element in that structure.
Agreed, you are correct. We should leave it to the PMD to handle the failure case.
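
To illustrate that split of responsibilities, here is a rough sketch of a PMD
TX free path (not compiled; 'pmd_tx_entry' and the array bound are
hypothetical, and rte_mempool_cache_put_bulk_promise() refers to the RFC
function quoted further down in this mail):

#include <rte_mbuf.h>
#include <rte_mempool.h>

#define PMD_TX_FREE_BULK_MAX 64	/* hypothetical upper bound on one free burst */

struct pmd_tx_entry {		/* hypothetical sw_ring element */
	struct rte_mbuf *mbuf;
	/* ... other per-descriptor state ... */
};

static void
pmd_tx_free_bufs(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		struct pmd_tx_entry *txep, unsigned int n)
{
	void **cache_objs;
	void *objs[PMD_TX_FREE_BULK_MAX];	/* assumes n <= PMD_TX_FREE_BULK_MAX */
	unsigned int i;

	/* Try the zero-copy path first: on success, write the mbuf
	 * pointers straight into the mempool cache.
	 */
	cache_objs = rte_mempool_cache_put_bulk_promise(cache, mp, n);
	if (cache_objs != NULL) {
		for (i = 0; i < n; i++)
			cache_objs[i] = txep[i].mbuf;
		return;
	}

	/* Fallback chosen by the PMD: gather the pointers into a local
	 * array and enqueue them to the mempool backend directly.
	 */
	for (i = 0; i < n; i++)
		objs[i] = txep[i].mbuf;
	rte_mempool_ops_enqueue_bulk(mp, objs, n);
}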

> 
> >
> > >
> > > >
> > > > We should think about providing a similar API on the RX side to
> > > > keep it symmetric.
> > >
> > > I sent an RFC for that too:
> > > http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87488@smartserver.smartshare.dk/T/#u
> > >
> > >
> > > >
> > > > > * Asserting that the 'mp' parameter is not NULL is not done by
> > > > > other functions, so I omitted it here too.
> > > > >
> > > > > NB: Please ignore formatting. Also, this code has not even been
> > > > > compile tested.
> > > > We are a little bit ahead: tested the changes with the i40e PF PMD,
> > > > wrote unit test cases, going through internal review, will send
> > > > out the RFC on Monday
> > >
> > > Sounds good. Looking forward to review.
> > >
> > > >
> > > > >
> > > > > /**
> > > > >  * Promise to put objects in a mempool via zero-copy access to a
> > > > >  * user-owned mempool cache.
> > > > >  *
> > > > >  * @param cache
> > > > >  *   A pointer to the mempool cache.
> > > > >  * @param mp
> > > > >  *   A pointer to the mempool.
> > > > >  * @param n
> > > > >  *   The number of objects to be put in the mempool cache.
> > > > >  * @return
> > > > >  *   The pointer to where to put the objects in the mempool cache.
> > > > >  *   NULL on error
> > > > >  *   with rte_errno set appropriately.
> > > > >  */
> > > > > static __rte_always_inline void *
> > > > > rte_mempool_cache_put_bulk_promise(struct rte_mempool_cache *cache,
> > > > >         struct rte_mempool *mp,
> > > > >         unsigned int n)
> > > > > {
> > > > >     void **cache_objs;
> > > > >
> > > > >     if (cache == NULL)
> > > > >         cache = rte_mempool_default_cache(mp, rte_lcore_id());
> > Any reason we need this? If we are expecting the PMD to store the
> > pointer to cache and a NULL is passed, it would mean it is a mempool
> > with no per-core cache?
> > We could also leave the NULL check to the PMD.
> 
> Personally, I would strongly prefer requiring the cache pointer to be valid,
> and use RTE_ASSERT() here, instead of allowing a NULL pointer as a special
> case to look it up inside the function. I consider this special NULL case useless
> bloat, which does not belong in a fast path library function.
> 
> But I copied this approach from rte_mempool_cache_flush().
The API definition does not require it to do this check. We might be able to delete the check in rte_mempool_cache_flush() as well.

> 
> We could expose an "unsafe" function where it is not allowed to pass NULL
> pointers, and a "safe" function (fixing the cache pointer if NULL) for
> consistency.
> 
> If the rte_mempool_cache_flush() function is popular, we could also expose
> an "unsafe" variant where passing NULL pointers is disallowed.
> 
> I wonder if there are any examples of such safe/unsafe variants in DPDK? It
> would be nice to have a common naming convention for such function variants.
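For concreteness, such an "unsafe" variant could look like the sketch below
(the name is purely hypothetical; the body simply repeats the logic of the RFC
function quoted in this mail, with the NULL handling replaced by an assert;
requires rte_mempool.h, rte_errno.h and rte_debug.h):

/* Hypothetical "unsafe" variant: the caller must pass a valid cache
 * pointer; there is no NULL lookup or fallback in the fast path.
 */
static __rte_always_inline void *
rte_mempool_cache_put_bulk_promise_unsafe(struct rte_mempool_cache *cache,
		struct rte_mempool *mp, unsigned int n)
{
	void **cache_objs;

	RTE_ASSERT(cache != NULL);

	/* The request itself is too big for the cache */
	if (unlikely(n > cache->flushthresh)) {
		rte_errno = EINVAL;
		return NULL;
	}

	if (cache->len + n <= cache->flushthresh) {
		cache_objs = &cache->objs[cache->len];
		cache->len += n;
	} else {
		cache_objs = &cache->objs[0];
		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
		cache->len = n;
	}

	return cache_objs;
}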
> 
> >
> > > > >     if (cache == NULL) {
> > > > >         rte_errno = EINVAL;
> > > > >         return NULL;
> > > > >     }
> > > > >
> > > > >     rte_mempool_trace_cache_put_bulk_promise(cache, mp, n);
> > > > >
> > > > >     /* The request itself is too big for the cache */
> > > > >     if (unlikely(n > cache->flushthresh)) {
> > > > >         rte_errno = EINVAL;
> > > > >         return NULL;
> > > > >     }
> > > > >
> > > > >     /*
> > > > >      * The cache follows the following algorithm:
> > > > >      *   1. If the objects cannot be added to the cache without
> > > > crossing
> > > > >      *      the flush threshold, flush the cache to the backend.
> > > > >      *   2. Add the objects to the cache.
> > > > >      */
> > > > >
> > > > >     if (cache->len + n <= cache->flushthresh) {
> > > > >         cache_objs = &cache->objs[cache->len];
> > > > >         cache->len += n;
> > > > >     } else {
> > > > >         cache_objs = &cache->objs[0];
> > > > >         rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > > >         cache->len = n;
> > > > >     }
> > > > >
> > > > >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > These are new stats. Do these break ABI compatibility (though these
> > are under DEBUG flag)?
> 
> They are not mempool cache stats; they are only kept in the cache structure
> to provide alternative (i.e. faster) update access to some (i.e. the most
> often updated) of the existing mempool stats. The patch is [1], and part of
> a series currently being discussed as to whether it should go into 22.11-rc3
> or not [2].
> 
> [1]: https://patchwork.dpdk.org/project/dpdk/patch/20221109181852.109856-3-mb@smartsharesystems.com/
> [2]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D874A6@smartserver.smartshare.dk/T/#m41bf4e8bd886db49f11c8dbd63741b353277082f
> 
> >
> > > > >
> > > > >     return cache_objs;
> > > > > }
> > > > >
> > > > >
> > > > > Med venlig hilsen / Kind regards, -Morten Brørup
> > > > >
> > > >
> >

