From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Bruce Richardson" <bruce.richardson@intel.com>,
"Morten Brørup" <mb@smartsharesystems.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"olivier.matz@6wind.com" <olivier.matz@6wind.com>,
"andrew.rybchenko@oktetlabs.ru" <andrew.rybchenko@oktetlabs.ru>,
Kamalakshitha Aligeri <Kamalakshitha.Aligeri@arm.com>,
nd <nd@arm.com>, nd <nd@arm.com>
Subject: RE: [RFC] mempool: zero-copy cache put bulk
Date: Fri, 11 Nov 2022 04:24:52 +0000 [thread overview]
Message-ID: <DBAPR08MB5814E30CEC55189691DF38AA98009@DBAPR08MB5814.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <Y2zZ2r5BwI95An32@bricha3-MOBL.ger.corp.intel.com>
<snip>
> > > > > > > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > > > > > > Sent: Sunday, 6 November 2022 00.11
> > > > > > >
> > > > > > > + Akshitha, she is working on a similar patch
> > > > > > >
> > > > > > > Few comments inline
> > > > > > >
> > > > > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > Sent: Saturday, November 5, 2022 8:40 AM
> > > > > > > >
> > > > > > > > Zero-copy access to the mempool cache is beneficial for PMD
> > > > > > > > performance, and must be provided by the mempool library to
> > > > > > > > fix [Bug 1052] without a performance regression.
> > > > > > > >
> > > > > > > > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> > > > > > > >
> > > > > > > >
> > > > > > > > This RFC offers a conceptual zero-copy put function, where
> > > > > > > > the application promises to store some objects, and in
> > > > > > > > return gets an address where to store them.
> > > > > > > >
> > > > > > > > I would like some early feedback.
> > > > > > > >
> > > > > > > > Notes:
> > > > > > > > * Allowing the 'cache' parameter to be NULL, and getting it
> > > > > > > > from the mempool instead, was inspired by rte_mempool_cache_flush().
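For concreteness, here is a minimal sketch of what such a zero-copy put helper might look like; the function name and internals below are illustrative assumptions on my part, not the actual RFC code, and error handling is reduced to a NULL return:

/*
 * Illustrative sketch only (not the RFC patch): the application asks for
 * room for 'n' objects and receives a pointer into the cache's object
 * array, which it promises to fill before its next mempool operation.
 */
#include <rte_lcore.h>
#include <rte_mempool.h>

static __rte_always_inline void **
mempool_cache_zc_put_bulk_sketch(struct rte_mempool_cache *cache,
		struct rte_mempool *mp, unsigned int n)
{
	void **cache_objs;

	if (cache == NULL)	/* as in rte_mempool_cache_flush() */
		cache = rte_mempool_default_cache(mp, rte_lcore_id());
	if (cache == NULL || n > cache->flushthresh)
		return NULL;	/* caller falls back to rte_mempool_put_bulk() */

	if (cache->len + n > cache->flushthresh) {
		/* Flush the cache to the backing store to make room. */
		rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
		cache->len = 0;
	}

	cache_objs = &cache->objs[cache->len];
	cache->len += n;	/* the caller promises to fill these slots */
	return cache_objs;
}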
> > > > > > > I am not sure why the 'cache' parameter is required for this
> > > > > > > API. This API should take the mempool as the parameter.
> > > > > > >
> > > > > > > We have based our API on 'rte_mempool_do_generic_put' and
> > > > > > > removed the 'cache' parameter.
> > > > > >
> > > > > > I thoroughly considered omitting the 'cache' parameter, but
> > > > > > included it for two reasons:
> > > > > >
> > > > > > 1. The function is a "mempool cache" function (i.e. primarily
> > > > > > working on the mempool cache), not a "mempool" function.
> > > > > >
> > > > > > So it is appropriate to have a pointer directly to the structure
> > > > > > it is working on. Following this through, I also made 'cache'
> > > > > > the first parameter and 'mp' the second, like in
> > > > > > rte_mempool_cache_flush().
> > > > > I am wondering if the PMD should be aware of the cache or not.
> > > > > For ex: in the case of pipeline mode, the RX and TX side of the
> > > > > PMD are running on different cores.
> > > >
> > > > In that example, the PMD can store two cache pointers, one for
> > > > each of the RX and TX side.
> > > I did not understand this. If RX core and TX core have their own
> > > per-core caches the logic would not work. For ex: the RX core cache
> > > would not get filled.
> > >
> > > In the case of pipeline mode, there will not be a per-core cache.
> > > The buffers would be allocated and freed from a global ring or a
> > > global lockless stack.
> >
> > Aha... Now I understand what you mean: You are referring to use cases
> > where the mempool is configured to *not* have a mempool cache.
> >
> > For a mempool without a mempool cache, the proposed "mempool cache"
> > zero-copy functions can obviously not be used.
> >
> > We need "mempool" zero-copy functions for the mempools that have no
> > mempool cache.
> >
> > However, those functions depend on the mempool's underlying backing
> > store.
> >
> > E.g. zero-copy access to a ring has certain requirements [1].
> >
> > [1]: http://doc.dpdk.org/guides/prog_guide/ring_lib.html#ring-peek-zero-copy-api
> >
> > For a stack, I think it is possible to locklessly zero-copy pop objects.
> > But it is impossible to locklessly zero-copy push elements to a stack;
> > another thread can race to pop some objects from the stack before the
> > pushing thread has finished writing them into the stack.
> >
> > Furthermore, the ring zero-copy get function cannot return a consecutive
> > array of objects when wrapping, and PMD functions using vector
> > instructions usually rely on handling chunks of e.g. 8 objects.
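To make the wrap-around point concrete, here is a rough sketch of a zero-copy enqueue via the ring peek zero-copy API referenced in [1]; the function and struct names are quoted from memory and may not match the headers exactly, and the ring must have been created in a mode that permits zero-copy enqueue (e.g. single producer or HTS):

/*
 * Sketch based on my reading of the ring peek zero-copy API [1]; exact
 * names/headers may differ. The reserved space can come back split in
 * two segments when the ring wraps, so a single contiguous array of 'n'
 * object slots cannot be guaranteed.
 */
#include <rte_ring.h>
#include <rte_ring_peek_zc.h>

static int
ring_zc_put_sketch(struct rte_ring *r, void * const *objs, unsigned int n)
{
	struct rte_ring_zc_data zcd;
	unsigned int i;

	if (rte_ring_enqueue_zc_bulk_start(r, n, &zcd, NULL) != n)
		return -1;	/* not enough room for all 'n' objects */

	/* zcd.ptr1 holds the first zcd.n1 slots; zcd.ptr2 (if any) holds
	 * the remainder after the ring wraps around. */
	for (i = 0; i < zcd.n1; i++)
		((void **)zcd.ptr1)[i] = objs[i];
	for (; i < n; i++)
		((void **)zcd.ptr2)[i - zcd.n1] = objs[i];

	rte_ring_enqueue_zc_finish(r, n);	/* make the objects visible */
	return 0;
}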
> >
> > Just for a second, let me theorize into the absurd: Even worse, if a
> > mempool's underlying backing store does not use an array of pointers as
> > its internal storage structure, it is impossible to use a pointer to an
> > array of pointers for zero-copy transactions. E.g. if the backing store
> > uses a list or a tree structure for its storage, a pointer to somewhere
> > in the list or tree structure is not an array of object pointers.
> >
> > Anyway, we could consider designing a generic API for zero-copy mempool
> > get/put; but it should be compatible with all underlying backing stores -
> > or return failure, so the PMD can fall back to the standard functions,
> > if the mempool is in a state where zero-copy access to a contiguous
> > burst cannot be provided. E.g. zero-copy get from a ring can return
> > failure when zero-copy access to the ring is temporarily unavailable
> > due to being at a point where it would wrap.
> >
> > Here is a conceptual proposal for such an API.
> >
> > /* Mempool zero-copy transaction state. Opaque outside the mempool API. */
> > struct rte_mempool_zc_transaction_state {
> > 	char opaque[RTE_CACHE_LINE_SIZE];
> > };
> >
> > /** Start zero-copy get/put bulk transaction.
> > *
> > * @param[in] mp
> > * Pointer to the mempool.
> > * @param[out] obj_table_ptr
> > * Where to store the pointer to
> > * the zero-copy array of objects in the mempool.
> > * @param[in] n
> > * Number of objects in the transaction.
> > * @param[in] cache
> > * Pointer to the mempool cache. May be NULL if unknown.
> > * @param[out] transaction
> > * Where to store the opaque transaction information.
> > * Used internally by the mempool library.
> > * @return
> > * - 1: Transaction completed;
> > * '_finish' must not be called.
> > * - 0: Transaction started;
> > * '_finish' must be called to complete the transaction.
> > * - <0: Error; failure code.
> > */
> > static __rte_always_inline int
> > rte_mempool_get/put_zc_bulk_start(
> > 	struct rte_mempool *mp,
> > 	void ***obj_table_ptr,
> > 	unsigned int n,
> > 	struct rte_mempool_cache *cache,
> > 	rte_mempool_zc_transaction_state *transaction);
> >
> > /** Finish zero-copy get/put bulk transaction.
> > *
> > * @param[in] mp
> > * Pointer to mempool.
> > * @param[in] obj_table_ptr
> > * Pointer to the zero-copy array of objects in the mempool,
> > * returned by the 'start' function.
> > * @param[in] n
> > * Number of objects in the transaction.
> > * Must be the same as for the 'start' function.
> > * @param[in] transaction
> > * Opaque transaction information,
> > * returned by the 'start' function.
> > * Used internally by the mempool library.
> > */
> > static __rte_always_inline void
> > rte_mempool_get/put_zc_bulk_finish(
> > 	struct rte_mempool *mp,
> > 	void **obj_table,
> > 	unsigned int n,
> > 	rte_mempool_zc_transaction_state *transaction);
> >
> > Note that these are *bulk* functions, so 'n' has to remain the same for
> > a 'finish' call as it was for the 'start' call of a transaction.
> >
> > And then the underlying backing stores would need to provide callbacks
> > that implement these functions, if they offer zero-copy functionality.
> >
> > The mempool implementation of these could start by checking for a
> > mempool cache, and use the "mempool cache" zero-copy if present.
> >
> > Some internal state information (from the mempool library or the
> > underlying mempool backing store) may need to be carried over from the
> > 'start' to the 'finish' function, so I have added a transaction state
> > parameter. The transaction state must be held by the application for
> > thread safety reasons. Think of this like the 'flags' parameter to the
> > Linux kernel's spin_lock_irqsave/irqrestore() functions.
> >
> > We could omit the 'obj_table' and 'n' parameters from the 'finish'
> > functions and store them in the transaction state if needed; but we
> > might possibly achieve higher performance by passing them as parameters
> > instead.
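To see how this would be used, here is a sketch of a hypothetical PMD TX free path written against the proposed (not yet existing) functions above, using the 'put' spelling of the 'get/put' shorthand:

/*
 * Usage sketch for the proposed transaction API above. The zc functions
 * do not exist yet, and tx_free_bufs_zc() is a hypothetical PMD fragment.
 */
#include <rte_mbuf.h>
#include <rte_mempool.h>

static void
tx_free_bufs_zc(struct rte_mempool *mp, struct rte_mbuf **pkts,
		unsigned int n, struct rte_mempool_cache *cache)
{
	struct rte_mempool_zc_transaction_state zc;
	void **slots;
	unsigned int i;
	int ret;

	ret = rte_mempool_put_zc_bulk_start(mp, &slots, n, cache, &zc);
	if (ret < 0) {
		/* Zero-copy not possible right now; use the standard path. */
		rte_mempool_generic_put(mp, (void **)pkts, n, cache);
		return;
	}

	/* Write the object pointers straight into the mempool's storage. */
	for (i = 0; i < n; i++)
		slots[i] = pkts[i];

	/* Per the proposal, a return of 1 means the transaction already
	 * completed and '_finish' must not be called. */
	if (ret == 0)
		rte_mempool_put_zc_bulk_finish(mp, slots, n, &zc);
}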
> >
> > >
> > > >
> > > > And if the PMD is unaware of the cache pointer, it can look it up
> > > > at runtime using rte_lcore_id(), like it does in the current Intel
> > > > PMDs.
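For reference, the runtime lookup mentioned here is typically a one-liner; 'mp' is assumed to be the mempool in use:

	struct rte_mempool_cache *cache =
		rte_mempool_default_cache(mp, rte_lcore_id());
	/* NULL is returned for non-EAL threads or when the mempool was
	 * created without a cache, so the caller must be prepared to fall
	 * back to the cache-less put/get path. */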
> > > >
> > > > > However, since the rte_mempool_cache_flush API is provided, maybe
> > > > > that decision is already done? Interestingly,
> > > > > rte_mempool_cache_flush is called by just a single PMD.
> > > >
> > > > I intentionally aligned this RFC with rte_mempool_cache_flush() to
> > > > maintain consistency.
> > > >
> > > > However, the API is not set in stone. It should always be
> > > > acceptable to consider improved alternatives.
> > > >
> > > > >
> > > > > So, the question is, should we allow zero-copy only for per-core
> > > > > cache or for other cases as well.
> > > >
> > > > I suppose that the mempool library was designed to have a mempool
> > > > associated with exactly one mempool cache per core. (Alternatively,
> > > > the mempool can be configured with no mempool caches at all.)
> > > >
> > > > We should probably stay loyal to that design concept, and only
> > > > allow zero-copy for per-core cache.
> > > >
> > > > If you can come up with an example of the opposite, I would like to
> > > > explore that option too... I can't think of a good example myself,
> > > > and perhaps I'm overlooking a relevant use case.
> > > The use case I am talking about is the pipeline mode as I mentioned
> > > above. Let me know if you agree.
> >
> > I see what you mean, and I don't object. :-)
> >
> > However, I still think the "mempool cache" zero-copy functions could be
> > useful.
> >
> > They would be needed for the generic "mempool" zero-copy functions
> > anyway.
> >
> > And the "mempool cache" zero-copy functions are much simpler to design,
> > implement and use than the "mempool" zero-copy functions, so it is a
> > good first step.
> >
> I would think that even in pipeline-mode applications a mempool cache
> would still be very useful, as it would likely reduce the number of
> accesses to the underlying shared ring or stack resources.
>
> For example, if, as is common, the application works in bursts of up to 32, but
> the mempool cache was configured as 128, it means that the TX side would
> flush buffers to the shared pool at most every 4 bursts and likely less
> frequently than that due to the bursts themselves not always being the full
> 32. Since accesses to the underlying ring and stack tend to be slow due to
> locking or atomic operations, the more you can reduce the accesses the
> better.
>
> Therefore, I would very much consider use of a mempool without a cache as
> an edge case - one we need to support, but not optimize for, since
> mempool accesses without a cache would already be rather slow.
Understood. Looks like supporting APIs for the per-core cache is enough for now. I do not see a lot of advantages for a mempool without a cache.
>
> My 2c here.
> /Bruce
Thread overview: 9+ messages
2022-11-05 13:40 Morten Brørup
2022-11-05 23:11 ` Honnappa Nagarahalli
2022-11-06 6:57 ` Morten Brørup
2022-11-09 17:57 ` Honnappa Nagarahalli
2022-11-09 20:36 ` Morten Brørup
2022-11-09 22:45 ` Honnappa Nagarahalli
2022-11-10 10:15 ` Morten Brørup
2022-11-10 11:00 ` Bruce Richardson
2022-11-11 4:24 ` Honnappa Nagarahalli [this message]