DPDK patches and discussions
* [RFC]: mempool: zero-copy cache get bulk
@ 2022-11-05 13:19 Morten Brørup
  2022-11-07  9:19 ` Bruce Richardson
                   ` (9 more replies)
  0 siblings, 10 replies; 36+ messages in thread
From: Morten Brørup @ 2022-11-05 13:19 UTC (permalink / raw)
  To: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli

Zero-copy access to the mempool cache is beneficial for PMD performance, and must be provided by the mempool library to fix [Bug 1052] without a performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052


This RFC offers two conceptual variants of zero-copy get:
1. A simple version.
2. A version where existing (hot) objects in the cache are moved to the top of the cache before new objects from the backend driver are pulled in.

I would like some early feedback. Also, which variant do you prefer?

Notes:
* Allowing the 'cache' parameter to be NULL, and getting it from the mempool instead, was inspired by rte_mempool_cache_flush().
* Asserting that the 'mp' parameter is not NULL is not done by other functions, so I omitted it here too.

NB: Please ignore formatting. Also, this code has not even been compile tested.


PS: No promises, but I expect to offer an RFC for zero-copy put too. :-)


1. Simple version:

/**
 * Get objects from a mempool via zero-copy access to a user-owned mempool cache.
 *
 * @param cache
 *   A pointer to the mempool cache.
 * @param mp
 *   A pointer to the mempool.
 * @param n
 *   The number of objects to prefetch into the mempool cache.
 * @return
 *   The pointer to the objects in the mempool cache.
 *   NULL on error, with rte_errno set appropriately.
 */
static __rte_always_inline void *
rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
        struct rte_mempool *mp,
        unsigned int n)
{
    unsigned int len;
    int ret;

    if (cache == NULL)
        cache = rte_mempool_default_cache(mp, rte_lcore_id());
    if (cache == NULL) {
        rte_errno = EINVAL;
        goto fail;
    }

    rte_mempool_trace_cache_get_bulk(cache, mp, n);

    len = cache->len;

    if (unlikely(n > len)) {
        unsigned int size;

        if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
            rte_errno = EINVAL;
            goto fail;
        }

        /* Fill the cache from the backend; fetch size + requested - len objects. */
        size = cache->size;

        ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
        if (unlikely(ret < 0)) {
            /*
             * We are buffer constrained.
             * Do not fill the cache, just satisfy the request.
             */
            ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
            if (unlikely(ret < 0)) {
                rte_errno = -ret;
                goto fail;
            }

            len = 0;
        } else
            len = size;
    } else
        len -= n;

    cache->len = len;

    RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
    RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);

    return &cache->objs[len];

fail:

    RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
    RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);

    return NULL;
}
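
For illustration, usage could look something like this (hypothetical sketch, not part of the proposal; example_consume() and do_something() are made-up placeholders):

static int
example_consume(struct rte_mempool *mp, unsigned int n)
{
    struct rte_mempool_cache *cache =
            rte_mempool_default_cache(mp, rte_lcore_id());
    void **objs = rte_mempool_cache_get_bulk(cache, mp, n);
    unsigned int i;

    if (objs == NULL)
        return -rte_errno;

    /* The cache is a stack, so consume the objects in reverse order. */
    for (i = n; i-- > 0; )
        do_something(objs[i]);

    return 0;
}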


2. Advanced version:

/**
 * Get objects from a mempool via zero-copy access to a user-owned mempool cache.
 *
 * @param cache
 *   A pointer to the mempool cache.
 * @param mp
 *   A pointer to the mempool.
 * @param n
 *   The number of objects to prefetch into the mempool cache.
 * @return
 *   The pointer to the objects in the mempool cache.
 *   NULL on error, with rte_errno set appropriately.
 */
static __rte_always_inline void *
rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
        struct rte_mempool *mp,
        unsigned int n)
{
    unsigned int len;
    int ret;

    if (cache == NULL)
        cache = rte_mempool_default_cache(mp, rte_lcore_id());
    if (cache == NULL) {
        rte_errno = EINVAL;
        goto fail;
    }

    rte_mempool_trace_cache_get_bulk(cache, mp, n);

    len = cache->len;

    if (unlikely(n > len)) {
        unsigned int size;

        if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
            rte_errno = EINVAL;
            goto fail;
        }

        /* Fill the cache from the backend; fetch size + requested - len objects. */
        size = cache->size;

        if (likely(size + n >= 2 * len)) {
            /*
             * No overlap when copying: size + n >= 2 * len implies that the
             * destination index (size + n - len) is >= len.
             * Move (i.e. copy) the existing objects in the cache to the
             * coming top of the cache, to make room for new objects below.
             */
            rte_memcpy(&cache->objs[size + n - len], &cache->objs[0], len * sizeof(void *));

            /* Fill the cache below the existing objects in the cache. */
            ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[0], size + n - len);
            if (unlikely(ret < 0)) {
                goto constrained;
            } else
                len = size;
        } else {
            /* Fill the cache on top of any objects in it. */
            ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
            if (unlikely(ret < 0)) {

constrained:
                /*
                 * We are buffer constrained.
                 * Do not fill the cache, just satisfy the request.
                 */
                ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
                if (unlikely(ret < 0)) {
                    rte_errno = -ret;
                    goto fail;
                }

                len = 0;
            } else
                len = size;
        }
    } else
        len -= n;

    cache->len = len;

    RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
    RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);

    return &cache->objs[len];

fail:

    RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
    RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);

    return NULL;
}


Med venlig hilsen / Kind regards,
-Morten Brørup



* Re: [RFC]: mempool: zero-copy cache get bulk
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
@ 2022-11-07  9:19 ` Bruce Richardson
  2022-11-07 14:32   ` Morten Brørup
  2022-11-15 16:18 ` [PATCH] mempool cache: add zero-copy get and put functions Morten Brørup
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Bruce Richardson @ 2022-11-07  9:19 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli

On Sat, Nov 05, 2022 at 02:19:13PM +0100, Morten Brørup wrote:
> Zero-copy access to the mempool cache is beneficial for PMD performance, and must be provided by the mempool library to fix [Bug 1052] without a performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> 
> This RFC offers two conceptual variants of zero-copy get:
> 1. A simple version.
> 2. A version where existing (hot) objects in the cache are moved to the top of the cache before new objects from the backend driver are pulled in.
> 
> I would like some early feedback. Also, which variant do you prefer?
> 
> Notes:
> * Allowing the 'cache' parameter to be NULL, and getting it from the mempool instead, was inspired by rte_mempool_cache_flush().
> * Asserting that the 'mp' parameter is not NULL is not done by other functions, so I omitted it here too.
> 
> NB: Please ignore formatting. Also, this code has not even been compile tested.
> 
> 
> PS: No promises, but I expect to offer an RFC for zero-copy put too. :-)
> 

Thanks for this work, I think it's good to have. The existing functions
could probably be reworked to use this new code too, right, since the copy
at the end would be all that is needed to complete the implementation?
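
E.g. something along these lines (just a rough sketch, ignoring the
refill corner cases; my_generic_get() is a made-up name):

static int
my_generic_get(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		void **obj_table, unsigned int n)
{
	void **objs = rte_mempool_cache_get_bulk(cache, mp, n);
	unsigned int i;

	if (objs == NULL)
		return -rte_errno;

	/* Copy out, reversing the order since the cache is a stack. */
	for (i = 0; i < n; i++)
		obj_table[i] = objs[n - 1 - i];

	return 0;
}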

Only real comment I have on this version is that I am not sure about the
naming. I think having "cache_get_bulk" doesn't really make it very clear
what the function does, that is, it removes items from the cache without
copying them elsewhere. How about:

- rte_mempool_cache_pop?
- rte_mempool_cache_reserve?

I would tend to prefer the former, since the latter implies that there
needs to be a follow-up call to clear the reservation. On the other hand,
reserve does give the correct impression that the elements are still there
in the mempool cache.

Others may have better suggestions, since, as we know, naming things is
hard! :)

Overall, though, I think this is very good to have.
/Bruce


* RE: [RFC]: mempool: zero-copy cache get bulk
  2022-11-07  9:19 ` Bruce Richardson
@ 2022-11-07 14:32   ` Morten Brørup
  0 siblings, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2022-11-07 14:32 UTC (permalink / raw)
  To: Bruce Richardson, Kamalakshitha Aligeri
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli, nd

+ Akshitha, apparently working on similar patches

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Monday, 7 November 2022 10.19
> 
> On Sat, Nov 05, 2022 at 02:19:13PM +0100, Morten Brørup wrote:
> > Zero-copy access to the mempool cache is beneficial for PMD
> performance, and must be provided by the mempool library to fix [Bug
> 1052] without a performance regression.
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> >
> >
> > This RFC offers two conceptual variants of zero-copy get:
> > 1. A simple version.
> > 2. A version where existing (hot) objects in the cache are moved to
> the top of the cache before new objects from the backend driver are
> pulled in.
> >
> > I would like some early feedback. Also, which variant do you prefer?
> >
> > Notes:
> > * Allowing the 'cache' parameter to be NULL, and getting it from the
> mempool instead, was inspired by rte_mempool_cache_flush().
> > * Asserting that the 'mp' parameter is not NULL is not done by other
> functions, so I omitted it here too.
> >
> > NB: Please ignore formatting. Also, this code has not even been
> compile tested.
> >
> >
> > PS: No promises, but I expect to offer an RFC for zero-copy put too.
> :-)
> >
> 
> Thanks for this work, I think it's good to have. The existing functions
> could probably be reworked to use this new code too, right, since the
> copy
> at the end would be all that is needed to complete the implementation?

Only for the likely case, where the request can be fulfilled entirely from the cache.

Not for the corner case, where only some of the objects are in the cache, so the cache needs to be refilled from the backing store.

E.g. requesting 32 objects, and 8 objects are in the cache. (Those 8 objects are assumed to be hot, as opposed to the cold objects pulled in from the backing store, and were given preferential treatment with commit [a2833ecc5ea4adcbc3b77e7aeac2a6fd945da6a0].)

[a2833ecc5ea4adcbc3b77e7aeac2a6fd945da6a0]: http://git.dpdk.org/dpdk/commit/lib/mempool/rte_mempool.h?id=a2833ecc5ea4adcbc3b77e7aeac2a6fd945da6a0

The existing function copies the 8 existing objects directly to the final destination, then refills the cache from the backing store, and then copies the remaining 24 objects directly to the final destination.

The "2. variant" in this RFC handles this corner case by moving the 8 objects in the cache to the new top of the cache, and then refilling the cache from the backing store. And it can only move those 8 objects around in the cache if there is room for them. (The 32 returned objects are, ordered from top to bottom of the stack: 8 hot and 24 new.)

In other words: If we replaced the existing function with this function plus copying at the end, the corner case would perform additional copying (moving the objects around in the stack), whereas the existing function only copies each object once.

While I usually agree 100 % about avoiding code duplication, I think the difference in behavior between the existing and the new functions warrants two separate implementations.


Please also note: The cache is a stack, so when accessing the cache directly, objects should be retrieved in reverse order. (This should be mentioned in the function description!) The existing function reverses the order of the objects when returning them, so the application can use them in normal order.

> 
> Only real comment I have on this version is that I am not sure about
> the
> naming. I think having "cache_get_bulk" doesn't really make it very
> clear
> what the function does, that is removes items from the cache without
> copying them elsewhere. How about:
> 
> - rte_mempool_cache_pop?
> - rte_mempool_cache_reserve?
> 
> I would tend to prefer the former, since the latter implies that there
> needs to be a follow-up call to clear the reservation. On the other
> hand,
> reserve does give the correct impression that the elements are still
> there
> in the mempool cache.
> 
> Others may have better suggestions, since, as we know, naming things is
> hard! :)

- rte_mempool_cache_prefetch_bulk?
- rte_mempool_cache_get_bulk_promise?

When I came up with the function name rte_mempool_cache_put_bulk_promise for the sister function [1], I thought along the same lines as you, Bruce. It is important that the function name doesn't imply that there is a follow-up function to indicate that the transaction has been completed. (Before working on that, I assumed that a "prepare" and "commit" pair of functions were required, but the function turned out to be much simpler than anticipated.)

[1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87489@smartserver.smartshare.dk/#t

The mempool library offers single-object functions, so _bulk should be part of the function name, to indicate that the function operates on more than one object.

> 
> Overall, though, I think this is very good to have.
> /Bruce



* [PATCH] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
  2022-11-07  9:19 ` Bruce Richardson
@ 2022-11-15 16:18 ` Morten Brørup
  2022-11-16 18:04 ` [PATCH v2] " Morten Brørup
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2022-11-15 16:18 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, dev
  Cc: nd, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and must be provided by the mempool library to fix [Bug 1052] without a performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

Changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/rte_mempool.h          | 124 +++++++++++++++++++++++++++++
 lib/mempool/rte_mempool_trace_fp.h |  16 ++++
 lib/mempool/version.map            |   6 ++
 3 files changed, 146 insertions(+)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..7254ecab2a 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -47,6 +47,7 @@
 #include <rte_ring.h>
 #include <rte_memcpy.h>
 #include <rte_common.h>
+#include <rte_errno.h>
 
 #include "rte_mempool_trace_fp.h"
 
@@ -1346,6 +1347,129 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ *   Must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ */
+ __rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+
+	/* Increment stats now, adding in mempool always succeeds. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	/*
+	 * The cache follows the following algorithm:
+	 *   1. If the objects cannot be added to the cache without crossing
+	 *      the flush threshold, flush the cache to the backend.
+	 *   2. Add the objects to the cache.
+	 */
+
+	if (cache->len + n <= cache->flushthresh) {
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else {
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	}
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to prefetch into the mempool cache.
+ *   Must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain n objects.
+ *   With rte_errno set to the error code of the mempool dequeue function.
+ */
+ __rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+
+	if (unlikely(n > len)) {
+		/* Fill the cache from the backend; fetch size + requested - len objects. */
+		int ret;
+		const unsigned int size = cache->size;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else
+			len = size;
+	} else
+		len -= n;
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..00567fb1cf 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,22 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..967ded619f 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,12 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	rte_mempool_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_bulk;
+	rte_mempool_cache_zc_get_bulk;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1



* [PATCH v2] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
  2022-11-07  9:19 ` Bruce Richardson
  2022-11-15 16:18 ` [PATCH] mempool cache: add zero-copy get and put functions Morten Brørup
@ 2022-11-16 18:04 ` Morten Brørup
  2022-11-29 20:54   ` Kamalakshitha Aligeri
  2022-12-22 15:57   ` Konstantin Ananyev
  2022-12-24 11:49 ` [PATCH v3] " Morten Brørup
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 36+ messages in thread
From: Morten Brørup @ 2022-11-16 18:04 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, dev
  Cc: nd, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/mempool_trace_points.c |   6 ++
 lib/mempool/rte_mempool.h          | 124 +++++++++++++++++++++++++++++
 lib/mempool/rte_mempool_trace_fp.h |  16 ++++
 lib/mempool/version.map            |   4 +
 4 files changed, 150 insertions(+)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..a6070799af 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,9 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..5e6da06bc7 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -47,6 +47,7 @@
 #include <rte_ring.h>
 #include <rte_memcpy.h>
 #include <rte_common.h>
+#include <rte_errno.h>
 
 #include "rte_mempool_trace_fp.h"
 
@@ -1346,6 +1347,129 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ *   Must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+
+	/* Increment stats now, adding in mempool always succeeds. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	/*
+	 * The cache follows the following algorithm:
+	 *   1. If the objects cannot be added to the cache without crossing
+	 *      the flush threshold, flush the cache to the backend.
+	 *   2. Add the objects to the cache.
+	 */
+
+	if (cache->len + n <= cache->flushthresh) {
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else {
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	}
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to prefetch into the mempool cache.
+ *   Must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain n objects.
+ *   With rte_errno set to the error code of the mempool dequeue function.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+
+	if (unlikely(n > len)) {
+		/* Fill the cache from the backend; fetch size + requested - len objects. */
+		int ret;
+		const unsigned int size = cache->size;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else
+			len = size;
+	} else
+		len -= n;
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..00567fb1cf 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,22 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..927477b977 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,10 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1



* RE: [PATCH v2] mempool cache: add zero-copy get and put functions
  2022-11-16 18:04 ` [PATCH v2] " Morten Brørup
@ 2022-11-29 20:54   ` Kamalakshitha Aligeri
  2022-11-30 10:21     ` Morten Brørup
  2022-12-22 15:57   ` Konstantin Ananyev
  1 sibling, 1 reply; 36+ messages in thread
From: Kamalakshitha Aligeri @ 2022-11-29 20:54 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, andrew.rybchenko,
	Honnappa Nagarahalli, bruce.richardson, dev
  Cc: nd, nd



> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, November 16, 2022 12:04 PM
> To: olivier.matz@6wind.com; andrew.rybchenko@oktetlabs.ru; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Kamalakshitha Aligeri
> <Kamalakshitha.Aligeri@arm.com>; bruce.richardson@intel.com;
> dev@dpdk.org
> Cc: nd <nd@arm.com>; Morten Brørup <mb@smartsharesystems.com>
> Subject: [PATCH v2] mempool cache: add zero-copy get and put functions
> 
> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> v2:
> * Fix checkpatch warnings.
> * Fix missing registration of trace points.
> * The functions are inline, so they don't go into the map file.
> v1 changes from the RFC:
> * Removed run-time parameter checks. (Honnappa)
>   This is a hot fast path function; requiring correct application
>   behaviour, i.e. function parameters must be valid.
> * Added RTE_ASSERT for parameters instead.
>   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> * Removed fallback when 'cache' parameter is not set. (Honnappa)
> * Chose the simple get function; i.e. do not move the existing objects in
>   the cache to the top of the new stack, just leave them at the bottom.
> * Renamed the functions. Other suggestions are welcome, of course. ;-)
> * Updated the function descriptions.
> * Added the functions to trace_fp and version.map.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/mempool/mempool_trace_points.c |   6 ++
>  lib/mempool/rte_mempool.h          | 124 +++++++++++++++++++++++++++++
>  lib/mempool/rte_mempool_trace_fp.h |  16 ++++
>  lib/mempool/version.map            |   4 +
>  4 files changed, 150 insertions(+)
> 
> diff --git a/lib/mempool/mempool_trace_points.c
> b/lib/mempool/mempool_trace_points.c
> index 4ad76deb34..a6070799af 100644
> --- a/lib/mempool/mempool_trace_points.c
> +++ b/lib/mempool/mempool_trace_points.c
> @@ -77,3 +77,9 @@
> RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> 
>  RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
>  	lib.mempool.set.ops.byname)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> +	lib.mempool.cache.zc.put.bulk)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> +	lib.mempool.cache.zc.get.bulk)
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..5e6da06bc7 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -47,6 +47,7 @@
>  #include <rte_ring.h>
>  #include <rte_memcpy.h>
>  #include <rte_common.h>
> +#include <rte_errno.h>
> 
>  #include "rte_mempool_trace_fp.h"
> 
> @@ -1346,6 +1347,129 @@ rte_mempool_cache_flush(struct
> rte_mempool_cache *cache,
>  	cache->len = 0;
>  }
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy put objects in a user-owned mempool cache backed by the
> specified mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + *   Must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + */

The rte_mempool_cache_zc_put_bulk function takes *cache as an input parameter, which means the rte_mempool_default_cache function must be called in the PMD code, because there is no pointer to the mempool stored in i40e_tx_queue. It is there in i40e_rx_queue though.
So, should we change the APIs?

> +__rte_experimental
> +static __rte_always_inline void *
> +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	void **cache_objs;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> +
> +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> +
> +	/* Increment stats now, adding in mempool always succeeds. */
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +
> +	/*
> +	 * The cache follows the following algorithm:
> +	 *   1. If the objects cannot be added to the cache without crossing
> +	 *      the flush threshold, flush the cache to the backend.
> +	 *   2. Add the objects to the cache.
> +	 */
> +
> +	if (cache->len + n <= cache->flushthresh) {
> +		cache_objs = &cache->objs[cache->len];
> +		cache->len += n;
> +	} else {
> +		cache_objs = &cache->objs[0];
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> >len);
> +		cache->len = n;
> +	}
> +
> +	return cache_objs;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy get objects from a user-owned mempool cache backed by the
> specified mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to prefetch into the mempool cache.
> + *   Must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE.
> + * @return
> + *   The pointer to the objects in the mempool cache.
> + *   NULL on error; i.e. the cache + the pool does not contain n objects.
> + *   With rte_errno set to the error code of the mempool dequeue function.
> + */
> +__rte_experimental
> +static __rte_always_inline void *
> +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	unsigned int len;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> +
> +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> +
> +	len = cache->len;
> +
> +	if (unlikely(n > len)) {
> +		/* Fill the cache from the backend; fetch size + requested -
> len objects. */
> +		int ret;
> +		const unsigned int size = cache->size;
> +
> +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> >objs[len], size + n - len);
> +		if (unlikely(ret < 0)) {
> +			/*
> +			 * We are buffer constrained.
> +			 * Do not fill the cache, just satisfy the request.
> +			 */
> +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> >objs[len], n - len);
> +			if (unlikely(ret < 0)) {
> +				/* Unable to satisfy the request. */
> +
> +				RTE_MEMPOOL_STAT_ADD(mp,
> get_fail_bulk, 1);
> +				RTE_MEMPOOL_STAT_ADD(mp,
> get_fail_objs, n);
> +
> +				rte_errno = -ret;
> +				return NULL;
> +			}
> +
> +			len = 0;
> +		} else
> +			len = size;
> +	} else
> +		len -= n;
> +
> +	cache->len = len;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
> +	return &cache->objs[len];
> +}
> +
>  /**
>   * @internal Put several objects back in the mempool; used internally.
>   * @param mp
> diff --git a/lib/mempool/rte_mempool_trace_fp.h
> b/lib/mempool/rte_mempool_trace_fp.h
> index ed060e887c..00567fb1cf 100644
> --- a/lib/mempool/rte_mempool_trace_fp.h
> +++ b/lib/mempool/rte_mempool_trace_fp.h
> @@ -109,6 +109,22 @@ RTE_TRACE_POINT_FP(
>  	rte_trace_point_emit_ptr(mempool);
>  )
> 
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_get_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> b67d7aace7..927477b977 100644
> --- a/lib/mempool/version.map
> +++ b/lib/mempool/version.map
> @@ -63,6 +63,10 @@ EXPERIMENTAL {
>  	__rte_mempool_trace_ops_alloc;
>  	__rte_mempool_trace_ops_free;
>  	__rte_mempool_trace_set_ops_byname;
> +
> +	# added in 23.03
> +	__rte_mempool_trace_cache_zc_put_bulk;
> +	__rte_mempool_trace_cache_zc_get_bulk;
>  };
> 
>  INTERNAL {
> --
> 2.17.1



* RE: [PATCH v2] mempool cache: add zero-copy get and put functions
  2022-11-29 20:54   ` Kamalakshitha Aligeri
@ 2022-11-30 10:21     ` Morten Brørup
  0 siblings, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2022-11-30 10:21 UTC (permalink / raw)
  To: Kamalakshitha Aligeri, olivier.matz, andrew.rybchenko,
	Honnappa Nagarahalli, bruce.richardson, dev
  Cc: nd

> From: Kamalakshitha Aligeri [mailto:Kamalakshitha.Aligeri@arm.com]
> Sent: Tuesday, 29 November 2022 21.54
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Wednesday, November 16, 2022 12:04 PM
> >
> > Zero-copy access to mempool caches is beneficial for PMD performance,
> and
> > must be provided by the mempool library to fix [Bug 1052] without a
> > performance regression.
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> >
> > v2:
> > * Fix checkpatch warnings.
> > * Fix missing registration of trace points.
> > * The functions are inline, so they don't go into the map file.
> > v1 changes from the RFC:
> > * Removed run-time parameter checks. (Honnappa)
> >   This is a hot fast path function; requiring correct application
> >   behaviour, i.e. function parameters must be valid.
> > * Added RTE_ASSERT for parameters instead.
> >   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > * Chose the simple get function; i.e. do not move the existing
> objects in
> >   the cache to the top of the new stack, just leave them at the
> bottom.
> > * Renamed the functions. Other suggestions are welcome, of course. ;-
> )
> > * Updated the function descriptions.
> > * Added the functions to trace_fp and version.map.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---

[...]

> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior
> > notice.
> > + *
> > + * Zero-copy put objects in a user-owned mempool cache backed by the
> > specified mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be put in the mempool cache.
> > + *   Must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE.
> > + * @return
> > + *   The pointer to where to put the objects in the mempool cache.
> > + */
> 
> rte_mempool_cache_zc_put_bulk function takes *cache as an input
> parameter, which means rte_mempool_default_cache function must be
> called in the PMD code, because there is no pointer to mempool stored
> in i40e_tx_queue. Its there in i40e_rx_queue though.
> So, should we change the API's ?

Excellent question!

This is a "mempool cache" API. So we must keep in mind that it can be consumed by applications and other libraries using mempool caches, not just PMDs.

If some consumer of the API, e.g. i40e_tx, doesn’t know the cache pointer, it must look it up itself before calling the function, e.g. by rte_mempool_default_cache().

I think we should keep the API clean, as proposed. Otherwise, the added lookup (although conditional) would degrade the performance of all other consumers of the API.

And there is no performance difference for the PMD whether it calls rte_mempool_default_cache() in the PMD itself, or if it is called from within the rte_mempool_cache_zc_put_bulk() function.
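
E.g. for the MBUF_FAST_FREE case, the TX free path could do something
roughly like this (just a sketch; 'txep' is a placeholder for the PMD's
own TX sw-ring entries, and all mbufs are assumed to come from the same
mempool):

	struct rte_mempool *mp = txep[0].mbuf->pool;
	struct rte_mempool_cache *cache =
			rte_mempool_default_cache(mp, rte_lcore_id());
	void **cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	unsigned int i;

	for (i = 0; i < n; i++)
		cache_objs[i] = txep[i].mbuf;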

> 
> > +__rte_experimental
> > +static __rte_always_inline void *
> > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	void **cache_objs;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > +
> > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > +
> > +	/* Increment stats now, adding in mempool always succeeds. */
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > +
> > +	/*
> > +	 * The cache follows the following algorithm:
> > +	 *   1. If the objects cannot be added to the cache without
> crossing
> > +	 *      the flush threshold, flush the cache to the backend.
> > +	 *   2. Add the objects to the cache.
> > +	 */
> > +
> > +	if (cache->len + n <= cache->flushthresh) {
> > +		cache_objs = &cache->objs[cache->len];
> > +		cache->len += n;
> > +	} else {
> > +		cache_objs = &cache->objs[0];
> > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > >len);
> > +		cache->len = n;
> > +	}
> > +
> > +	return cache_objs;
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior
> > notice.
> > + *
> > + * Zero-copy get objects from a user-owned mempool cache backed by
> the
> > specified mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to prefetch into the mempool cache.
> > + *   Must not exceed RTE_MEMPOOL_CACHE_MAX_SIZE.
> > + * @return
> > + *   The pointer to the objects in the mempool cache.
> > + *   NULL on error; i.e. the cache + the pool does not contain n
> objects.
> > + *   With rte_errno set to the error code of the mempool dequeue
> function.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void *
> > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	unsigned int len;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > +
> > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > +
> > +	len = cache->len;
> > +
> > +	if (unlikely(n > len)) {
> > +		/* Fill the cache from the backend; fetch size + requested
> -
> > len objects. */
> > +		int ret;
> > +		const unsigned int size = cache->size;
> > +
> > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > >objs[len], size + n - len);
> > +		if (unlikely(ret < 0)) {
> > +			/*
> > +			 * We are buffer constrained.
> > +			 * Do not fill the cache, just satisfy the request.
> > +			 */
> > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > >objs[len], n - len);
> > +			if (unlikely(ret < 0)) {
> > +				/* Unable to satisfy the request. */
> > +
> > +				RTE_MEMPOOL_STAT_ADD(mp,
> > get_fail_bulk, 1);
> > +				RTE_MEMPOOL_STAT_ADD(mp,
> > get_fail_objs, n);
> > +
> > +				rte_errno = -ret;
> > +				return NULL;
> > +			}
> > +
> > +			len = 0;
> > +		} else
> > +			len = size;
> > +	} else
> > +		len -= n;
> > +
> > +	cache->len = len;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > +
> > +	return &cache->objs[len];
> > +}
> > +


* RE: [PATCH v2] mempool cache: add zero-copy get and put functions
  2022-11-16 18:04 ` [PATCH v2] " Morten Brørup
  2022-11-29 20:54   ` Kamalakshitha Aligeri
@ 2022-12-22 15:57   ` Konstantin Ananyev
  2022-12-22 17:55     ` Morten Brørup
  1 sibling, 1 reply; 36+ messages in thread
From: Konstantin Ananyev @ 2022-12-22 15:57 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, kamalakshitha.aligeri, bruce.richardson,
	dev
  Cc: nd


> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.

LGTM in general, thank you for working on it.
Few comments below.
 
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> v2:
> * Fix checkpatch warnings.
> * Fix missing registration of trace points.
> * The functions are inline, so they don't go into the map file.
> v1 changes from the RFC:
> * Removed run-time parameter checks. (Honnappa)
>   This is a hot fast path function; requiring correct application
>   behaviour, i.e. function parameters must be valid.
> * Added RTE_ASSERT for parameters instead.

RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
I think it is too excessive.
Just:
if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) return NULL;
seems much more convenient for the users here and
closer to other mempool/ring API behavior.
In terms of performance - I don’t think one extra comparison here
would really count.

I also think it would be really good to add
zc_(get|put)_bulk_start() and zc_(get|put)_bulk_finish(),
where _start would check/fill the cache and return the pointer,
while _finish would update cache->len.
Similar to what we have for the rte_ring _peek_ API.
That would allow extending this API's usage - let's say inside PMDs
it could be used not only for the MBUF_FAST_FREE case, but for the generic
TX code path (the one that has to call rte_mbuf_prefree()) also.
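
Just to illustrate what I mean (rough sketch only; these functions don't
exist yet, and 'txep' is a made-up TX sw-ring entry):

	void **objs = rte_mempool_cache_zc_put_bulk_start(cache, mp, n);

	if (objs != NULL) {
		unsigned int i, copied = 0;

		for (i = 0; i < n; i++) {
			struct rte_mbuf *m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
			if (m != NULL)
				objs[copied++] = m;
		}

		rte_mempool_cache_zc_put_bulk_finish(cache, copied);
	}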

>   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> * Removed fallback when 'cache' parameter is not set. (Honnappa)
> * Chose the simple get function; i.e. do not move the existing objects in
>   the cache to the top of the new stack, just leave them at the bottom.
> * Renamed the functions. Other suggestions are welcome, of course. ;-)
> * Updated the function descriptions.
> * Added the functions to trace_fp and version.map.

Would be great to add some test-cases in app/test to cover this new API.
 


* RE: [PATCH v2] mempool cache: add zero-copy get and put functions
  2022-12-22 15:57   ` Konstantin Ananyev
@ 2022-12-22 17:55     ` Morten Brørup
  2022-12-23 16:58       ` Konstantin Ananyev
  0 siblings, 1 reply; 36+ messages in thread
From: Morten Brørup @ 2022-12-22 17:55 UTC (permalink / raw)
  To: Konstantin Ananyev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, kamalakshitha.aligeri, bruce.richardson,
	dev
  Cc: nd

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Thursday, 22 December 2022 16.57
> 
> > Zero-copy access to mempool caches is beneficial for PMD performance,
> and
> > must be provided by the mempool library to fix [Bug 1052] without a
> > performance regression.
> 
> LGTM in general, thank you for working on it.
> Few comments below.
> 
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> >
> > v2:
> > * Fix checkpatch warnings.
> > * Fix missing registration of trace points.
> > * The functions are inline, so they don't go into the map file.
> > v1 changes from the RFC:
> > * Removed run-time parameter checks. (Honnappa)
> >   This is a hot fast path function; requiring correct application
> >   behaviour, i.e. function parameters must be valid.
> > * Added RTE_ASSERT for parameters instead.
> 
> RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> I think it is too excessive.
> Just:
> if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) return NULL;
> seems much more convenient for the users here and
> more close to other mempool/ring API behavior.
> In terms of performance - I don’t think one extra comparison here
> would really count.

The insignificant performance degradation seems like a good tradeoff for making the function more generic.
I will update the function documentation and place the run-time check here, so both trace and stats reflect what happened:

	RTE_ASSERT(cache != NULL);
	RTE_ASSERT(mp != NULL);
-	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);

	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+
+	if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
+		rte_errno = -ENOSPC; // Or EINVAL?
+		return NULL;
+	}

	/* Increment stats now, adding in mempool always succeeds. */

I will probably also be able to come up with a solution for zc_get_bulk(), so both trace and stats make sense if called with n > RTE_MEMPOOL_CACHE_MAX_SIZE.

> 
> I also think would be really good to add:
> add zc_(get|put)_bulk_start(),  zc_(get|put)_bulk_finish().
> Where _start would check/fill the cache and return the pointer,
> while _finsih would updathe cache->len.
> Similar to what we have for rte_ring _peek_ API.
> That would allow to extend this API usage - let say inside PMDs
> it could be used not only for MBUF_FAST_FREE case,  but for generic
> TX code path (one that have to call rte_mbuf_prefree()) also.

I don't see a use case for zc_get_start()/_finish().

And since the mempool cache is a stack, it would *require* that the application reads the array in reverse order. In such case, the function should not return a pointer to the array of objects, but a pointer to the top of the stack.

So I prefer to stick with the single-function zero-copy get, i.e. without start/finish.


I do agree with you about the use case for zc_put_start()/_finish().

Unlike the ring, there is no need for locking with the mempool cache, so we can implement something much simpler:

Instead of requiring calling both zc_put_start() and _finish() for every zero-copy burst, we could add a zc_put_rewind() function, only to be called if some number of objects were not added anyway:

/* FIXME: Function documentation here. */
__rte_experimental
static __rte_always_inline void
rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
		unsigned int n)
{
	RTE_ASSERT(cache != NULL);
	RTE_ASSERT(n <= cache->len);

	rte_mempool_trace_cache_zc_put_rewind(cache, n);

	/* Rewind stats. */
	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, -n);

	cache->len -= n;
}
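
The intended usage would be along these lines (sketch only; 'txep' is a
placeholder, and all mbufs are assumed to belong to the same mempool):

	void **cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	unsigned int i, copied = 0;

	for (i = 0; i < n; i++) {
		struct rte_mbuf *m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
		if (m != NULL)
			cache_objs[copied++] = m;
	}

	/* Give back the slots that were not used after all. */
	rte_mempool_cache_zc_put_rewind(cache, n - copied);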

I have a strong preference for _rewind() over _start() and _finish(), because in the full burst case, it only touches the rte_mempool_cache structure once, whereas splitting it up into _start() and _finish() touches the rte_mempool_cache structure both before and after copying the array of objects.

What do you think?

I am open for other names than _rewind(), so feel free to speak up if you have a better name.


> 
> >   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > * Chose the simple get function; i.e. do not move the existing
> objects in
> >   the cache to the top of the new stack, just leave them at the
> bottom.
> > * Renamed the functions. Other suggestions are welcome, of course. ;-
> )
> > * Updated the function descriptions.
> > * Added the functions to trace_fp and version.map.
> 
> Would be great to add some test-cases in app/test to cover this new
> API.

Good point. I will look at it.

BTW: Akshitha already has zc_put_bulk working in the i40e PMD.



* RE: [PATCH v2] mempool cache: add zero-copy get and put functions
  2022-12-22 17:55     ` Morten Brørup
@ 2022-12-23 16:58       ` Konstantin Ananyev
  2022-12-24 12:17         ` Morten Brørup
  0 siblings, 1 reply; 36+ messages in thread
From: Konstantin Ananyev @ 2022-12-23 16:58 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, kamalakshitha.aligeri, bruce.richardson,
	dev
  Cc: nd


> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Thursday, 22 December 2022 16.57
> >
> > > Zero-copy access to mempool caches is beneficial for PMD performance,
> > and
> > > must be provided by the mempool library to fix [Bug 1052] without a
> > > performance regression.
> >
> > LGTM in general, thank you for working on it.
> > Few comments below.
> >
> > >
> > > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> > >
> > > v2:
> > > * Fix checkpatch warnings.
> > > * Fix missing registration of trace points.
> > > * The functions are inline, so they don't go into the map file.
> > > v1 changes from the RFC:
> > > * Removed run-time parameter checks. (Honnappa)
> > >   This is a hot fast path function; requiring correct application
> > >   behaviour, i.e. function parameters must be valid.
> > > * Added RTE_ASSERT for parameters instead.
> >
> > RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > I think it is too excessive.
> > Just:
> > if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) return NULL;
> > seems much more convenient for the users here and
> > more close to other mempool/ring API behavior.
> > In terms of performance - I don’t think one extra comparison here
> > would really count.
> 
> The insignificant performance degradation seems like a good tradeoff for making the function more generic.
> I will update the function documentation and place the run-time check here, so both trace and stats reflect what happened:
> 
> 	RTE_ASSERT(cache != NULL);
> 	RTE_ASSERT(mp != NULL);
> -	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> 
> 	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> +
> +	if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> +		rte_errno = -ENOSPC; // Or EINVAL?
> +		return NULL;
> +	}
> 
> 	/* Increment stats now, adding in mempool always succeeds. */
> 
> I will probably also be able to come up with solution for zc_get_bulk(), so both trace and stats make sense if called with n >
> RTE_MEMPOOL_CACHE_MAX_SIZE.
> 
> >
> > I also think would be really good to add:
> > add zc_(get|put)_bulk_start(),  zc_(get|put)_bulk_finish().
> > Where _start would check/fill the cache and return the pointer,
> > while _finish would update cache->len.
> > Similar to what we have for rte_ring _peek_ API.
> > That would allow to extend this API usage - let say inside PMDs
> > it could be used not only for MBUF_FAST_FREE case,  but for generic
> > TX code path (one that have to call rte_mbuf_prefree()) also.
> 
> I don't see a use case for zc_get_start()/_finish().
> 
> And since the mempool cache is a stack, it would *require* that the application reads the array in reverse order. In such case, the
> function should not return a pointer to the array of objects, but a pointer to the top of the stack.
> 
> So I prefer to stick with the single-function zero-copy get, i.e. without start/finish.

Yes, it would be more complicated than just updating cache->len.
I don't have any real use-case for _get_ either - mostly just for symmetry with put.
 
> 
> 
> I do agree with you about the use case for zc_put_start()/_finish().
> 
> Unlike the ring, there is no need for locking with the mempool cache, so we can implement something much simpler:
> 
> Instead of requiring calling both zc_put_start() and _finish() for every zero-copy burst, we could add a zc_put_rewind() function, only
> to be called if some number of objects were not added anyway:
> 
> /* FIXME: Function documentation here. */
> __rte_experimental
> static __rte_always_inline void
> rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> 		unsigned int n)
> {
> 	RTE_ASSERT(cache != NULL);
> 	RTE_ASSERT(n <= cache->len);
> 
> 	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> 
> 	/* Rewind stats. */
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, -n);
> 
> 	cache->len -= n;
> }
> 
> I have a strong preference for _rewind() over _start() and _finish(), because in the full burst case, it only touches the
> rte_mempool_cache structure once, whereas splitting it up into _start() and _finish() touches the rte_mempool_cache structure both
> before and after copying the array of objects.
> 
> What do you think?

And your concern is that between _get_start(_C_) and get_finish(_C_) the _C_
cache line can be bumped out of CPU Dcache, right?
I don't think such situation would be a common one.
But, if you think _rewind_ is a better approach - I am ok with it. 
 

> I am open for other names than _rewind(), so feel free to speak up if you have a better name.
> 
> 
> >
> > >   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > > * Chose the simple get function; i.e. do not move the existing
> > objects in
> > >   the cache to the top of the new stack, just leave them at the
> > bottom.
> > > * Renamed the functions. Other suggestions are welcome, of course. ;-
> > )
> > > * Updated the function descriptions.
> > > * Added the functions to trace_fp and version.map.
> >
> > Would be great to add some test-cases in app/test to cover this new
> > API.
> 
> Good point. I will look at it.
> 
> BTW: Akshitha already has zc_put_bulk working in the i40e PMD.

That's great news, but I suppose it would be good to have some UT for it anyway.
Konstantin


* [PATCH v3] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
                   ` (2 preceding siblings ...)
  2022-11-16 18:04 ` [PATCH v2] " Morten Brørup
@ 2022-12-24 11:49 ` Morten Brørup
  2022-12-24 11:55 ` [PATCH v4] " Morten Brørup
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2022-12-24 11:49 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 165 +++++++++++++++++++++++++++++
 lib/mempool/rte_mempool_trace_fp.h |  23 ++++
 lib/mempool/version.map            |   5 +
 4 files changed, 202 insertions(+)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..17a90b3ba1 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -47,6 +47,7 @@
 #include <rte_ring.h>
 #include <rte_memcpy.h>
 #include <rte_common.h>
+#include <rte_errno.h>
 
 #include "rte_mempool_trace_fp.h"
 
@@ -1346,6 +1347,170 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+ __rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a user-owned mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The number of objects not put in the mempool cache after calling
+ *   rte_mempool_cache_zc_put_bulk().
+ */
+ __rte_experimental
+static __rte_always_inline void
+rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(n <= cache->len);
+
+	rte_mempool_trace_cache_zc_put_rewind(cache, n);
+
+	cache->len -= n;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to prefetch into the mempool cache.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
+ *   With rte_errno set to the error code of the mempool dequeue function,
+ *   or EINVAL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+ __rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+
+	if (n <= len) {
+		/* The request can be satisfied from the cache as is. */
+		len -= n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself can be satisfied from the cache.
+		 * But first, the cache must be filled from the backend;
+		 * fetch size + requested - len objects.
+		 */
+		int ret;
+		const unsigned int size = cache->size;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else
+			len = size;
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..14666457f7 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_rewind,
+	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..1383ae6db2 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,11 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_rewind;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1



* [PATCH v4] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
                   ` (3 preceding siblings ...)
  2022-12-24 11:49 ` [PATCH v3] " Morten Brørup
@ 2022-12-24 11:55 ` Morten Brørup
  2022-12-27  9:24   ` Andrew Rybchenko
  2022-12-27 15:17 ` [PATCH v5] " Morten Brørup
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Morten Brørup @ 2022-12-24 11:55 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

v4:
* Fix checkpatch warnings.
v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 165 +++++++++++++++++++++++++++++
 lib/mempool/rte_mempool_trace_fp.h |  23 ++++
 lib/mempool/version.map            |   5 +
 4 files changed, 202 insertions(+)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..00387e7543 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -47,6 +47,7 @@
 #include <rte_ring.h>
 #include <rte_memcpy.h>
 #include <rte_common.h>
+#include <rte_errno.h>
 
 #include "rte_mempool_trace_fp.h"
 
@@ -1346,6 +1347,170 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a user-owned mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The number of objects not put in the mempool cache after calling
+ *   rte_mempool_cache_zc_put_bulk().
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(n <= cache->len);
+
+	rte_mempool_trace_cache_zc_put_rewind(cache, n);
+
+	cache->len -= n;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to prefetch into the mempool cache.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
+ *   With rte_errno set to the error code of the mempool dequeue function,
+ *   or EINVAL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+
+	if (n <= len) {
+		/* The request can be satisfied from the cache as is. */
+		len -= n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself can be satisfied from the cache.
+		 * But first, the cache must be filled from the backend;
+		 * fetch size + requested - len objects.
+		 */
+		int ret;
+		const unsigned int size = cache->size;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else
+			len = size;
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..14666457f7 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_rewind,
+	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..1383ae6db2 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,11 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_rewind;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1



* RE: [PATCH v2] mempool cache: add zero-copy get and put functions
  2022-12-23 16:58       ` Konstantin Ananyev
@ 2022-12-24 12:17         ` Morten Brørup
  0 siblings, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2022-12-24 12:17 UTC (permalink / raw)
  To: Konstantin Ananyev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, kamalakshitha.aligeri, bruce.richardson,
	dev
  Cc: nd

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Friday, 23 December 2022 17.58
> 
> > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > > Sent: Thursday, 22 December 2022 16.57
> > >
> > > > Zero-copy access to mempool caches is beneficial for PMD
> performance,
> > > and
> > > > must be provided by the mempool library to fix [Bug 1052] without
> a
> > > > performance regression.
> > >
> > > LGTM in general, thank you for working on it.
> > > Few comments below.

[...]

> > > RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > > I think it is too excessive.
> > > Just:
> > > if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) return NULL;
> > > seems much more convenient for the users here and
> > > closer to other mempool/ring API behavior.
> > > In terms of performance - I don’t think one extra comparison here
> > > would really count.
> >
> > The insignificant performance degradation seems like a good tradeoff
> for making the function more generic.
> > I will update the function documentation and place the run-time check
> here, so both trace and stats reflect what happened:
> >
> > 	RTE_ASSERT(cache != NULL);
> > 	RTE_ASSERT(mp != NULL);
> > -	RTE_ASSERT(n <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> >
> > 	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > +
> > +	if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> > +		rte_errno = -ENOSPC; // Or EINVAL?
> > +		return NULL;
> > +	}
> >
> > 	/* Increment stats now, adding in mempool always succeeds. */
> >
> > I will probably also be able to come up with solution for
> zc_get_bulk(), so both trace and stats make sense if called with n >
> > RTE_MEMPOOL_CACHE_MAX_SIZE.

I have sent a new patch, where I switched to the same code flow as in the micro-optimization patch, so this run-time check doesn't affect the most common case.

Also, I realized that I need to compare to the cache flush threshold instead of RTE_MEMPOOL_CACHE_MAX_SIZE, to respect the cache size. Otherwise, a zc_cache_get() operation could deplete a small mempool; and zc_cache_put() could leave the cache with too many objects, thus violating the invariant that cache->len <= cache->flushthresh.
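
For example (numbers only for illustration): with a mempool cache of size 32, the flush threshold is 1.5 times the cache size, i.e. 48, while RTE_MEMPOOL_CACHE_MAX_SIZE is 512. So with the old check, a zero-copy put of 500 objects would be accepted and could leave cache->len at 500, far above the flush threshold; and a zero-copy get of 500 objects would try to dequeue roughly 500 plus the cache size from the backend in one go, hoarding most of a small mempool in a single lcore's cache.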

> >
> > >
> > > I also think it would be really good to add:
> > > zc_(get|put)_bulk_start(), zc_(get|put)_bulk_finish().
> > > Where _start would check/fill the cache and return the pointer,
> > > while _finish would update cache->len.
> > > Similar to what we have for the rte_ring _peek_ API.
> > > That would allow extending this API usage - let's say inside PMDs
> > > it could be used not only for the MBUF_FAST_FREE case, but also for
> > > the generic TX code path (the one that has to call rte_mbuf_prefree()).
> >
> > I don't see a use case for zc_get_start()/_finish().
> >
> > And since the mempool cache is a stack, it would *require* that the
> application reads the array in reverse order. In such case, the
> > function should not return a pointer to the array of objects, but a
> pointer to the top of the stack.
> >
> > So I prefer to stick with the single-function zero-copy get, i.e.
> without start/finish.
> 
> Yes, it would be more complicated than just updating cache->len.
> I don't have any real use-case for _get_ either - mostly just for symmetry
> with put.
> 
> >
> >
> > I do agree with you about the use case for zc_put_start()/_finish().
> >
> > Unlike the ring, there is no need for locking with the mempool cache,
> so we can implement something much simpler:
> >
> > Instead of requiring calling both zc_put_start() and _finish() for
> every zero-copy burst, we could add a zc_put_rewind() function, only
> > to be called if some number of objects were not added anyway:
> >
> > /* FIXME: Function documentation here. */
> > __rte_experimental
> > static __rte_always_inline void
> > rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > 		unsigned int n)
> > {
> > 	RTE_ASSERT(cache != NULL);
> > 	RTE_ASSERT(n <= cache->len);
> >
> > 	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> >
> > 	/* Rewind stats. */
> > 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, -n);
> >
> > 	cache->len -= n;
> > }
> >
> > I have a strong preference for _rewind() over _start() and _finish(),
> because in the full burst case, it only touches the
> > rte_mempool_cache structure once, whereas splitting it up into
> _start() and _finish() touches the rte_mempool_cache structure both
> > before and after copying the array of objects.
> >
> > What do you think?
> 
> And your concern is that between _get_start(_C_) and get_finish(_C_)
> the _C_
> cache line can be bumped out of CPU Dcache, right?
> I don't think such situation would be a common one.

Yes, that is the essence of my concern. And I agree that it is probably uncommon.

There might also be some performance benefits from keeping the load/store/modify of _C_ close together; but I don't know enough about CPU internals to determine whether it is significant or not.
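
To make the intended pattern concrete, here is a rough, untested sketch of how a TX completion path could combine zc_put_bulk() with _rewind(). The helper name and parameters are made up, it assumes all mbufs come from the same mempool, and it is not lifted from any PMD:

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Hypothetical TX completion helper; not taken from any PMD. */
static void
tx_free_done_mbufs(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		struct rte_mbuf **done, unsigned int nb_done)
{
	void **slot;
	unsigned int i, nb_freed = 0;

	/* Reserve cache slots for the whole burst up front; one cache access. */
	slot = rte_mempool_cache_zc_put_bulk(cache, mp, nb_done);
	if (slot == NULL)
		return; /* Too big for the cache; a real driver would fall back to the non-zc API (omitted here). */

	for (i = 0; i < nb_done; i++) {
		struct rte_mbuf *m = rte_pktmbuf_prefree_seg(done[i]);

		if (m != NULL)
			slot[nb_freed++] = m; /* Freeable: store it directly in the cache. */
		/* Else: the mbuf is still referenced; nothing to put back. */
	}

	/* Only touch the cache again if some reserved slots were left unused. */
	if (nb_freed != nb_done)
		rte_mempool_cache_zc_put_rewind(cache, nb_done - nb_freed);
}

In the common case, where the whole burst is freeable, the cache structure is only touched once, by zc_put_bulk(); _rewind() is never called.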

> But, if you think _rewind_ is a better approach - I am ok with it.

Thank you.

[...]

> > > Would be great to add some test-cases in app/test to cover this new
> > > API.
> >
> > Good point. I will look at it.
> >
> > BTW: Akshitha already has zc_put_bulk working in the i40e PMD.
> 
> That's great news, but I suppose it would be good to have some UT for
> it anyway.
> Konstantin

I don't have time for adding unit tests now, but sent an updated patch anyway, so the invariant bug doesn't bite Akshitha.
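
If anyone wants to pick it up before then, a minimal test could look something like this (not compile tested; the test name and pool parameters are arbitrary):

#include <rte_lcore.h>
#include <rte_memory.h>
#include <rte_mempool.h>

static int
test_mempool_cache_zc(void)
{
	struct rte_mempool *mp;
	struct rte_mempool_cache *cache;
	void **slot;
	void *objs[4];
	unsigned int i;
	int ret = -1;

	/* 63 objects of 64 bytes, with a per-lcore cache of 8 objects. */
	mp = rte_mempool_create("test_zc", 63, 64, 8, 0,
			NULL, NULL, NULL, NULL, SOCKET_ID_ANY, 0);
	if (mp == NULL)
		return -1;

	cache = rte_mempool_default_cache(mp, rte_lcore_id());
	if (cache == NULL)
		goto out;

	/* Zero-copy get: the cache is refilled and a pointer into it is returned. */
	slot = rte_mempool_cache_zc_get_bulk(cache, mp, 4);
	if (slot == NULL)
		goto out;
	for (i = 0; i < 4; i++)
		objs[i] = slot[i];

	/* Zero-copy put: reserve 4 slots, but only fill 2 of them. */
	slot = rte_mempool_cache_zc_put_bulk(cache, mp, 4);
	if (slot == NULL)
		goto out;
	for (i = 0; i < 2; i++)
		slot[i] = objs[i];

	/* Pretend the last 2 reserved slots were not used after all. */
	rte_mempool_cache_zc_put_rewind(cache, 2);

	/* Return the remaining 2 objects through the ordinary API. */
	rte_mempool_put_bulk(mp, &objs[2], 2);

	/* All 63 objects should be accounted for again. */
	rte_mempool_cache_flush(cache, mp);
	if (rte_mempool_avail_count(mp) == 63)
		ret = 0;

out:
	rte_mempool_free(mp);
	return ret;
}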

Merry Christmas, everyone!

-Morten


* Re: [PATCH v4] mempool cache: add zero-copy get and put functions
  2022-12-24 11:55 ` [PATCH v4] " Morten Brørup
@ 2022-12-27  9:24   ` Andrew Rybchenko
  2022-12-27 10:31     ` Morten Brørup
  0 siblings, 1 reply; 36+ messages in thread
From: Andrew Rybchenko @ 2022-12-27  9:24 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd

On 12/24/22 14:55, Morten Brørup wrote:
> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

Bugzilla ID: 1052

> 
> v4:
> * Fix checkpatch warnings.
> v3:
> * Bugfix: Respect the cache size; compare to the flush threshold instead
>    of RTE_MEMPOOL_CACHE_MAX_SIZE.
> * Added 'rewind' function for incomplete 'put' operations. (Konstantin)
> * Replace RTE_ASSERTs with runtime checks of the request size.
>    Instead of failing, return NULL if the request is too big. (Konstantin)
> * Modified comparison to prevent overflow if n is really huge and len is
>    non-zero.
> * Updated the comments in the code.
> v2:
> * Fix checkpatch warnings.
> * Fix missing registration of trace points.
> * The functions are inline, so they don't go into the map file.
> v1 changes from the RFC:
> * Removed run-time parameter checks. (Honnappa)
>    This is a hot fast path function; requiring correct application
>    behaviour, i.e. function parameters must be valid.
> * Added RTE_ASSERT for parameters instead.
>    Code for this is only generated if built with RTE_ENABLE_ASSERT.
> * Removed fallback when 'cache' parameter is not set. (Honnappa)
> * Chose the simple get function; i.e. do not move the existing objects in
>    the cache to the top of the new stack, just leave them at the bottom.
> * Renamed the functions. Other suggestions are welcome, of course. ;-)
> * Updated the function descriptions.
> * Added the functions to trace_fp and version.map.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

[snip]

> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..00387e7543 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h

[snip]

> @@ -1346,6 +1347,170 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
>   	cache->len = 0;
>   }
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> + *
> + * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void *
> +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	void **cache_objs;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);

Wouldn't it be better to do the tracing near the return, so that the
return value could be traced as well?

> +
> +	if (n <= cache->flushthresh - cache->len) {
> +		/*
> +		 * The objects can be added to the cache without crossing the
> +		 * flush threshold.
> +		 */
> +		cache_objs = &cache->objs[cache->len];
> +		cache->len += n;
> +	} else if (likely(n <= cache->flushthresh)) {
> +		/*
> +		 * The request itself fits into the cache.
> +		 * But first, the cache must be flushed to the backend, so
> +		 * adding the objects does not cross the flush threshold.
> +		 */
> +		cache_objs = &cache->objs[0];
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> +		cache->len = n;
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		return NULL;
> +	}
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);

It duplicates put function code. Shouldn't put function use
this one to avoid duplication?

> +
> +	return cache_objs;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> + *
> + * Zero-copy un-put objects in a user-owned mempool cache.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param n
> + *   The number of objects not put in the mempool cache after calling
> + *   rte_mempool_cache_zc_put_bulk().
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(n <= cache->len);
> +
> +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> +
> +	cache->len -= n;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> + *
> + * Zero-copy get objects from a user-owned mempool cache backed by the specified mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to prefetch into the mempool cache.
> + * @return
> + *   The pointer to the objects in the mempool cache.
> + *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
> + *   With rte_errno set to the error code of the mempool dequeue function,
> + *   or EINVAL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void *
> +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	unsigned int len;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> +
> +	len = cache->len;
> +
> +	if (n <= len) {
> +		/* The request can be satisfied from the cache as is. */
> +		len -= n;
> +	} else if (likely(n <= cache->flushthresh)) {
> +		/*
> +		 * The request itself can be satisfied from the cache.
> +		 * But first, the cache must be filled from the backend;
> +		 * fetch size + requested - len objects.
> +		 */
> +		int ret;
> +		const unsigned int size = cache->size;
> +
> +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);

If size is RTE_MEMPOOL_CACHE_MAX_SIZE and n is cache->flushthresh (i.e. 
RTE_MEMPOOL_CACHE_MAX_SIZE * 1.5), we'll dequeue up to 
RTE_MEMPOOL_CACHE_MAX_SIZE * 2.5 objects,
whereas the cache objects array size is just
RTE_MEMPOOL_CACHE_MAX_SIZE * 2. Am I missing something?


> +		if (unlikely(ret < 0)) {
> +			/*
> +			 * We are buffer constrained.
> +			 * Do not fill the cache, just satisfy the request.
> +			 */
> +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
> +			if (unlikely(ret < 0)) {
> +				/* Unable to satisfy the request. */
> +
> +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> +
> +				rte_errno = -ret;
> +				return NULL;
> +			}
> +
> +			len = 0;
> +		} else
> +			len = size;

Curly brackets are required in else branch since if branch
above has curly brackets.

> +	} else {
> +		/* The request itself is too big for the cache. */
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	cache->len = len;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
> +	return &cache->objs[len];
> +}
> +
>   /**
>    * @internal Put several objects back in the mempool; used internally.
>    * @param mp



* RE: [PATCH v4] mempool cache: add zero-copy get and put functions
  2022-12-27  9:24   ` Andrew Rybchenko
@ 2022-12-27 10:31     ` Morten Brørup
  0 siblings, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2022-12-27 10:31 UTC (permalink / raw)
  To: Andrew Rybchenko, olivier.matz, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd

> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Tuesday, 27 December 2022 10.24
> 
> On 12/24/22 14:55, Morten Brørup wrote:
> > Zero-copy access to mempool caches is beneficial for PMD performance,
> and
> > must be provided by the mempool library to fix [Bug 1052] without a
> > performance regression.
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> Bugzilla ID: 1052
> 
> >
> > v4:
> > * Fix checkpatch warnings.
> > v3:
> > * Bugfix: Respect the cache size; compare to the flush threshold
> instead
> >    of RTE_MEMPOOL_CACHE_MAX_SIZE.
> > * Added 'rewind' function for incomplete 'put' operations.
> (Konstantin)
> > * Replace RTE_ASSERTs with runtime checks of the request size.
> >    Instead of failing, return NULL if the request is too big.
> (Konstantin)
> > * Modified comparison to prevent overflow if n is really huge and len
> is
> >    non-zero.
> > * Updated the comments in the code.
> > v2:
> > * Fix checkpatch warnings.
> > * Fix missing registration of trace points.
> > * The functions are inline, so they don't go into the map file.
> > v1 changes from the RFC:
> > * Removed run-time parameter checks. (Honnappa)
> >    This is a hot fast path function; requiring correct application
> >    behaviour, i.e. function parameters must be valid.
> > * Added RTE_ASSERT for parameters instead.
> >    Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > * Chose the simple get function; i.e. do not move the existing
> objects in
> >    the cache to the top of the new stack, just leave them at the
> bottom.
> > * Renamed the functions. Other suggestions are welcome, of course. ;-
> )
> > * Updated the function descriptions.
> > * Added the functions to trace_fp and version.map.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> 
> [snip]
> 
> > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > index 9f530db24b..00387e7543 100644
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> 
> [snip]
> 
> > @@ -1346,6 +1347,170 @@ rte_mempool_cache_flush(struct
> rte_mempool_cache *cache,
> >   	cache->len = 0;
> >   }
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior notice.
> > + *
> > + * Zero-copy put objects in a user-owned mempool cache backed by the
> specified mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be put in the mempool cache.
> > + * @return
> > + *   The pointer to where to put the objects in the mempool cache.
> > + *   NULL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void *
> > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	void **cache_objs;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> 
> Wouldn't it be better to do the tracing near the return, so that the
> return value could be traced as well?

The existing mempool put operations emit their trace at the beginning, so I followed that convention. Considering that rte_mempool_do_generic_put() may call rte_mempool_ops_enqueue_bulk(), this convention ensures that the trace output is emitted in the same order as the functions are called. I think this is the correct way of tracing.

> 
> > +
> > +	if (n <= cache->flushthresh - cache->len) {
> > +		/*
> > +		 * The objects can be added to the cache without crossing
> the
> > +		 * flush threshold.
> > +		 */
> > +		cache_objs = &cache->objs[cache->len];
> > +		cache->len += n;
> > +	} else if (likely(n <= cache->flushthresh)) {
> > +		/*
> > +		 * The request itself fits into the cache.
> > +		 * But first, the cache must be flushed to the backend, so
> > +		 * adding the objects does not cross the flush threshold.
> > +		 */
> > +		cache_objs = &cache->objs[0];
> > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > +		cache->len = n;
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> > +		return NULL;
> > +	}
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> 
> It duplicates put function code. Shouldn't put function use
> this one to avoid duplication?

Calling it from the put function would emit confusing trace output, indicating that the zero-copy put function was called - which it was, but not from the application.

I get your point about code duplication, so I will move its innards to an internal function (without trace), and call that function from both the existing put function and the new zero-copy put function (which will become a thin wrapper function, to also call the trace).

> 
> > +
> > +	return cache_objs;
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior notice.
> > + *
> > + * Zero-copy un-put objects in a user-owned mempool cache.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param n
> > + *   The number of objects not put in the mempool cache after
> calling
> > + *   rte_mempool_cache_zc_put_bulk().
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void
> > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > +		unsigned int n)
> > +{
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(n <= cache->len);
> > +
> > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > +
> > +	cache->len -= n;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior notice.
> > + *
> > + * Zero-copy get objects from a user-owned mempool cache backed by
> the specified mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to prefetch into the mempool cache.
> > + * @return
> > + *   The pointer to the objects in the mempool cache.
> > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> objects.
> > + *   With rte_errno set to the error code of the mempool dequeue
> function,
> > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void *
> > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	unsigned int len;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > +
> > +	len = cache->len;
> > +
> > +	if (n <= len) {
> > +		/* The request can be satisfied from the cache as is. */
> > +		len -= n;
> > +	} else if (likely(n <= cache->flushthresh)) {
> > +		/*
> > +		 * The request itself can be satisfied from the cache.
> > +		 * But first, the cache must be filled from the backend;
> > +		 * fetch size + requested - len objects.
> > +		 */
> > +		int ret;
> > +		const unsigned int size = cache->size;
> > +
> > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> size + n - len);
> 
> If size is RTE_MEMPOOL_CACHE_MAX_SIZE and n is cache->flushthresh (i.e.
> RTE_MEMPOOL_CACHE_MAX_SIZE * 1.5), we'll dequeue up to
> RTE_MEMPOOL_CACHE_MAX_SIZE * 2.5 objects,
> whereas the cache objects array size is just
> RTE_MEMPOOL_CACHE_MAX_SIZE * 2. Am I missing something?

Good catch! I must compare n to cache->size to avoid this.

> 
> 
> > +		if (unlikely(ret < 0)) {
> > +			/*
> > +			 * We are buffer constrained.
> > +			 * Do not fill the cache, just satisfy the request.
> > +			 */
> > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> >objs[len], n - len);
> > +			if (unlikely(ret < 0)) {
> > +				/* Unable to satisfy the request. */
> > +
> > +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> > +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> > +
> > +				rte_errno = -ret;
> > +				return NULL;
> > +			}
> > +
> > +			len = 0;
> > +		} else
> > +			len = size;
> 
> Curly brackets are required in else branch since if branch
> above has curly brackets.

Thanks. I also thought this looked silly. It would be nice if checkpatch emitted a warning here.

> 
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> > +
> > +	cache->len = len;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > +
> > +	return &cache->objs[len];
> > +}
> > +
> >   /**
> >    * @internal Put several objects back in the mempool; used
> internally.
> >    * @param mp
> 



* [PATCH v5] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
                   ` (4 preceding siblings ...)
  2022-12-24 11:55 ` [PATCH v4] " Morten Brørup
@ 2022-12-27 15:17 ` Morten Brørup
  2023-01-22 20:34   ` Konstantin Ananyev
  2023-02-09 14:39 ` [PATCH v6] " Morten Brørup
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 36+ messages in thread
From: Morten Brørup @ 2022-12-27 15:17 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

Bugzilla ID: 1052

v5:
* Bugfix: Compare zero-copy get request to the cache size instead of the
  flush threshold; otherwise refill could overflow the memory allocated
  for the cache. (Andrew)
* Split the zero-copy put function into an internal function doing the
  work, and a public function with trace.
* Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
  the internal zero-copy put function. (Andrew)
* Corrected the return type of rte_mempool_cache_zc_put_bulk() from void *
  to void **; it returns a pointer to an array of objects.
* Fix coding style: Add missing curly brackets. (Andrew)
v4:
* Fix checkpatch warnings.
v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 237 +++++++++++++++++++++++++----
 lib/mempool/rte_mempool_trace_fp.h |  23 +++
 lib/mempool/version.map            |   5 +
 4 files changed, 245 insertions(+), 29 deletions(-)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..5efd3c2b5b 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -47,6 +47,7 @@
 #include <rte_ring.h>
 #include <rte_memcpy.h>
 #include <rte_common.h>
+#include <rte_errno.h>
 
 #include "rte_mempool_trace_fp.h"
 
@@ -1346,6 +1347,197 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
+ *
+ * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+static __rte_always_inline void **
+__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void **
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a user-owned mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The number of objects not put in the mempool cache after calling
+ *   rte_mempool_cache_zc_put_bulk().
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(n <= cache->len);
+
+	rte_mempool_trace_cache_zc_put_rewind(cache, n);
+
+	cache->len -= n;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a user-owned mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to prefetch into the mempool cache.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
+ *   With rte_errno set to the error code of the mempool dequeue function,
+ *   or EINVAL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len, size;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+	size = cache->size;
+
+	if (n <= len) {
+		/* The request can be satisfied from the cache as is. */
+		len -= n;
+	} else if (likely(n <= size)) {
+		/*
+		 * The request itself can be satisfied from the cache.
+		 * But first, the cache must be filled from the backend;
+		 * fetch size + requested - len objects.
+		 */
+		int ret;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else {
+			len = size;
+		}
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
@@ -1364,32 +1556,25 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	/* No cache provided? */
+	if (unlikely(cache == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+		goto driver_enqueue;
+	}
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
+	/* Prepare to add the objects to the cache. */
+	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
 
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
+	/* The request itself is too big for the cache? */
+	if (unlikely(cache_objs == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	if (cache->len + n <= cache->flushthresh) {
-		cache_objs = &cache->objs[cache->len];
-		cache->len += n;
-	} else {
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
+		goto driver_enqueue;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1584,7 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
-
-driver_enqueue_stats_incremented:
-
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }
 
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..14666457f7 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_rewind,
+	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..1383ae6db2 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,11 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_rewind;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5] mempool cache: add zero-copy get and put functions
  2022-12-27 15:17 ` [PATCH v5] " Morten Brørup
@ 2023-01-22 20:34   ` Konstantin Ananyev
  2023-01-22 21:17     ` Morten Brørup
  0 siblings, 1 reply; 36+ messages in thread
From: Konstantin Ananyev @ 2023-01-22 20:34 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, kamalakshitha.aligeri, bruce.richardson,
	konstantin.ananyev, dev
  Cc: nd

Hi Morten,

Few nits, see below.
Also I still think we do need a test case for _zc_get_ before
accepting it in the mainline.
With that in place:
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> Bugzilla ID: 1052
> 
> v5:
> * Bugfix: Compare zero-copy get request to the cache size instead of the
>    flush threshold; otherwise refill could overflow the memory allocated
>    for the cache. (Andrew)
> * Split the zero-copy put function into an internal function doing the
>    work, and a public function with trace.
> * Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
>    the internal zero-copy put function. (Andrew)
> * Corrected the return type of rte_mempool_cache_zc_put_bulk() from void *
>    to void **; it returns a pointer to an array of objects.
> * Fix coding style: Add missing curly brackets. (Andrew)
> v4:
> * Fix checkpatch warnings.
> v3:
> * Bugfix: Respect the cache size; compare to the flush threshold instead
>    of RTE_MEMPOOL_CACHE_MAX_SIZE.
> * Added 'rewind' function for incomplete 'put' operations. (Konstantin)
> * Replace RTE_ASSERTs with runtime checks of the request size.
>    Instead of failing, return NULL if the request is too big. (Konstantin)
> * Modified comparison to prevent overflow if n is really huge and len is
>    non-zero.
> * Updated the comments in the code.
> v2:
> * Fix checkpatch warnings.
> * Fix missing registration of trace points.
> * The functions are inline, so they don't go into the map file.
> v1 changes from the RFC:
> * Removed run-time parameter checks. (Honnappa)
>    This is a hot fast path function; requiring correct application
>    behaviour, i.e. function parameters must be valid.
> * Added RTE_ASSERT for parameters instead.
>    Code for this is only generated if built with RTE_ENABLE_ASSERT.
> * Removed fallback when 'cache' parameter is not set. (Honnappa)
> * Chose the simple get function; i.e. do not move the existing objects in
>    the cache to the top of the new stack, just leave them at the bottom.
> * Renamed the functions. Other suggestions are welcome, of course. ;-)
> * Updated the function descriptions.
> * Added the functions to trace_fp and version.map.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>   lib/mempool/mempool_trace_points.c |   9 ++
>   lib/mempool/rte_mempool.h          | 237 +++++++++++++++++++++++++----
>   lib/mempool/rte_mempool_trace_fp.h |  23 +++
>   lib/mempool/version.map            |   5 +
>   4 files changed, 245 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
> index 4ad76deb34..83d353a764 100644
> --- a/lib/mempool/mempool_trace_points.c
> +++ b/lib/mempool/mempool_trace_points.c
> @@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
>   
>   RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
>   	lib.mempool.set.ops.byname)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> +	lib.mempool.cache.zc.put.bulk)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> +	lib.mempool.cache.zc.put.rewind)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> +	lib.mempool.cache.zc.get.bulk)
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..5efd3c2b5b 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -47,6 +47,7 @@
>   #include <rte_ring.h>
>   #include <rte_memcpy.h>
>   #include <rte_common.h>
> +#include <rte_errno.h>
>   
>   #include "rte_mempool_trace_fp.h"
>   
> @@ -1346,6 +1347,197 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
>   	cache->len = 0;
>   }
>   
> +/**
> + * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
> + *
> + * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +static __rte_always_inline void **
> +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	void **cache_objs;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	if (n <= cache->flushthresh - cache->len) {
> +		/*
> +		 * The objects can be added to the cache without crossing the
> +		 * flush threshold.
> +		 */
> +		cache_objs = &cache->objs[cache->len];
> +		cache->len += n;
> +	} else if (likely(n <= cache->flushthresh)) {
> +		/*
> +		 * The request itself fits into the cache.
> +		 * But first, the cache must be flushed to the backend, so
> +		 * adding the objects does not cross the flush threshold.
> +		 */
> +		cache_objs = &cache->objs[0];
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> +		cache->len = n;
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		return NULL;
> +	}
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +
> +	return cache_objs;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> + *
> + * Zero-copy put objects in a user-owned mempool cache backed by the specified mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void **
> +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> + *
> + * Zero-copy un-put objects in a user-owned mempool cache.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param n
> + *   The number of objects not put in the mempool cache after calling
> + *   rte_mempool_cache_zc_put_bulk().
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(n <= cache->len);
> +
> +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> +
> +	cache->len -= n;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
> + *
> + * Zero-copy get objects from a user-owned mempool cache backed by the specified mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to prefetch into the mempool cache.

Why not 'get' instead of 'prefetch'?


> + * @return
> + *   The pointer to the objects in the mempool cache.
> + *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
> + *   With rte_errno set to the error code of the mempool dequeue function,
> + *   or EINVAL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void *
> +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	unsigned int len, size;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> +
> +	len = cache->len;
> +	size = cache->size;
> +
> +	if (n <= len) {
> +		/* The request can be satisfied from the cache as is. */
> +		len -= n;
> +	} else if (likely(n <= size)) {
> +		/*
> +		 * The request itself can be satisfied from the cache.
> +		 * But first, the cache must be filled from the backend;
> +		 * fetch size + requested - len objects.
> +		 */
> +		int ret;
> +
> +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
> +		if (unlikely(ret < 0)) {
> +			/*
> +			 * We are buffer constrained.
> +			 * Do not fill the cache, just satisfy the request.
> +			 */
> +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
> +			if (unlikely(ret < 0)) {
> +				/* Unable to satisfy the request. */
> +
> +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> +
> +				rte_errno = -ret;
> +				return NULL;
> +			}
> +
> +			len = 0;
> +		} else {
> +			len = size;
> +		}
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	cache->len = len;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
> +	return &cache->objs[len];
> +}
> +
>   /**
>    * @internal Put several objects back in the mempool; used internally.
>    * @param mp
> @@ -1364,32 +1556,25 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   {
>   	void **cache_objs;
>   
> -	/* No cache provided */
> -	if (unlikely(cache == NULL))
> -		goto driver_enqueue;
> +	/* No cache provided? */
> +	if (unlikely(cache == NULL)) {
> +		/* Increment stats now, adding in mempool always succeeds. */
> +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
>   
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +		goto driver_enqueue;
> +	}
>   
> -	/* The request itself is too big for the cache */
> -	if (unlikely(n > cache->flushthresh))
> -		goto driver_enqueue_stats_incremented;
> +	/* Prepare to add the objects to the cache. */
> +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
>   
> -	/*
> -	 * The cache follows the following algorithm:
> -	 *   1. If the objects cannot be added to the cache without crossing
> -	 *      the flush threshold, flush the cache to the backend.
> -	 *   2. Add the objects to the cache.
> -	 */
> +	/* The request itself is too big for the cache? */
> +	if (unlikely(cache_objs == NULL)) {
> +		/* Increment stats now, adding in mempool always succeeds. */
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);

Shouldn't it be RTE_MEMPOOL_STAT_ADD() here?

>   
> -	if (cache->len + n <= cache->flushthresh) {
> -		cache_objs = &cache->objs[cache->len];
> -		cache->len += n;
> -	} else {
> -		cache_objs = &cache->objs[0];
> -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> -		cache->len = n;
> +		goto driver_enqueue;
>   	}
>   
>   	/* Add the objects to the cache. */
> @@ -1399,13 +1584,7 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   
>   driver_enqueue:
>   
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> -
> -driver_enqueue_stats_incremented:
> -
> -	/* push objects to the backend */
> +	/* Push the objects to the backend. */
>   	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
>   }
>   
> diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
> index ed060e887c..14666457f7 100644
> --- a/lib/mempool/rte_mempool_trace_fp.h
> +++ b/lib/mempool/rte_mempool_trace_fp.h
> @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
>   	rte_trace_point_emit_ptr(mempool);
>   )
>   
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_rewind,
> +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_get_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/mempool/version.map b/lib/mempool/version.map
> index b67d7aace7..1383ae6db2 100644
> --- a/lib/mempool/version.map
> +++ b/lib/mempool/version.map
> @@ -63,6 +63,11 @@ EXPERIMENTAL {
>   	__rte_mempool_trace_ops_alloc;
>   	__rte_mempool_trace_ops_free;
>   	__rte_mempool_trace_set_ops_byname;
> +
> +	# added in 23.03
> +	__rte_mempool_trace_cache_zc_put_bulk;
> +	__rte_mempool_trace_cache_zc_put_rewind;
> +	__rte_mempool_trace_cache_zc_get_bulk;
>   };
>   
>   INTERNAL {


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5] mempool cache: add zero-copy get and put functions
  2023-01-22 20:34   ` Konstantin Ananyev
@ 2023-01-22 21:17     ` Morten Brørup
  2023-01-23 11:53       ` Konstantin Ananyev
  0 siblings, 1 reply; 36+ messages in thread
From: Morten Brørup @ 2023-01-22 21:17 UTC (permalink / raw)
  To: Konstantin Ananyev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, kamalakshitha.aligeri, bruce.richardson,
	konstantin.ananyev, dev
  Cc: nd

> From: Konstantin Ananyev [mailto:konstantin.v.ananyev@yandex.ru]
> Sent: Sunday, 22 January 2023 21.35
> 
> Hi Morten,
> 
> Few nits, see below.
> Also I still think we do need a test case for _zc_get_ before
> accepting it in the mainline.

Poking at my bad conscience... :-)

It's on my todo-list. Apparently not high enough. ;-)
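
In the meantime, a rough sketch of such a test could look like this - completely untested, and the pool parameters and function name are just placeholders, so it should not be mistaken for the test case that eventually gets submitted:

/* Sketch only. Assumes <rte_mempool.h>, <rte_lcore.h> and <rte_memory.h>. */
static int
test_mempool_cache_zc_get_bulk(void)
{
	struct rte_mempool *mp;
	struct rte_mempool_cache *cache;
	void **zc_objs;
	void *objs[8];
	unsigned int i;

	/* 256 objects of 64 bytes, per-lcore cache size 32. */
	mp = rte_mempool_create("test_zc_get", 256, 64, 32, 0,
			NULL, NULL, NULL, NULL, SOCKET_ID_ANY, 0);
	if (mp == NULL)
		return -1;

	cache = rte_mempool_default_cache(mp, rte_lcore_id());
	if (cache == NULL)
		goto fail;

	/* Zero-copy get: the returned pointer addresses 8 objects in the cache. */
	zc_objs = rte_mempool_cache_zc_get_bulk(cache, mp, 8);
	if (zc_objs == NULL)
		goto fail;

	/* The 8 objects now belong to the application; copy out the pointers. */
	for (i = 0; i < 8; i++)
		objs[i] = zc_objs[i];

	/* Return the objects through the normal put path. */
	rte_mempool_generic_put(mp, objs, 8, cache);

	rte_mempool_free(mp);
	return 0;

fail:
	rte_mempool_free(mp);
	return -1;
}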

> With that in place:
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> 
> > Zero-copy access to mempool caches is beneficial for PMD performance,
> and
> > must be provided by the mempool library to fix [Bug 1052] without a
> > performance regression.
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> >
> > Bugzilla ID: 1052
> >
> > v5:
> > * Bugfix: Compare zero-copy get request to the cache size instead of
> the
> >    flush threshold; otherwise refill could overflow the memory
> allocated
> >    for the cache. (Andrew)
> > * Split the zero-copy put function into an internal function doing
> the
> >    work, and a public function with trace.
> > * Avoid code duplication by rewriting rte_mempool_do_generic_put() to
> use
> >    the internal zero-copy put function. (Andrew)
> > * Corrected the return type of rte_mempool_cache_zc_put_bulk() from
> void *
> >    to void **; it returns a pointer to an array of objects.
> > * Fix coding style: Add missing curly brackets. (Andrew)
> > v4:
> > * Fix checkpatch warnings.
> > v3:
> > * Bugfix: Respect the cache size; compare to the flush threshold
> instead
> >    of RTE_MEMPOOL_CACHE_MAX_SIZE.
> > * Added 'rewind' function for incomplete 'put' operations.
> (Konstantin)
> > * Replace RTE_ASSERTs with runtime checks of the request size.
> >    Instead of failing, return NULL if the request is too big.
> (Konstantin)
> > * Modified comparison to prevent overflow if n is really huge and len
> is
> >    non-zero.
> > * Updated the comments in the code.
> > v2:
> > * Fix checkpatch warnings.
> > * Fix missing registration of trace points.
> > * The functions are inline, so they don't go into the map file.
> > v1 changes from the RFC:
> > * Removed run-time parameter checks. (Honnappa)
> >    This is a hot fast path function; requiring correct application
> >    behaviour, i.e. function parameters must be valid.
> > * Added RTE_ASSERT for parameters instead.
> >    Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > * Chose the simple get function; i.e. do not move the existing
> objects in
> >    the cache to the top of the new stack, just leave them at the
> bottom.
> > * Renamed the functions. Other suggestions are welcome, of course. ;-
> )
> > * Updated the function descriptions.
> > * Added the functions to trace_fp and version.map.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >   lib/mempool/mempool_trace_points.c |   9 ++
> >   lib/mempool/rte_mempool.h          | 237 +++++++++++++++++++++++++-
> ---
> >   lib/mempool/rte_mempool_trace_fp.h |  23 +++
> >   lib/mempool/version.map            |   5 +
> >   4 files changed, 245 insertions(+), 29 deletions(-)
> >
> > diff --git a/lib/mempool/mempool_trace_points.c
> b/lib/mempool/mempool_trace_points.c
> > index 4ad76deb34..83d353a764 100644
> > --- a/lib/mempool/mempool_trace_points.c
> > +++ b/lib/mempool/mempool_trace_points.c
> > @@ -77,3 +77,12 @@
> RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> >
> >   RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
> >   	lib.mempool.set.ops.byname)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> > +	lib.mempool.cache.zc.put.bulk)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> > +	lib.mempool.cache.zc.put.rewind)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> > +	lib.mempool.cache.zc.get.bulk)
> > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > index 9f530db24b..5efd3c2b5b 100644
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> > @@ -47,6 +47,7 @@
> >   #include <rte_ring.h>
> >   #include <rte_memcpy.h>
> >   #include <rte_common.h>
> > +#include <rte_errno.h>
> >
> >   #include "rte_mempool_trace_fp.h"
> >
> > @@ -1346,6 +1347,197 @@ rte_mempool_cache_flush(struct
> rte_mempool_cache *cache,
> >   	cache->len = 0;
> >   }
> >
> > +/**
> > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> rte_mempool_do_generic_put().
> > + *
> > + * Zero-copy put objects in a user-owned mempool cache backed by the
> specified mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be put in the mempool cache.
> > + * @return
> > + *   The pointer to where to put the objects in the mempool cache.
> > + *   NULL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +static __rte_always_inline void **
> > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	void **cache_objs;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	if (n <= cache->flushthresh - cache->len) {
> > +		/*
> > +		 * The objects can be added to the cache without crossing
> the
> > +		 * flush threshold.
> > +		 */
> > +		cache_objs = &cache->objs[cache->len];
> > +		cache->len += n;
> > +	} else if (likely(n <= cache->flushthresh)) {
> > +		/*
> > +		 * The request itself fits into the cache.
> > +		 * But first, the cache must be flushed to the backend, so
> > +		 * adding the objects does not cross the flush threshold.
> > +		 */
> > +		cache_objs = &cache->objs[0];
> > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > +		cache->len = n;
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> > +		return NULL;
> > +	}
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > +
> > +	return cache_objs;
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior notice.
> > + *
> > + * Zero-copy put objects in a user-owned mempool cache backed by the
> specified mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be put in the mempool cache.
> > + * @return
> > + *   The pointer to where to put the objects in the mempool cache.
> > + *   NULL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void **
> > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior notice.
> > + *
> > + * Zero-copy un-put objects in a user-owned mempool cache.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param n
> > + *   The number of objects not put in the mempool cache after
> calling
> > + *   rte_mempool_cache_zc_put_bulk().
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void
> > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > +		unsigned int n)
> > +{
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(n <= cache->len);
> > +
> > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > +
> > +	cache->len -= n;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior notice.
> > + *
> > + * Zero-copy get objects from a user-owned mempool cache backed by
> the specified mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to prefetch into the mempool cache.
> 
> Why not 'get' instead of 'prefetch'?

This was my thinking:

The function "prefetches" the objects into the cache. It is the application itself that "gets" the objects from the cache after having called the function.
You might also notice that the n parameter for the zc_put() function is described as "to be put" (future), not "to put" (now) in the cache.

On the other hand, I chose "Zero-copy get" for the function headline to keep it simple.

If you think "get" is a more correct description of the n parameter, I can change it.

Alternatively, I can use the same style as zc_put(), i.e. "to be gotten from the mempool cache" - but that would require input from a native English-speaking person, because Danish and English grammar are very different, and I am highly uncertain about my English grammar here! I originally considered this phrase, but concluded that the "prefetch" description was easier to understand - especially for non-native English readers.
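
To illustrate, the intended calling sequence is roughly the following - just a sketch, assuming 'cache', 'mp' and 'n' are already in scope, with use_object() as a placeholder for whatever the application does with each object:

	void **objs;
	unsigned int i;

	/* The function "prefetches" n objects into the cache (refilling it
	 * from the backend if necessary) and returns a pointer to them. */
	objs = rte_mempool_cache_zc_get_bulk(cache, mp, n);
	if (unlikely(objs == NULL))
		return -rte_errno; /* Not enough objects, or n too big for the cache. */

	/* The application itself then "gets" the objects directly from the cache. */
	for (i = 0; i < n; i++)
		use_object(objs[i]); /* Placeholder. */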

> 
> 
> > + * @return
> > + *   The pointer to the objects in the mempool cache.
> > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> objects.
> > + *   With rte_errno set to the error code of the mempool dequeue
> function,
> > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void *
> > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	unsigned int len, size;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > +
> > +	len = cache->len;
> > +	size = cache->size;
> > +
> > +	if (n <= len) {
> > +		/* The request can be satisfied from the cache as is. */
> > +		len -= n;
> > +	} else if (likely(n <= size)) {
> > +		/*
> > +		 * The request itself can be satisfied from the cache.
> > +		 * But first, the cache must be filled from the backend;
> > +		 * fetch size + requested - len objects.
> > +		 */
> > +		int ret;
> > +
> > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> size + n - len);
> > +		if (unlikely(ret < 0)) {
> > +			/*
> > +			 * We are buffer constrained.
> > +			 * Do not fill the cache, just satisfy the request.
> > +			 */
> > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> >objs[len], n - len);
> > +			if (unlikely(ret < 0)) {
> > +				/* Unable to satisfy the request. */
> > +
> > +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> > +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> > +
> > +				rte_errno = -ret;
> > +				return NULL;
> > +			}
> > +
> > +			len = 0;
> > +		} else {
> > +			len = size;
> > +		}
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> > +
> > +	cache->len = len;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > +
> > +	return &cache->objs[len];
> > +}
> > +
> >   /**
> >    * @internal Put several objects back in the mempool; used
> internally.
> >    * @param mp
> > @@ -1364,32 +1556,25 @@ rte_mempool_do_generic_put(struct rte_mempool
> *mp, void * const *obj_table,
> >   {
> >   	void **cache_objs;
> >
> > -	/* No cache provided */
> > -	if (unlikely(cache == NULL))
> > -		goto driver_enqueue;
> > +	/* No cache provided? */
> > +	if (unlikely(cache == NULL)) {
> > +		/* Increment stats now, adding in mempool always succeeds.
> */
> > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> >
> > -	/* increment stat now, adding in mempool always success */
> > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > +		goto driver_enqueue;
> > +	}
> >
> > -	/* The request itself is too big for the cache */
> > -	if (unlikely(n > cache->flushthresh))
> > -		goto driver_enqueue_stats_incremented;
> > +	/* Prepare to add the objects to the cache. */
> > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> >
> > -	/*
> > -	 * The cache follows the following algorithm:
> > -	 *   1. If the objects cannot be added to the cache without
> crossing
> > -	 *      the flush threshold, flush the cache to the backend.
> > -	 *   2. Add the objects to the cache.
> > -	 */
> > +	/* The request itself is too big for the cache? */
> > +	if (unlikely(cache_objs == NULL)) {
> > +		/* Increment stats now, adding in mempool always succeeds.
> */
> > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> 
> Shouldn't it be RTE_MEMPOOL_STAT_ADD() here?

I can see why you are wondering, but the answer is no. The statistics in the mempool cache are not related to the cache; they are related to the mempool, and they are there to provide faster per-lcore update access [1].

[1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L94

> 
> >
> > -	if (cache->len + n <= cache->flushthresh) {
> > -		cache_objs = &cache->objs[cache->len];
> > -		cache->len += n;
> > -	} else {
> > -		cache_objs = &cache->objs[0];
> > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > -		cache->len = n;
> > +		goto driver_enqueue;
> >   	}
> >
> >   	/* Add the objects to the cache. */
> > @@ -1399,13 +1584,7 @@ rte_mempool_do_generic_put(struct rte_mempool
> *mp, void * const *obj_table,
> >
> >   driver_enqueue:
> >
> > -	/* increment stat now, adding in mempool always success */
> > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > -
> > -driver_enqueue_stats_incremented:
> > -
> > -	/* push objects to the backend */
> > +	/* Push the objects to the backend. */
> >   	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> >   }
> >
> > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> b/lib/mempool/rte_mempool_trace_fp.h
> > index ed060e887c..14666457f7 100644
> > --- a/lib/mempool/rte_mempool_trace_fp.h
> > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> >   	rte_trace_point_emit_ptr(mempool);
> >   )
> >
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_put_bulk,
> > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_ptr(mempool);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_put_rewind,
> > +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_get_bulk,
> > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_ptr(mempool);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> >   #ifdef __cplusplus
> >   }
> >   #endif
> > diff --git a/lib/mempool/version.map b/lib/mempool/version.map
> > index b67d7aace7..1383ae6db2 100644
> > --- a/lib/mempool/version.map
> > +++ b/lib/mempool/version.map
> > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> >   	__rte_mempool_trace_ops_alloc;
> >   	__rte_mempool_trace_ops_free;
> >   	__rte_mempool_trace_set_ops_byname;
> > +
> > +	# added in 23.03
> > +	__rte_mempool_trace_cache_zc_put_bulk;
> > +	__rte_mempool_trace_cache_zc_put_rewind;
> > +	__rte_mempool_trace_cache_zc_get_bulk;
> >   };
> >
> >   INTERNAL {
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5] mempool cache: add zero-copy get and put functions
  2023-01-22 21:17     ` Morten Brørup
@ 2023-01-23 11:53       ` Konstantin Ananyev
  2023-01-23 12:23         ` Morten Brørup
  0 siblings, 1 reply; 36+ messages in thread
From: Konstantin Ananyev @ 2023-01-23 11:53 UTC (permalink / raw)
  To: Morten Brørup, Konstantin Ananyev, olivier.matz,
	andrew.rybchenko, honnappa.nagarahalli, kamalakshitha.aligeri,
	bruce.richardson, dev
  Cc: nd



> > Few nits, see below.
> > Also I still think we do need a test case for _zc_get_ before
> > accepting it in the mainline.
> 
> Poking at my bad conscience... :-)
> 
> It's on my todo-list. Apparently not high enough. ;-)
> 
> > With that in place:
> > Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> >
> > > Zero-copy access to mempool caches is beneficial for PMD performance,
> > and
> > > must be provided by the mempool library to fix [Bug 1052] without a
> > > performance regression.
> > >
> > > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> > >
> > > Bugzilla ID: 1052
> > >
> > > v5:
> > > * Bugfix: Compare zero-copy get request to the cache size instead of
> > the
> > >    flush threshold; otherwise refill could overflow the memory
> > allocated
> > >    for the cache. (Andrew)
> > > * Split the zero-copy put function into an internal function doing
> > the
> > >    work, and a public function with trace.
> > > * Avoid code duplication by rewriting rte_mempool_do_generic_put() to
> > use
> > >    the internal zero-copy put function. (Andrew)
> > > * Corrected the return type of rte_mempool_cache_zc_put_bulk() from
> > void *
> > >    to void **; it returns a pointer to an array of objects.
> > > * Fix coding style: Add missing curly brackets. (Andrew)
> > > v4:
> > > * Fix checkpatch warnings.
> > > v3:
> > > * Bugfix: Respect the cache size; compare to the flush threshold
> > instead
> > >    of RTE_MEMPOOL_CACHE_MAX_SIZE.
> > > * Added 'rewind' function for incomplete 'put' operations.
> > (Konstantin)
> > > * Replace RTE_ASSERTs with runtime checks of the request size.
> > >    Instead of failing, return NULL if the request is too big.
> > (Konstantin)
> > > * Modified comparison to prevent overflow if n is really huge and len
> > is
> > >    non-zero.
> > > * Updated the comments in the code.
> > > v2:
> > > * Fix checkpatch warnings.
> > > * Fix missing registration of trace points.
> > > * The functions are inline, so they don't go into the map file.
> > > v1 changes from the RFC:
> > > * Removed run-time parameter checks. (Honnappa)
> > >    This is a hot fast path function; requiring correct application
> > >    behaviour, i.e. function parameters must be valid.
> > > * Added RTE_ASSERT for parameters instead.
> > >    Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > > * Chose the simple get function; i.e. do not move the existing
> > objects in
> > >    the cache to the top of the new stack, just leave them at the
> > bottom.
> > > * Renamed the functions. Other suggestions are welcome, of course. ;-
> > )
> > > * Updated the function descriptions.
> > > * Added the functions to trace_fp and version.map.
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > ---
> > >   lib/mempool/mempool_trace_points.c |   9 ++
> > >   lib/mempool/rte_mempool.h          | 237 +++++++++++++++++++++++++-
> > ---
> > >   lib/mempool/rte_mempool_trace_fp.h |  23 +++
> > >   lib/mempool/version.map            |   5 +
> > >   4 files changed, 245 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/lib/mempool/mempool_trace_points.c
> > b/lib/mempool/mempool_trace_points.c
> > > index 4ad76deb34..83d353a764 100644
> > > --- a/lib/mempool/mempool_trace_points.c
> > > +++ b/lib/mempool/mempool_trace_points.c
> > > @@ -77,3 +77,12 @@
> > RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> > >
> > >   RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
> > >   	lib.mempool.set.ops.byname)
> > > +
> > > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> > > +	lib.mempool.cache.zc.put.bulk)
> > > +
> > > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> > > +	lib.mempool.cache.zc.put.rewind)
> > > +
> > > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> > > +	lib.mempool.cache.zc.get.bulk)
> > > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > > index 9f530db24b..5efd3c2b5b 100644
> > > --- a/lib/mempool/rte_mempool.h
> > > +++ b/lib/mempool/rte_mempool.h
> > > @@ -47,6 +47,7 @@
> > >   #include <rte_ring.h>
> > >   #include <rte_memcpy.h>
> > >   #include <rte_common.h>
> > > +#include <rte_errno.h>
> > >
> > >   #include "rte_mempool_trace_fp.h"
> > >
> > > @@ -1346,6 +1347,197 @@ rte_mempool_cache_flush(struct
> > rte_mempool_cache *cache,
> > >   	cache->len = 0;
> > >   }
> > >
> > > +/**
> > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > rte_mempool_do_generic_put().
> > > + *
> > > + * Zero-copy put objects in a user-owned mempool cache backed by the
> > specified mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to be put in the mempool cache.
> > > + * @return
> > > + *   The pointer to where to put the objects in the mempool cache.
> > > + *   NULL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +static __rte_always_inline void **
> > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	void **cache_objs;
> > > +
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	if (n <= cache->flushthresh - cache->len) {
> > > +		/*
> > > +		 * The objects can be added to the cache without crossing
> > the
> > > +		 * flush threshold.
> > > +		 */
> > > +		cache_objs = &cache->objs[cache->len];
> > > +		cache->len += n;
> > > +	} else if (likely(n <= cache->flushthresh)) {
> > > +		/*
> > > +		 * The request itself fits into the cache.
> > > +		 * But first, the cache must be flushed to the backend, so
> > > +		 * adding the objects does not cross the flush threshold.
> > > +		 */
> > > +		cache_objs = &cache->objs[0];
> > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > +		cache->len = n;
> > > +	} else {
> > > +		/* The request itself is too big for the cache. */
> > > +		return NULL;
> > > +	}
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > +
> > > +	return cache_objs;
> > > +}
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior notice.
> > > + *
> > > + * Zero-copy put objects in a user-owned mempool cache backed by the
> > specified mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to be put in the mempool cache.
> > > + * @return
> > > + *   The pointer to where to put the objects in the mempool cache.
> > > + *   NULL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void **
> > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > +}
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior notice.
> > > + *
> > > + * Zero-copy un-put objects in a user-owned mempool cache.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param n
> > > + *   The number of objects not put in the mempool cache after
> > calling
> > > + *   rte_mempool_cache_zc_put_bulk().
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void
> > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > +		unsigned int n)
> > > +{
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(n <= cache->len);
> > > +
> > > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > +
> > > +	cache->len -= n;
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
> > > +}
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior notice.
> > > + *
> > > + * Zero-copy get objects from a user-owned mempool cache backed by
> > the specified mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to prefetch into the mempool cache.
> >
> > Why not 'get' instead of 'prefetch'?
> 
> This was my thinking:
> 
> The function "prefetches" the objects into the cache. It is the application itself that "gets" the objects from the cache after having
> called the function.
> You might also notice that the n parameter for the zc_put() function is described as "to be put" (future), not "to put" (now) in the
> cache.
> 
> On the other hand, I chose "Zero-copy get" for the function headline to keep it simple.
> 
> If you think "get" is a more correct description of the n parameter, I can change it.
> 
> Alternatively, I can use the same style as zc_put(), i.e. "to be gotten from the mempool cache" - but that would require input from a
> natively English speaking person, because Danish and English grammar is very different, and I am highly uncertain about my English
> grammar here! I originally considered this phrase, but concluded that the "prefetch" description was easier to understand - especially
> for non-native English readers.

For me 'prefetch' seems a bit unclear in that situation...
Probably: "number of objects that user plans to extract from the cache"?
But again, I am not a native English speaker either, so maybe someone can suggest a better option.

> 
> >
> >
> > > + * @return
> > > + *   The pointer to the objects in the mempool cache.
> > > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> > objects.
> > > + *   With rte_errno set to the error code of the mempool dequeue
> > function,
> > > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void *
> > > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	unsigned int len, size;
> > > +
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > > +
> > > +	len = cache->len;
> > > +	size = cache->size;
> > > +
> > > +	if (n <= len) {
> > > +		/* The request can be satisfied from the cache as is. */
> > > +		len -= n;
> > > +	} else if (likely(n <= size)) {
> > > +		/*
> > > +		 * The request itself can be satisfied from the cache.
> > > +		 * But first, the cache must be filled from the backend;
> > > +		 * fetch size + requested - len objects.
> > > +		 */
> > > +		int ret;
> > > +
> > > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> > size + n - len);
> > > +		if (unlikely(ret < 0)) {
> > > +			/*
> > > +			 * We are buffer constrained.
> > > +			 * Do not fill the cache, just satisfy the request.
> > > +			 */
> > > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > >objs[len], n - len);
> > > +			if (unlikely(ret < 0)) {
> > > +				/* Unable to satisfy the request. */
> > > +
> > > +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> > > +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> > > +
> > > +				rte_errno = -ret;
> > > +				return NULL;
> > > +			}
> > > +
> > > +			len = 0;
> > > +		} else {
> > > +			len = size;
> > > +		}
> > > +	} else {
> > > +		/* The request itself is too big for the cache. */
> > > +		rte_errno = EINVAL;
> > > +		return NULL;
> > > +	}
> > > +
> > > +	cache->len = len;
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > > +
> > > +	return &cache->objs[len];
> > > +}
> > > +
> > >   /**
> > >    * @internal Put several objects back in the mempool; used
> > internally.
> > >    * @param mp
> > > @@ -1364,32 +1556,25 @@ rte_mempool_do_generic_put(struct rte_mempool
> > *mp, void * const *obj_table,
> > >   {
> > >   	void **cache_objs;
> > >
> > > -	/* No cache provided */
> > > -	if (unlikely(cache == NULL))
> > > -		goto driver_enqueue;
> > > +	/* No cache provided? */
> > > +	if (unlikely(cache == NULL)) {
> > > +		/* Increment stats now, adding in mempool always succeeds.
> > */
> > > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > >
> > > -	/* increment stat now, adding in mempool always success */
> > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > +		goto driver_enqueue;
> > > +	}
> > >
> > > -	/* The request itself is too big for the cache */
> > > -	if (unlikely(n > cache->flushthresh))
> > > -		goto driver_enqueue_stats_incremented;
> > > +	/* Prepare to add the objects to the cache. */
> > > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > >
> > > -	/*
> > > -	 * The cache follows the following algorithm:
> > > -	 *   1. If the objects cannot be added to the cache without
> > crossing
> > > -	 *      the flush threshold, flush the cache to the backend.
> > > -	 *   2. Add the objects to the cache.
> > > -	 */
> > > +	/* The request itself is too big for the cache? */
> > > +	if (unlikely(cache_objs == NULL)) {
> > > +		/* Increment stats now, adding in mempool always succeeds.
> > */
> > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> >
> > Shouldn't it be RTE_MEMPOOL_STAT_ADD() here?
> 
> I can see why you are wondering, but the answer is no. The statistics in mempool cache are not related to the cache, they are related
> to the mempool; they are there to provide faster per-lcore update access [1].
> 
> [1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L94

But the condition above:
if (unlikely(cache_objs == NULL))
means that we can't put these objects to the cache and have to put
objects straight to the pool (skipping the cache completely), right?
If so, then why update the cache stats instead of the pool stats?

> >
> > >
> > > -	if (cache->len + n <= cache->flushthresh) {
> > > -		cache_objs = &cache->objs[cache->len];
> > > -		cache->len += n;
> > > -	} else {
> > > -		cache_objs = &cache->objs[0];
> > > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > > -		cache->len = n;
> > > +		goto driver_enqueue;
> > >   	}
> > >
> > >   	/* Add the objects to the cache. */
> > > @@ -1399,13 +1584,7 @@ rte_mempool_do_generic_put(struct rte_mempool
> > *mp, void * const *obj_table,
> > >
> > >   driver_enqueue:
> > >
> > > -	/* increment stat now, adding in mempool always success */
> > > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > -
> > > -driver_enqueue_stats_incremented:
> > > -
> > > -	/* push objects to the backend */
> > > +	/* Push the objects to the backend. */
> > >   	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> > >   }
> > >
> > > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> > b/lib/mempool/rte_mempool_trace_fp.h
> > > index ed060e887c..14666457f7 100644
> > > --- a/lib/mempool/rte_mempool_trace_fp.h
> > > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> > >   	rte_trace_point_emit_ptr(mempool);
> > >   )
> > >
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_put_bulk,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_ptr(mempool);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_put_rewind,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_get_bulk,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_ptr(mempool);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > >   #ifdef __cplusplus
> > >   }
> > >   #endif
> > > diff --git a/lib/mempool/version.map b/lib/mempool/version.map
> > > index b67d7aace7..1383ae6db2 100644
> > > --- a/lib/mempool/version.map
> > > +++ b/lib/mempool/version.map
> > > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> > >   	__rte_mempool_trace_ops_alloc;
> > >   	__rte_mempool_trace_ops_free;
> > >   	__rte_mempool_trace_set_ops_byname;
> > > +
> > > +	# added in 23.03
> > > +	__rte_mempool_trace_cache_zc_put_bulk;
> > > +	__rte_mempool_trace_cache_zc_put_rewind;
> > > +	__rte_mempool_trace_cache_zc_get_bulk;
> > >   };
> > >
> > >   INTERNAL {
> >


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5] mempool cache: add zero-copy get and put functions
  2023-01-23 11:53       ` Konstantin Ananyev
@ 2023-01-23 12:23         ` Morten Brørup
  2023-01-23 12:52           ` Konstantin Ananyev
  2023-01-23 14:30           ` Bruce Richardson
  0 siblings, 2 replies; 36+ messages in thread
From: Morten Brørup @ 2023-01-23 12:23 UTC (permalink / raw)
  To: Konstantin Ananyev, Konstantin Ananyev, olivier.matz,
	andrew.rybchenko, honnappa.nagarahalli, kamalakshitha.aligeri,
	bruce.richardson, dev
  Cc: nd

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Monday, 23 January 2023 12.54
> 
> > > Few nits, see below.
> > > Also I still think we do need a test case for _zc_get_ before
> > > accepting it in the mainline.
> >
> > Poking at my bad conscience... :-)
> >
> > It's on my todo-list. Apparently not high enough. ;-)
> >
> > > With that in place:
> > > Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> > >

[...]

> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior notice.
> > > > + *
> > > > + * Zero-copy put objects in a user-owned mempool cache backed by
> the
> > > specified mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool
> cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void **
> > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior notice.
> > > > + *
> > > > + * Zero-copy un-put objects in a user-owned mempool cache.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param n
> > > > + *   The number of objects not put in the mempool cache after
> > > calling
> > > > + *   rte_mempool_cache_zc_put_bulk().
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void
> > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > +		unsigned int n)
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(n <= cache->len);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > > +
> > > > +	cache->len -= n;
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior notice.
> > > > + *
> > > > + * Zero-copy get objects from a user-owned mempool cache backed
> by
> > > the specified mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to prefetch into the mempool cache.
> > >
> > > Why not 'get' instead of 'prefetch'?
> >
> > This was my thinking:
> >
> > The function "prefetches" the objects into the cache. It is the
> application itself that "gets" the objects from the cache after having
> > called the function.
> > You might also notice that the n parameter for the zc_put() function
> is described as "to be put" (future), not "to put" (now) in the
> > cache.
> >
> > On the other hand, I chose "Zero-copy get" for the function headline
> to keep it simple.
> >
> > If you think "get" is a more correct description of the n parameter,
> I can change it.
> >
> > Alternatively, I can use the same style as zc_put(), i.e. "to be
> gotten from the mempool cache" - but that would require input from a
> > natively English speaking person, because Danish and English grammar
> is very different, and I am highly uncertain about my English
> > grammar here! I originally considered this phrase, but concluded that
> the "prefetch" description was easier to understand - especially
> > for non-native English readers.
> 
> For me 'prefetch' seems a bit unclear in that situation...
> Probably: "number of objects that user plans to extract from the
> cache"?
> But again, I am not native English speaker too, so might be someone can
> suggest a better option.
> 

@Bruce (or any other native English speaking person), your input would be appreciated here!

> > > > + * @return
> > > > + *   The pointer to the objects in the mempool cache.
> > > > + *   NULL on error; i.e. the cache + the pool does not contain
> 'n'
> > > objects.
> > > > + *   With rte_errno set to the error code of the mempool dequeue
> > > function,
> > > > + *   or EINVAL if the request itself is too big for the cache,
> i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void *
> > > > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)

[...]

> > > > @@ -1364,32 +1556,25 @@ rte_mempool_do_generic_put(struct
> rte_mempool
> > > *mp, void * const *obj_table,
> > > >   {
> > > >   	void **cache_objs;
> > > >
> > > > -	/* No cache provided */
> > > > -	if (unlikely(cache == NULL))
> > > > -		goto driver_enqueue;
> > > > +	/* No cache provided? */
> > > > +	if (unlikely(cache == NULL)) {
> > > > +		/* Increment stats now, adding in mempool always
> succeeds.
> > > */
> > > > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > >
> > > > -	/* increment stat now, adding in mempool always success */
> > > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +		goto driver_enqueue;
> > > > +	}
> > > >
> > > > -	/* The request itself is too big for the cache */
> > > > -	if (unlikely(n > cache->flushthresh))
> > > > -		goto driver_enqueue_stats_incremented;
> > > > +	/* Prepare to add the objects to the cache. */
> > > > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > >
> > > > -	/*
> > > > -	 * The cache follows the following algorithm:
> > > > -	 *   1. If the objects cannot be added to the cache without
> > > crossing
> > > > -	 *      the flush threshold, flush the cache to the
> backend.
> > > > -	 *   2. Add the objects to the cache.
> > > > -	 */
> > > > +	/* The request itself is too big for the cache? */
> > > > +	if (unlikely(cache_objs == NULL)) {
> > > > +		/* Increment stats now, adding in mempool always
> succeeds.
> > > */
> > > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > >
> > > Shouldn't it be RTE_MEMPOOL_STAT_ADD() here?
> >
> > I can see why you are wondering, but the answer is no. The statistics
> in mempool cache are not related to the cache, they are related
> > to the mempool; they are there to provide faster per-lcore update
> access [1].
> >
> > [1]:
> https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool	
> .h#L94
> 
> But the condition above:
> if (unlikely(cache_objs == NULL))
> means that we can't put these objects to the cache and have to put
> objects straight to the pool (skipping the cache completely), right?

Correct.

> If so, then why update the cache stats instead of the pool stats?

Because updating the stats in the cache structure is faster than updating the stats in the pool structure. Refer to the two macros: RTE_MEMPOOL_STAT_ADD() [2] is effectively five lines of code, but RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) [3] is a one-liner: ((cache)->stats.name += (n)).

[2]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L325
[3]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L348

And to reiterate that this is the correct behavior here, I will rephrase my previous response: The stats kept in the cache are part of the pool stats; they are not stats for the cache itself.
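
For readers who do not want to chase the links, this is roughly what the two macros boil down to when statistics are enabled (a paraphrase only; the definitions in [2] and [3] are authoritative and may differ in detail between releases):

/* Pool stats live in a per-lcore array inside struct rte_mempool, so the
 * macro must first resolve the lcore id (roughly the "five lines" mentioned above):
 */
#define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {                  \
		unsigned int __lcore_id = rte_lcore_id();        \
		if (__lcore_id < RTE_MAX_LCORE)                  \
			(mp)->stats[__lcore_id].name += (n);     \
	} while (0)

/* The cache structure is already per lcore (or per user thread), so its
 * stats field can be bumped directly:
 */
#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) ((cache)->stats.name += (n))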

> > > >
> > > > -	if (cache->len + n <= cache->flushthresh) {
> > > > -		cache_objs = &cache->objs[cache->len];
> > > > -		cache->len += n;
> > > > -	} else {
> > > > -		cache_objs = &cache->objs[0];
> > > > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> >len);
> > > > -		cache->len = n;
> > > > +		goto driver_enqueue;
> > > >   	}
> > > >
> > > >   	/* Add the objects to the cache. */


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5] mempool cache: add zero-copy get and put functions
  2023-01-23 12:23         ` Morten Brørup
@ 2023-01-23 12:52           ` Konstantin Ananyev
  2023-01-23 14:30           ` Bruce Richardson
  1 sibling, 0 replies; 36+ messages in thread
From: Konstantin Ananyev @ 2023-01-23 12:52 UTC (permalink / raw)
  To: Morten Brørup, Konstantin Ananyev, olivier.matz,
	andrew.rybchenko, honnappa.nagarahalli, kamalakshitha.aligeri,
	bruce.richardson, dev
  Cc: nd



> > > > > @@ -1364,32 +1556,25 @@ rte_mempool_do_generic_put(struct
> > rte_mempool
> > > > *mp, void * const *obj_table,
> > > > >   {
> > > > >   	void **cache_objs;
> > > > >
> > > > > -	/* No cache provided */
> > > > > -	if (unlikely(cache == NULL))
> > > > > -		goto driver_enqueue;
> > > > > +	/* No cache provided? */
> > > > > +	if (unlikely(cache == NULL)) {
> > > > > +		/* Increment stats now, adding in mempool always
> > succeeds.
> > > > */
> > > > > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > > >
> > > > > -	/* increment stat now, adding in mempool always success */
> > > > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > > +		goto driver_enqueue;
> > > > > +	}
> > > > >
> > > > > -	/* The request itself is too big for the cache */
> > > > > -	if (unlikely(n > cache->flushthresh))
> > > > > -		goto driver_enqueue_stats_incremented;
> > > > > +	/* Prepare to add the objects to the cache. */
> > > > > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > > >
> > > > > -	/*
> > > > > -	 * The cache follows the following algorithm:
> > > > > -	 *   1. If the objects cannot be added to the cache without
> > > > crossing
> > > > > -	 *      the flush threshold, flush the cache to the
> > backend.
> > > > > -	 *   2. Add the objects to the cache.
> > > > > -	 */
> > > > > +	/* The request itself is too big for the cache? */
> > > > > +	if (unlikely(cache_objs == NULL)) {
> > > > > +		/* Increment stats now, adding in mempool always
> > succeeds.
> > > > */
> > > > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > >
> > > > Shouldn't it be RTE_MEMPOOL_STAT_ADD() here?
> > >
> > > I can see why you are wondering, but the answer is no. The statistics
> > in mempool cache are not related to the cache, they are related
> > > to the mempool; they are there to provide faster per-lcore update
> > access [1].
> > >
> > > [1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L94
> >
> > But the condition above:
> > if (unlikely(cache_objs == NULL))
> > means that we can't put these objects into the cache and have to put
> > them straight into the pool (skipping the cache completely), right?
> 
> Correct.
> 
> > If so, then why update the cache stats instead of the pool stats?
> 
> Because updating the stats in the cache structure is faster than updating the stats in the pool structure. Refer to the two macros:
> RTE_MEMPOOL_STAT_ADD() [2] is effectively five lines of code, but RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) [3] is a one-
> liner: ((cache)->stats.name += (n)).
> 
> [2]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L325
> [3]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L348
> 
> And to reiterate that this is the correct behavior here, I will rephrase my previous response: The stats kept in the cache are part of the
> pool stats, they are not stats for the cache itself.

Ah ok, that's the same as the current behavior.
It still looks a bit strange to me that we are incrementing the cache (not pool) stats here.
But that's another story, so no extra comments from me on that case.

> 
> > > > >
> > > > > -	if (cache->len + n <= cache->flushthresh) {
> > > > > -		cache_objs = &cache->objs[cache->len];
> > > > > -		cache->len += n;
> > > > > -	} else {
> > > > > -		cache_objs = &cache->objs[0];
> > > > > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > >len);
> > > > > -		cache->len = n;
> > > > > +		goto driver_enqueue;
> > > > >   	}
> > > > >
> > > > >   	/* Add the objects to the cache. */


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v5] mempool cache: add zero-copy get and put functions
  2023-01-23 12:23         ` Morten Brørup
  2023-01-23 12:52           ` Konstantin Ananyev
@ 2023-01-23 14:30           ` Bruce Richardson
  2023-01-24  1:53             ` Kamalakshitha Aligeri
  1 sibling, 1 reply; 36+ messages in thread
From: Bruce Richardson @ 2023-01-23 14:30 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Konstantin Ananyev, Konstantin Ananyev, olivier.matz,
	andrew.rybchenko, honnappa.nagarahalli, kamalakshitha.aligeri,
	dev, nd

On Mon, Jan 23, 2023 at 01:23:50PM +0100, Morten Brørup wrote:
> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Monday, 23 January 2023 12.54
> > 
> > > > Few nits, see below.
> > > > Also I still think we do need a test case for _zc_get_ before
> > > > accepting it in the mainline.
> > >
> > > Poking at my bad conscience... :-)
> > >
> > > It's on my todo-list. Apparently not high enough. ;-)
> > >
> > > > With that in place:
> > > > Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> > > >
> 
> [...]
> 
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > > prior notice.
> > > > > + *
> > > > > + * Zero-copy put objects in a user-owned mempool cache backed by
> > the
> > > > specified mempool.
> > > > > + *
> > > > > + * @param cache
> > > > > + *   A pointer to the mempool cache.
> > > > > + * @param mp
> > > > > + *   A pointer to the mempool.
> > > > > + * @param n
> > > > > + *   The number of objects to be put in the mempool cache.
> > > > > + * @return
> > > > > + *   The pointer to where to put the objects in the mempool
> > cache.
> > > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > > + *   exceeds the cache flush threshold.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static __rte_always_inline void **
> > > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > > +		struct rte_mempool *mp,
> > > > > +		unsigned int n)
> > > > > +{
> > > > > +	RTE_ASSERT(cache != NULL);
> > > > > +	RTE_ASSERT(mp != NULL);
> > > > > +
> > > > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > > prior notice.
> > > > > + *
> > > > > + * Zero-copy un-put objects in a user-owned mempool cache.
> > > > > + *
> > > > > + * @param cache
> > > > > + *   A pointer to the mempool cache.
> > > > > + * @param n
> > > > > + *   The number of objects not put in the mempool cache after
> > > > calling
> > > > > + *   rte_mempool_cache_zc_put_bulk().
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static __rte_always_inline void
> > > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > > +		unsigned int n)
> > > > > +{
> > > > > +	RTE_ASSERT(cache != NULL);
> > > > > +	RTE_ASSERT(n <= cache->len);
> > > > > +
> > > > > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > > > +
> > > > > +	cache->len -= n;
> > > > > +
> > > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > > prior notice.
> > > > > + *
> > > > > + * Zero-copy get objects from a user-owned mempool cache backed
> > by
> > > > the specified mempool.
> > > > > + *
> > > > > + * @param cache
> > > > > + *   A pointer to the mempool cache.
> > > > > + * @param mp
> > > > > + *   A pointer to the mempool.
> > > > > + * @param n
> > > > > + *   The number of objects to prefetch into the mempool cache.
> > > >
> > > > Why not 'get' instead of 'prefetch'?
> > >
> > > This was my thinking:
> > >
> > > The function "prefetches" the objects into the cache. It is the
> > application itself that "gets" the objects from the cache after having
> > > called the function.
> > > You might also notice that the n parameter for the zc_put() function
> > is described as "to be put" (future), not "to put" (now) in the
> > > cache.
> > >
> > > On the other hand, I chose "Zero-copy get" for the function headline
> > to keep it simple.
> > >
> > > If you think "get" is a more correct description of the n parameter,
> > I can change it.
> > >
> > > Alternatively, I can use the same style as zc_put(), i.e. "to be
> > gotten from the mempool cache" - but that would require input from a
> > > natively English speaking person, because Danish and English grammar
> > is very different, and I am highly uncertain about my English
> > > grammar here! I originally considered this phrase, but concluded that
> > the "prefetch" description was easier to understand - especially
> > > for non-native English readers.
> > 
> > For me 'prefetch' seems a bit unclear in that situation...
> > Probably: "number of objects that user plans to extract from the
> > cache"?
> > But again, I am not a native English speaker either, so maybe someone can
> > suggest a better option.
> > 
> 
> @Bruce (or any other native English speaking person), your input would be appreciated here!
> 
I was happily ignoring this thread until you went and dragged me in with a
hard question. :-)

I think the longer the explanation, the clearer it is likely to be. How about
"number of objects to be made available for extraction from the cache"? I
don't like the reference to "the user" in the longer suggestion above, but
otherwise consider it clearer than talking of prefetching or "getting".

My 2c.

/Bruce

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v5] mempool cache: add zero-copy get and put functions
  2023-01-23 14:30           ` Bruce Richardson
@ 2023-01-24  1:53             ` Kamalakshitha Aligeri
  0 siblings, 0 replies; 36+ messages in thread
From: Kamalakshitha Aligeri @ 2023-01-24  1:53 UTC (permalink / raw)
  To: Bruce Richardson, Morten Brørup
  Cc: Konstantin Ananyev, Konstantin Ananyev, olivier.matz,
	andrew.rybchenko, Honnappa Nagarahalli, dev, nd, nd



-----Original Message-----
From: Bruce Richardson <bruce.richardson@intel.com> 
Sent: Monday, January 23, 2023 6:31 AM
To: Morten Brørup <mb@smartsharesystems.com>
Cc: Konstantin Ananyev <konstantin.ananyev@huawei.com>; Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>; olivier.matz@6wind.com; andrew.rybchenko@oktetlabs.ru; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Kamalakshitha Aligeri <Kamalakshitha.Aligeri@arm.com>; dev@dpdk.org; nd <nd@arm.com>
Subject: Re: [PATCH v5] mempool cache: add zero-copy get and put functions

On Mon, Jan 23, 2023 at 01:23:50PM +0100, Morten Brørup wrote:
> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Monday, 23 January 2023 12.54
> > 
> > > > Few nits, see below.
> > > > Also I still think we do need a test case for _zc_get_ before 
> > > > accepting it in the mainline.
> > >
I am working on the test cases. Will submit them soon.

> > > Poking at my bad conscience... :-)
> > >
> > > It's on my todo-list. Apparently not high enough. ;-)
> > >
> > > > With that in place:
> > > > Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> > > >
> 
> [...]
> 
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: This API may change, or be removed, 
> > > > > +without
> > > > prior notice.
> > > > > + *
> > > > > + * Zero-copy put objects in a user-owned mempool cache backed 
> > > > > + by
> > the
> > > > specified mempool.
> > > > > + *
> > > > > + * @param cache
> > > > > + *   A pointer to the mempool cache.
> > > > > + * @param mp
> > > > > + *   A pointer to the mempool.
> > > > > + * @param n
> > > > > + *   The number of objects to be put in the mempool cache.
> > > > > + * @return
> > > > > + *   The pointer to where to put the objects in the mempool
> > cache.
> > > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > > + *   exceeds the cache flush threshold.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static __rte_always_inline void ** 
> > > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > > +		struct rte_mempool *mp,
> > > > > +		unsigned int n)
> > > > > +{
> > > > > +	RTE_ASSERT(cache != NULL);
> > > > > +	RTE_ASSERT(mp != NULL);
> > > > > +
> > > > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: This API may change, or be removed, 
> > > > > +without
> > > > prior notice.
> > > > > + *
> > > > > + * Zero-copy un-put objects in a user-owned mempool cache.
Why is it written as a user-owned mempool cache? The API expects a pointer to a mempool cache, right? Whether it is the default cache or a user-owned one does not make any difference.
Please correct me if I am wrong.
> > > > > + *
> > > > > + * @param cache
> > > > > + *   A pointer to the mempool cache.
> > > > > + * @param n
> > > > > + *   The number of objects not put in the mempool cache after
> > > > calling
> > > > > + *   rte_mempool_cache_zc_put_bulk().
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static __rte_always_inline void 
> > > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > > +		unsigned int n)
> > > > > +{
> > > > > +	RTE_ASSERT(cache != NULL);
> > > > > +	RTE_ASSERT(n <= cache->len);
> > > > > +
> > > > > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > > > +
> > > > > +	cache->len -= n;
> > > > > +
> > > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: This API may change, or be removed, 
> > > > > +without
> > > > prior notice.
> > > > > + *
> > > > > + * Zero-copy get objects from a user-owned mempool cache 
> > > > > + backed
> > by
> > > > the specified mempool.
> > > > > + *
> > > > > + * @param cache
> > > > > + *   A pointer to the mempool cache.
> > > > > + * @param mp
> > > > > + *   A pointer to the mempool.
> > > > > + * @param n
> > > > > + *   The number of objects to prefetch into the mempool cache.
> > > >
> > > > Why not 'get' instead of 'prefetch'?
> > >
> > > This was my thinking:
> > >
> > > The function "prefetches" the objects into the cache. It is the
> > application itself that "gets" the objects from the cache after 
> > having
> > > called the function.
> > > You might also notice that the n parameter for the zc_put() 
> > > function
> > is described as "to be put" (future), not "to put" (now) in the
> > > cache.
> > >
> > > On the other hand, I chose "Zero-copy get" for the function 
> > > headline
> > to keep it simple.
> > >
> > > If you think "get" is a more correct description of the n 
> > > parameter,
> > I can change it.
> > >
> > > Alternatively, I can use the same style as zc_put(), i.e. "to be
> > gotten from the mempool cache" - but that would require input from a
> > > natively English speaking person, because Danish and English 
> > > grammar
> > is very different, and I am highly uncertain about my English
> > > grammar here! I originally considered this phrase, but concluded 
> > > that
> > the "prefetch" description was easier to understand - especially
> > > for non-native English readers.
> > 
> > For me 'prefetch' seems a bit unclear in that situation...
> > Probably: "number of objects that user plans to extract from the 
> > cache"?
> > But again, I am not a native English speaker either, so maybe someone
> > can suggest a better option.
> > 
> 
> @Bruce (or any other native English speaking person), your input would be appreciated here!
> 
I was happily ignoring this thread until you went and dragged me in with a hard question. :-)

I think the longer the explanation, the clearer it is likely to be. How about "number of objects to be made available for extraction from the cache"? I don't like the reference to "the user" in the longer suggestion above, but otherwise consider it clearer than talking of prefetching or "getting".

My 2c.

/Bruce

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
                   ` (5 preceding siblings ...)
  2022-12-27 15:17 ` [PATCH v5] " Morten Brørup
@ 2023-02-09 14:39 ` Morten Brørup
  2023-02-09 14:52 ` [PATCH v7] " Morten Brørup
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2023-02-09 14:39 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

Bugzilla ID: 1052

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

v6:
* Improve description of the 'n' parameter to the zero-copy get function.
  (Konstantin, Bruce)
* The caches used for zero-copy may not be user-owned, so remove this word
  from the function descriptions. (Kamalakshitha)
v5:
* Bugfix: Compare zero-copy get request to the cache size instead of the
  flush threshold; otherwise refill could overflow the memory allocated
  for the cache. (Andrew)
* Split the zero-copy put function into an internal function doing the
  work, and a public function with trace.
* Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
  the internal zero-copy put function. (Andrew)
* Corrected the return type of rte_mempool_cache_zc_put_bulk() from void *
  to void **; it returns a pointer to an array of objects.
* Fix coding style: Add missing curly brackets. (Andrew)
v4:
* Fix checkpatch warnings.
v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.
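
Not part of the patch itself: a minimal usage sketch of the put side, assuming
a hypothetical driver free/completion path. Only the rte_mempool_cache_zc_*
and existing rte_mempool_* calls are real; the example function and its
fallback policy are illustrative.

#include <rte_mempool.h>

/*
 * Sketch: free 'n' objects into 'mp' through the lcore's default cache,
 * using the zero-copy put API. Hypothetical example, not part of the patch.
 */
static inline void
example_zc_put(struct rte_mempool *mp, void * const *objs, unsigned int n)
{
	struct rte_mempool_cache *cache;
	void **cache_objs;
	unsigned int i;

	cache = rte_mempool_default_cache(mp, rte_lcore_id());
	cache_objs = (cache != NULL) ?
			rte_mempool_cache_zc_put_bulk(cache, mp, n) : NULL;

	if (likely(cache_objs != NULL)) {
		/* Write the object pointers straight into the cache array. */
		for (i = 0; i < n; i++)
			cache_objs[i] = objs[i];
	} else {
		/*
		 * No cache, or the request exceeds the flush threshold;
		 * fall back to the classic copying API.
		 */
		rte_mempool_put_bulk(mp, objs, n);
	}
}

If a caller reserves room for 'n' objects but ends up storing only 'm' of
them, the unused slots are returned with
rte_mempool_cache_zc_put_rewind(cache, n - m).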
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 238 +++++++++++++++++++++++++----
 lib/mempool/rte_mempool_trace_fp.h |  23 +++
 lib/mempool/version.map            |   5 +
 4 files changed, 246 insertions(+), 29 deletions(-)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..b619174c2e 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1346,6 +1346,199 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+
+/**
+ * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+static __rte_always_inline void **
+__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void **
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The number of objects not put in the mempool cache after calling
+ *   rte_mempool_cache_zc_put_bulk().
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(n <= cache->len);
+
+	rte_mempool_trace_cache_zc_put_rewind(cache, n);
+
+	cache->len -= n;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to to be made available for extraction from the
+ *   mempool cache.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
+ *   With rte_errno set to the error code of the mempool dequeue function,
+ *   or EINVAL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len, size;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+	size = cache->size;
+
+	if (n <= len) {
+		/* The request can be satisfied from the cache as is. */
+		len -= n;
+	} else if (likely(n <= size)) {
+		/*
+		 * The request itself can be satisfied from the cache.
+		 * But first, the cache must be filled from the backend;
+		 * fetch size + requested - len objects.
+		 */
+		int ret;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else {
+			len = size;
+		}
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
@@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	/* No cache provided? */
+	if (unlikely(cache == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+		goto driver_enqueue;
+	}
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
+	/* Prepare to add the objects to the cache. */
+	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
 
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
+	/* The request itself is too big for the cache? */
+	if (unlikely(cache_objs == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	if (cache->len + n <= cache->flushthresh) {
-		cache_objs = &cache->objs[cache->len];
-		cache->len += n;
-	} else {
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
+		goto driver_enqueue;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1585,7 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
-
-driver_enqueue_stats_incremented:
-
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }
 
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..14666457f7 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_rewind,
+	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..1383ae6db2 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,11 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_rewind;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v7] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
                   ` (6 preceding siblings ...)
  2023-02-09 14:39 ` [PATCH v6] " Morten Brørup
@ 2023-02-09 14:52 ` Morten Brørup
  2023-02-09 14:58 ` [PATCH v8] " Morten Brørup
  2023-02-13 12:24 ` [PATCH v9] " Morten Brørup
  9 siblings, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2023-02-09 14:52 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

Bugzilla ID: 1052

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

v7:
* Fix typo in function description. (checkpatch)
* Zero-copy functions may set rte_errno; include rte_errno header file.
  (ci/loongarch-compilation)
v6:
* Improve description of the 'n' parameter to the zero-copy get function.
  (Konstantin, Bruce)
* The caches used for zero-copy may not be user-owned, so remove this word
  from the function descriptions. (Kamalakshitha)
v5:
* Bugfix: Compare zero-copy get request to the cache size instead of the
  flush threshold; otherwise refill could overflow the memory allocated
  for the cache. (Andrew)
* Split the zero-copy put function into an internal function doing the
  work, and a public function with trace.
* Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
  the internal zero-copy put function. (Andrew)
* Corrected the return type of rte_mempool_cache_zc_put_bulk() from void *
  to void **; it returns a pointer to an array of objects.
* Fix coding style: Add missing curly brackets. (Andrew)
v4:
* Fix checkpatch warnings.
v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.
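
Not part of the patch itself: a minimal usage sketch of the get side
(hypothetical example; only the rte_mempool_cache_zc_get_bulk() call is from
this patch).

#include <rte_errno.h>
#include <rte_mempool.h>

/*
 * Sketch: get direct access to 'n' objects, e.g. for refilling a descriptor
 * ring, without an intermediate object table on the stack.
 */
static inline int
example_zc_get(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		void **descs, unsigned int n)
{
	void **objs;
	unsigned int i;

	objs = rte_mempool_cache_zc_get_bulk(cache, mp, n);
	if (unlikely(objs == NULL)) {
		/*
		 * The cache plus the backend could not supply 'n' objects,
		 * or the request was too big for the cache; rte_errno says why.
		 */
		return -rte_errno;
	}

	/*
	 * The application now owns objs[0]..objs[n - 1] and copies the
	 * pointers out exactly once, e.g. into descriptors.
	 */
	for (i = 0; i < n; i++)
		descs[i] = objs[i];

	return 0;
}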
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 237 +++++++++++++++++++++++++----
 lib/mempool/rte_mempool_trace_fp.h |  23 +++
 lib/mempool/version.map            |   5 +
 4 files changed, 245 insertions(+), 29 deletions(-)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..15bc0af92e 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1346,6 +1346,198 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+
+/**
+ * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+static __rte_always_inline void **
+__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void **
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The number of objects not put in the mempool cache after calling
+ *   rte_mempool_cache_zc_put_bulk().
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(n <= cache->len);
+
+	rte_mempool_trace_cache_zc_put_rewind(cache, n);
+
+	cache->len -= n;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be made available for extraction from the mempool cache.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
+ *   With rte_errno set to the error code of the mempool dequeue function,
+ *   or EINVAL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len, size;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+	size = cache->size;
+
+	if (n <= len) {
+		/* The request can be satisfied from the cache as is. */
+		len -= n;
+	} else if (likely(n <= size)) {
+		/*
+		 * The request itself can be satisfied from the cache.
+		 * But first, the cache must be filled from the backend;
+		 * fetch size + requested - len objects.
+		 */
+		int ret;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else {
+			len = size;
+		}
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
@@ -1364,32 +1556,25 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	/* No cache provided? */
+	if (unlikely(cache == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+		goto driver_enqueue;
+	}
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
+	/* Prepare to add the objects to the cache. */
+	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
 
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
+	/* The request itself is too big for the cache? */
+	if (unlikely(cache_objs == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	if (cache->len + n <= cache->flushthresh) {
-		cache_objs = &cache->objs[cache->len];
-		cache->len += n;
-	} else {
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
+		goto driver_enqueue;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1584,7 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
-
-driver_enqueue_stats_incremented:
-
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }
 
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..14666457f7 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_rewind,
+	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..1383ae6db2 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,11 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_rewind;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v8] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
                   ` (7 preceding siblings ...)
  2023-02-09 14:52 ` [PATCH v7] " Morten Brørup
@ 2023-02-09 14:58 ` Morten Brørup
  2023-02-10  8:35   ` fengchengwen
  2023-02-12 19:56   ` Honnappa Nagarahalli
  2023-02-13 12:24 ` [PATCH v9] " Morten Brørup
  9 siblings, 2 replies; 36+ messages in thread
From: Morten Brørup @ 2023-02-09 14:58 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

Bugzilla ID: 1052

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

v8:
* Actually include the rte_errno header file.
  Note to self: The changes only take effect on the disk after the file in
  the text editor has been saved.
v7:
* Fix typo in function description. (checkpatch)
* Zero-copy functions may set rte_errno; include rte_errno header file.
  (ci/loongarch-compilation)
v6:
* Improve description of the 'n' parameter to the zero-copy get function.
  (Konstantin, Bruce)
* The caches used for zero-copy may not be user-owned, so remove this word
  from the function descriptions. (Kamalakshitha)
v5:
* Bugfix: Compare zero-copy get request to the cache size instead of the
  flush threshold; otherwise refill could overflow the memory allocated
  for the cache. (Andrew)
* Split the zero-copy put function into an internal function doing the
  work, and a public function with trace.
* Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
  the internal zero-copy put function. (Andrew)
* Corrected the return type of rte_mempool_cache_zc_put_bulk() from void *
  to void **; it returns a pointer to an array of objects.
* Fix coding style: Add missing curly brackets. (Andrew)
v4:
* Fix checkpatch warnings.
v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 238 +++++++++++++++++++++++++----
 lib/mempool/rte_mempool_trace_fp.h |  23 +++
 lib/mempool/version.map            |   5 +
 4 files changed, 246 insertions(+), 29 deletions(-)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..711a9d1c16 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -42,6 +42,7 @@
 #include <rte_config.h>
 #include <rte_spinlock.h>
 #include <rte_debug.h>
+#include <rte_errno.h>
 #include <rte_lcore.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
@@ -1346,6 +1347,198 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+
+/**
+ * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+static __rte_always_inline void **
+__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void **
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The number of objects not put in the mempool cache after calling
+ *   rte_mempool_cache_zc_put_bulk().
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(n <= cache->len);
+
+	rte_mempool_trace_cache_zc_put_rewind(cache, n);
+
+	cache->len -= n;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be made available for extraction from the mempool cache.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
+ *   With rte_errno set to the error code of the mempool dequeue function,
+ *   or EINVAL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len, size;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+	size = cache->size;
+
+	if (n <= len) {
+		/* The request can be satisfied from the cache as is. */
+		len -= n;
+	} else if (likely(n <= size)) {
+		/*
+		 * The request itself can be satisfied from the cache.
+		 * But first, the cache must be filled from the backend;
+		 * fetch size + requested - len objects.
+		 */
+		int ret;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else {
+			len = size;
+		}
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
@@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	/* No cache provided? */
+	if (unlikely(cache == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+		goto driver_enqueue;
+	}
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
+	/* Prepare to add the objects to the cache. */
+	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
 
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
+	/* The request itself is too big for the cache? */
+	if (unlikely(cache_objs == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	if (cache->len + n <= cache->flushthresh) {
-		cache_objs = &cache->objs[cache->len];
-		cache->len += n;
-	} else {
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
+		goto driver_enqueue;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1585,7 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
-
-driver_enqueue_stats_incremented:
-
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }
 
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..14666457f7 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_rewind,
+	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..1383ae6db2 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,11 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_rewind;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8] mempool cache: add zero-copy get and put functions
  2023-02-09 14:58 ` [PATCH v8] " Morten Brørup
@ 2023-02-10  8:35   ` fengchengwen
  2023-02-12 19:56   ` Honnappa Nagarahalli
  1 sibling, 0 replies; 36+ messages in thread
From: fengchengwen @ 2023-02-10  8:35 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, kamalakshitha.aligeri, bruce.richardson,
	konstantin.ananyev, dev
  Cc: nd, david.marchand

Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2023/2/9 22:58, Morten Brørup wrote:
> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> Bugzilla ID: 1052
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> 

...

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v8] mempool cache: add zero-copy get and put functions
  2023-02-09 14:58 ` [PATCH v8] " Morten Brørup
  2023-02-10  8:35   ` fengchengwen
@ 2023-02-12 19:56   ` Honnappa Nagarahalli
  2023-02-12 23:15     ` Morten Brørup
  1 sibling, 1 reply; 36+ messages in thread
From: Honnappa Nagarahalli @ 2023-02-12 19:56 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, andrew.rybchenko,
	Kamalakshitha Aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, nd

Hi Morten,
	Apologies for the late comments.

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, February 9, 2023 8:59 AM
> To: olivier.matz@6wind.com; andrew.rybchenko@oktetlabs.ru; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Kamalakshitha Aligeri
> <Kamalakshitha.Aligeri@arm.com>; bruce.richardson@intel.com;
> konstantin.ananyev@huawei.com; dev@dpdk.org
> Cc: nd <nd@arm.com>; david.marchand@redhat.com; Morten Brørup
> <mb@smartsharesystems.com>
> Subject: [PATCH v8] mempool cache: add zero-copy get and put functions
> 
> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> Bugzilla ID: 1052
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> 
> v8:
> * Actually include the rte_errno header file.
>   Note to self: The changes only take effect on the disk after the file in
>   the text editor has been saved.
> v7:
> * Fix typo in function description. (checkpatch)
> * Zero-copy functions may set rte_errno; include rte_errno header file.
>   (ci/loongarch-compilation)
> v6:
> * Improve description of the 'n' parameter to the zero-copy get function.
>   (Konstantin, Bruce)
> * The caches used for zero-copy may not be user-owned, so remove this word
>   from the function descriptions. (Kamalakshitha)
> v5:
> * Bugfix: Compare zero-copy get request to the cache size instead of the
>   flush threshold; otherwise refill could overflow the memory allocated
>   for the cache. (Andrew)
> * Split the zero-copy put function into an internal function doing the
>   work, and a public function with trace.
> * Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
>   the internal zero-copy put function. (Andrew)
> * Corrected the return type of rte_mempool_cache_zc_put_bulk() from void
> *
>   to void **; it returns a pointer to an array of objects.
> * Fix coding style: Add missing curly brackets. (Andrew)
> v4:
> * Fix checkpatch warnings.
> v3:
> * Bugfix: Respect the cache size; compare to the flush threshold instead
>   of RTE_MEMPOOL_CACHE_MAX_SIZE.
> * Added 'rewind' function for incomplete 'put' operations. (Konstantin)
> * Replace RTE_ASSERTs with runtime checks of the request size.
>   Instead of failing, return NULL if the request is too big. (Konstantin)
> * Modified comparison to prevent overflow if n is really huge and len is
>   non-zero.
> * Updated the comments in the code.
> v2:
> * Fix checkpatch warnings.
> * Fix missing registration of trace points.
> * The functions are inline, so they don't go into the map file.
> v1 changes from the RFC:
> * Removed run-time parameter checks. (Honnappa)
>   This is a hot fast path function; requiring correct application
>   behaviour, i.e. function parameters must be valid.
> * Added RTE_ASSERT for parameters instead.
>   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> * Removed fallback when 'cache' parameter is not set. (Honnappa)
> * Chose the simple get function; i.e. do not move the existing objects in
>   the cache to the top of the new stack, just leave them at the bottom.
> * Renamed the functions. Other suggestions are welcome, of course. ;-)
> * Updated the function descriptions.
> * Added the functions to trace_fp and version.map.
> ---
>  lib/mempool/mempool_trace_points.c |   9 ++
>  lib/mempool/rte_mempool.h          | 238 +++++++++++++++++++++++++----
>  lib/mempool/rte_mempool_trace_fp.h |  23 +++
>  lib/mempool/version.map            |   5 +
>  4 files changed, 246 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/mempool/mempool_trace_points.c
> b/lib/mempool/mempool_trace_points.c
> index 4ad76deb34..83d353a764 100644
> --- a/lib/mempool/mempool_trace_points.c
> +++ b/lib/mempool/mempool_trace_points.c
> @@ -77,3 +77,12 @@
> RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> 
>  RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
>  	lib.mempool.set.ops.byname)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> +	lib.mempool.cache.zc.put.bulk)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> +	lib.mempool.cache.zc.put.rewind)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> +	lib.mempool.cache.zc.get.bulk)
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..711a9d1c16 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -42,6 +42,7 @@
>  #include <rte_config.h>
>  #include <rte_spinlock.h>
>  #include <rte_debug.h>
> +#include <rte_errno.h>
>  #include <rte_lcore.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_ring.h>
> @@ -1346,6 +1347,198 @@ rte_mempool_cache_flush(struct
> rte_mempool_cache *cache,
>  	cache->len = 0;
>  }
> 
> +
> +/**
> + * @internal used by rte_mempool_cache_zc_put_bulk() and
> rte_mempool_do_generic_put().
> + *
> + * Zero-copy put objects in a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +static __rte_always_inline void **
> +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	void **cache_objs;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	if (n <= cache->flushthresh - cache->len) {
> +		/*
> +		 * The objects can be added to the cache without crossing the
> +		 * flush threshold.
> +		 */
> +		cache_objs = &cache->objs[cache->len];
> +		cache->len += n;
> +	} else if (likely(n <= cache->flushthresh)) {
> +		/*
> +		 * The request itself fits into the cache.
> +		 * But first, the cache must be flushed to the backend, so
> +		 * adding the objects does not cross the flush threshold.
> +		 */
> +		cache_objs = &cache->objs[0];
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> >len);
This is a flush of the cache. It is probably worth having a counter for this.

> +		cache->len = n;
> +	} else {
> +		/* The request itself is too big for the cache. */
This is possibly an error condition. Do we need to set rte_errno? Do we need a counter here to capture that?

> +		return NULL;
> +	}
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +
> +	return cache_objs;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy put objects in a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void **
> +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy un-put objects in a mempool cache.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param n
> + *   The number of objects not put in the mempool cache after calling
> + *   rte_mempool_cache_zc_put_bulk().
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(n <= cache->len);
> +
> +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> +
> +	cache->len -= n;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy get objects from a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be made available for extraction from the
> mempool cache.
> + * @return
> + *   The pointer to the objects in the mempool cache.
> + *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
> + *   With rte_errno set to the error code of the mempool dequeue function,
> + *   or EINVAL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void *
> +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	unsigned int len, size;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> +
> +	len = cache->len;
> +	size = cache->size;
> +
> +	if (n <= len) {
> +		/* The request can be satisfied from the cache as is. */
> +		len -= n;
> +	} else if (likely(n <= size)) {
> +		/*
> +		 * The request itself can be satisfied from the cache.
> +		 * But first, the cache must be filled from the backend;
> +		 * fetch size + requested - len objects.
> +		 */
> +		int ret;
> +
> +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> >objs[len], size + n - len);
> +		if (unlikely(ret < 0)) {
> +			/*
> +			 * We are buffer constrained.
> +			 * Do not fill the cache, just satisfy the request.
> +			 */
> +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> >objs[len], n - len);
> +			if (unlikely(ret < 0)) {
> +				/* Unable to satisfy the request. */
> +
> +				RTE_MEMPOOL_STAT_ADD(mp,
> get_fail_bulk, 1);
> +				RTE_MEMPOOL_STAT_ADD(mp,
> get_fail_objs, n);
> +
> +				rte_errno = -ret;
> +				return NULL;
> +			}
> +
> +			len = 0;
> +		} else {
> +			len = size;
> +		}
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	cache->len = len;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
> +	return &cache->objs[len];
> +}
> +
>  /**
>   * @internal Put several objects back in the mempool; used internally.
>   * @param mp
> @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> rte_mempool *mp, void * const *obj_table,  {
>  	void **cache_objs;
> 
> -	/* No cache provided */
> -	if (unlikely(cache == NULL))
> -		goto driver_enqueue;
> +	/* No cache provided? */
> +	if (unlikely(cache == NULL)) {
> +		/* Increment stats now, adding in mempool always succeeds.
> */
> +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> 
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +		goto driver_enqueue;
> +	}
> 
> -	/* The request itself is too big for the cache */
> -	if (unlikely(n > cache->flushthresh))
> -		goto driver_enqueue_stats_incremented;
> +	/* Prepare to add the objects to the cache. */
> +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> 
> -	/*
> -	 * The cache follows the following algorithm:
> -	 *   1. If the objects cannot be added to the cache without crossing
> -	 *      the flush threshold, flush the cache to the backend.
> -	 *   2. Add the objects to the cache.
> -	 */
> +	/* The request itself is too big for the cache? */
> +	if (unlikely(cache_objs == NULL)) {
> +		/* Increment stats now, adding in mempool always succeeds.
> */
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> 
> -	if (cache->len + n <= cache->flushthresh) {
> -		cache_objs = &cache->objs[cache->len];
> -		cache->len += n;
> -	} else {
> -		cache_objs = &cache->objs[0];
> -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> >len);
> -		cache->len = n;
> +		goto driver_enqueue;
>  	}
> 
>  	/* Add the objects to the cache. */
> @@ -1399,13 +1585,7 @@ rte_mempool_do_generic_put(struct
> rte_mempool *mp, void * const *obj_table,
> 
>  driver_enqueue:
> 
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> -
> -driver_enqueue_stats_incremented:
> -
> -	/* push objects to the backend */
> +	/* Push the objects to the backend. */
>  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> 
> diff --git a/lib/mempool/rte_mempool_trace_fp.h
> b/lib/mempool/rte_mempool_trace_fp.h
> index ed060e887c..14666457f7 100644
> --- a/lib/mempool/rte_mempool_trace_fp.h
> +++ b/lib/mempool/rte_mempool_trace_fp.h
> @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
>  	rte_trace_point_emit_ptr(mempool);
>  )
> 
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_rewind,
> +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_get_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> b67d7aace7..1383ae6db2 100644
> --- a/lib/mempool/version.map
> +++ b/lib/mempool/version.map
> @@ -63,6 +63,11 @@ EXPERIMENTAL {
>  	__rte_mempool_trace_ops_alloc;
>  	__rte_mempool_trace_ops_free;
>  	__rte_mempool_trace_set_ops_byname;
> +
> +	# added in 23.03
> +	__rte_mempool_trace_cache_zc_put_bulk;
> +	__rte_mempool_trace_cache_zc_put_rewind;
> +	__rte_mempool_trace_cache_zc_get_bulk;
>  };
> 
>  INTERNAL {
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v8] mempool cache: add zero-copy get and put functions
  2023-02-12 19:56   ` Honnappa Nagarahalli
@ 2023-02-12 23:15     ` Morten Brørup
  2023-02-13  4:29       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 36+ messages in thread
From: Morten Brørup @ 2023-02-12 23:15 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, andrew.rybchenko,
	Kamalakshitha Aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, nd

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Sunday, 12 February 2023 20.57
> 
> Hi Morten,
> 	Apologies for the late comments.

Better late than never. :-)

> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Thursday, February 9, 2023 8:59 AM
> >
> > Zero-copy access to mempool caches is beneficial for PMD performance,
> and
> > must be provided by the mempool library to fix [Bug 1052] without a
> > performance regression.
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> >
> > Bugzilla ID: 1052
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> >
> > v8:
> > * Actually include the rte_errno header file.
> >   Note to self: The changes only take effect on the disk after the
> file in
> >   the text editor has been saved.
> > v7:
> > * Fix typo in function description. (checkpatch)
> > * Zero-copy functions may set rte_errno; include rte_errno header
> file.
> >   (ci/loongarch-compilation)
> > v6:
> > * Improve description of the 'n' parameter to the zero-copy get
> function.
> >   (Konstantin, Bruce)
> > * The caches used for zero-copy may not be user-owned, so remove this
> word
> >   from the function descriptions. (Kamalakshitha)
> > v5:
> > * Bugfix: Compare zero-copy get request to the cache size instead of
> the
> >   flush threshold; otherwise refill could overflow the memory
> allocated
> >   for the cache. (Andrew)
> > * Split the zero-copy put function into an internal function doing
> the
> >   work, and a public function with trace.
> > * Avoid code duplication by rewriting rte_mempool_do_generic_put() to
> use
> >   the internal zero-copy put function. (Andrew)
> > * Corrected the return type of rte_mempool_cache_zc_put_bulk() from
> void
> > *
> >   to void **; it returns a pointer to an array of objects.
> > * Fix coding style: Add missing curly brackets. (Andrew)
> > v4:
> > * Fix checkpatch warnings.
> > v3:
> > * Bugfix: Respect the cache size; compare to the flush threshold
> instead
> >   of RTE_MEMPOOL_CACHE_MAX_SIZE.
> > * Added 'rewind' function for incomplete 'put' operations.
> (Konstantin)
> > * Replace RTE_ASSERTs with runtime checks of the request size.
> >   Instead of failing, return NULL if the request is too big.
> (Konstantin)
> > * Modified comparison to prevent overflow if n is really huge and len
> is
> >   non-zero.
> > * Updated the comments in the code.
> > v2:
> > * Fix checkpatch warnings.
> > * Fix missing registration of trace points.
> > * The functions are inline, so they don't go into the map file.
> > v1 changes from the RFC:
> > * Removed run-time parameter checks. (Honnappa)
> >   This is a hot fast path function; requiring correct application
> >   behaviour, i.e. function parameters must be valid.
> > * Added RTE_ASSERT for parameters instead.
> >   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > * Chose the simple get function; i.e. do not move the existing
> objects in
> >   the cache to the top of the new stack, just leave them at the
> bottom.
> > * Renamed the functions. Other suggestions are welcome, of course. ;-
> )
> > * Updated the function descriptions.
> > * Added the functions to trace_fp and version.map.
> > ---
> >  lib/mempool/mempool_trace_points.c |   9 ++
> >  lib/mempool/rte_mempool.h          | 238 +++++++++++++++++++++++++--
> --
> >  lib/mempool/rte_mempool_trace_fp.h |  23 +++
> >  lib/mempool/version.map            |   5 +
> >  4 files changed, 246 insertions(+), 29 deletions(-)
> >
> > diff --git a/lib/mempool/mempool_trace_points.c
> > b/lib/mempool/mempool_trace_points.c
> > index 4ad76deb34..83d353a764 100644
> > --- a/lib/mempool/mempool_trace_points.c
> > +++ b/lib/mempool/mempool_trace_points.c
> > @@ -77,3 +77,12 @@
> > RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> >
> >  RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
> >  	lib.mempool.set.ops.byname)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> > +	lib.mempool.cache.zc.put.bulk)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> > +	lib.mempool.cache.zc.put.rewind)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> > +	lib.mempool.cache.zc.get.bulk)
> > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > index 9f530db24b..711a9d1c16 100644
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> > @@ -42,6 +42,7 @@
> >  #include <rte_config.h>
> >  #include <rte_spinlock.h>
> >  #include <rte_debug.h>
> > +#include <rte_errno.h>
> >  #include <rte_lcore.h>
> >  #include <rte_branch_prediction.h>
> >  #include <rte_ring.h>
> > @@ -1346,6 +1347,198 @@ rte_mempool_cache_flush(struct
> > rte_mempool_cache *cache,
> >  	cache->len = 0;
> >  }
> >
> > +
> > +/**
> > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > rte_mempool_do_generic_put().
> > + *
> > + * Zero-copy put objects in a mempool cache backed by the specified
> > mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be put in the mempool cache.
> > + * @return
> > + *   The pointer to where to put the objects in the mempool cache.
> > + *   NULL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +static __rte_always_inline void **
> > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	void **cache_objs;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	if (n <= cache->flushthresh - cache->len) {
> > +		/*
> > +		 * The objects can be added to the cache without crossing
> the
> > +		 * flush threshold.
> > +		 */
> > +		cache_objs = &cache->objs[cache->len];
> > +		cache->len += n;
> > +	} else if (likely(n <= cache->flushthresh)) {
> > +		/*
> > +		 * The request itself fits into the cache.
> > +		 * But first, the cache must be flushed to the backend, so
> > +		 * adding the objects does not cross the flush threshold.
> > +		 */
> > +		cache_objs = &cache->objs[0];
> > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > >len);
> This is a flush of the cache. It is probably worth having a counter for
> this.

We somewhat already do. The put_common_pool_bulk counter is incremented in rte_mempool_ops_enqueue_bulk() [1].

[1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L824

This counter doesn't exactly count the number of times the cache was flushed, because it also counts bulk put transactions not going via the cache.

Thinking further about it, I agree that specific counters for cache flush and cache refill could be useful, and should be added. However, being this late, I would prefer postponing them for a separate patch.
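
For the record, I imagine the separate patch would add something along these lines at the flush point in __rte_mempool_cache_zc_put_bulk(); the 'put_flush_bulk' counter is only a placeholder name, it does not exist today:

	} else if (likely(n <= cache->flushthresh)) {
		/* The cache must be flushed to the backend first. */
		cache_objs = &cache->objs[0];
		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
		/* Hypothetical dedicated flush counter. */
		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_flush_bulk, 1);
		cache->len = n;
	}

A similar placeholder counter could be bumped at the refill point in rte_mempool_cache_zc_get_bulk().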

> 
> > +		cache->len = n;
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> This is possibly an error condition. Do we need to set rte_errno?

I considered this when I wrote the function, and concluded that this function only returns NULL as normal behavior, never because of failure. E.g. if a cache can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is correct behavior of the PMD to call this function and learn that the cache is too small for direct access; it is not a failure in the PMD nor in the cache.

If it could return NULL due to a variety of reasons, I would probably agree that we *need* to set rte_errno, so the application could determine the reason.

But since it can only return NULL for one reason (which involves correct use of the function), I convinced myself that setting rte_errno would not convey any more information than the NULL return value itself, and would be a waste of performance in the fast path. If you disagree, then it should be set to EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a request too big for the cache.
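
To illustrate what I consider correct use: a caller that might exceed the flush threshold simply checks the return value and takes the ordinary (copying) put path instead. A minimal sketch, assuming 'objs' and 'n' come from the caller:

	void **cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);

	if (cache_objs != NULL) {
		/* Zero-copy path: write the object pointers directly into the cache. */
		for (unsigned int i = 0; i < n; i++)
			cache_objs[i] = objs[i];
	} else {
		/* The request exceeds the flush threshold; fall back to the normal put. */
		rte_mempool_put_bulk(mp, objs, n);
	}

Either way the objects end up stored, without treating the NULL return as an error.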

> Do we need a counter here to capture that?

Good question. I don't know. It would indicate that a cache is smaller than expected by the users trying to access the cache directly.

And if we add such a counter, we should probably add a similar counter for the cache get function too.

But again, being this late, I would postpone such counters for a separate patch. And they should trigger more discussions about required/useful counters.

For reference, the rte_mempool_debug_stats is cache aligned and currently holds 12 64-bit counters, so we can add 4 more - which is exactly the number discussed here - without changing its size. So this is not a barrier to adding those counters.

Furthermore, I suppose that we only want to increase the counter when called through the mempool cache API, not when called indirectly through the mempool API. This would mean that the ordinary mempool functions cannot call the mempool cache functions, or the wrong counters would increase. So adding such counters is not completely trivial.
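
Concretely, the new counters would probably have to live only in the public wrapper, so that rte_mempool_do_generic_put() - which uses the internal helper - does not bump them. A rough sketch, with 'zc_put_bulk' as a placeholder counter name:

	rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
			struct rte_mempool *mp, unsigned int n)
	{
		rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
		/* Hypothetical counter, incremented only on the zero-copy API path. */
		RTE_MEMPOOL_CACHE_STAT_ADD(cache, zc_put_bulk, 1);
		return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
	}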

> 
> > +		return NULL;
> > +	}
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > +
> > +	return cache_objs;
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior
> > notice.
> > + *
> > + * Zero-copy put objects in a mempool cache backed by the specified
> > mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be put in the mempool cache.
> > + * @return
> > + *   The pointer to where to put the objects in the mempool cache.
> > + *   NULL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void **
> > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior
> > notice.
> > + *
> > + * Zero-copy un-put objects in a mempool cache.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param n
> > + *   The number of objects not put in the mempool cache after
> calling
> > + *   rte_mempool_cache_zc_put_bulk().
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void
> > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > +		unsigned int n)
> > +{
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(n <= cache->len);
> > +
> > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > +
> > +	cache->len -= n;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior
> > notice.
> > + *
> > + * Zero-copy get objects from a mempool cache backed by the
> specified
> > mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be made available for extraction from
> the
> > mempool cache.
> > + * @return
> > + *   The pointer to the objects in the mempool cache.
> > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> objects.
> > + *   With rte_errno set to the error code of the mempool dequeue
> function,
> > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void *
> > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	unsigned int len, size;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > +
> > +	len = cache->len;
> > +	size = cache->size;
> > +
> > +	if (n <= len) {
> > +		/* The request can be satisfied from the cache as is. */
> > +		len -= n;
> > +	} else if (likely(n <= size)) {
> > +		/*
> > +		 * The request itself can be satisfied from the cache.
> > +		 * But first, the cache must be filled from the backend;
> > +		 * fetch size + requested - len objects.
> > +		 */
> > +		int ret;
> > +
> > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > >objs[len], size + n - len);
> > +		if (unlikely(ret < 0)) {
> > +			/*
> > +			 * We are buffer constrained.
> > +			 * Do not fill the cache, just satisfy the request.
> > +			 */
> > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > >objs[len], n - len);
> > +			if (unlikely(ret < 0)) {
> > +				/* Unable to satisfy the request. */
> > +
> > +				RTE_MEMPOOL_STAT_ADD(mp,
> > get_fail_bulk, 1);
> > +				RTE_MEMPOOL_STAT_ADD(mp,
> > get_fail_objs, n);
> > +
> > +				rte_errno = -ret;
> > +				return NULL;
> > +			}
> > +
> > +			len = 0;
> > +		} else {
> > +			len = size;
> > +		}
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> > +
> > +	cache->len = len;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > +
> > +	return &cache->objs[len];
> > +}
> > +
> >  /**
> >   * @internal Put several objects back in the mempool; used
> internally.
> >   * @param mp
> > @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> > rte_mempool *mp, void * const *obj_table,  {
> >  	void **cache_objs;
> >
> > -	/* No cache provided */
> > -	if (unlikely(cache == NULL))
> > -		goto driver_enqueue;
> > +	/* No cache provided? */
> > +	if (unlikely(cache == NULL)) {
> > +		/* Increment stats now, adding in mempool always succeeds.
> > */
> > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> >
> > -	/* increment stat now, adding in mempool always success */
> > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > +		goto driver_enqueue;
> > +	}
> >
> > -	/* The request itself is too big for the cache */
> > -	if (unlikely(n > cache->flushthresh))
> > -		goto driver_enqueue_stats_incremented;
> > +	/* Prepare to add the objects to the cache. */
> > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> >
> > -	/*
> > -	 * The cache follows the following algorithm:
> > -	 *   1. If the objects cannot be added to the cache without
> crossing
> > -	 *      the flush threshold, flush the cache to the backend.
> > -	 *   2. Add the objects to the cache.
> > -	 */
> > +	/* The request itself is too big for the cache? */
> > +	if (unlikely(cache_objs == NULL)) {
> > +		/* Increment stats now, adding in mempool always succeeds.
> > */
> > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> >
> > -	if (cache->len + n <= cache->flushthresh) {
> > -		cache_objs = &cache->objs[cache->len];
> > -		cache->len += n;
> > -	} else {
> > -		cache_objs = &cache->objs[0];
> > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > >len);
> > -		cache->len = n;
> > +		goto driver_enqueue;
> >  	}
> >
> >  	/* Add the objects to the cache. */
> > @@ -1399,13 +1585,7 @@ rte_mempool_do_generic_put(struct
> > rte_mempool *mp, void * const *obj_table,
> >
> >  driver_enqueue:
> >
> > -	/* increment stat now, adding in mempool always success */
> > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > -
> > -driver_enqueue_stats_incremented:
> > -
> > -	/* push objects to the backend */
> > +	/* Push the objects to the backend. */
> >  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> >
> > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> > b/lib/mempool/rte_mempool_trace_fp.h
> > index ed060e887c..14666457f7 100644
> > --- a/lib/mempool/rte_mempool_trace_fp.h
> > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> >  	rte_trace_point_emit_ptr(mempool);
> >  )
> >
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_put_bulk,
> > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_ptr(mempool);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_put_rewind,
> > +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_get_bulk,
> > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_ptr(mempool);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> > b67d7aace7..1383ae6db2 100644
> > --- a/lib/mempool/version.map
> > +++ b/lib/mempool/version.map
> > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> >  	__rte_mempool_trace_ops_alloc;
> >  	__rte_mempool_trace_ops_free;
> >  	__rte_mempool_trace_set_ops_byname;
> > +
> > +	# added in 23.03
> > +	__rte_mempool_trace_cache_zc_put_bulk;
> > +	__rte_mempool_trace_cache_zc_put_rewind;
> > +	__rte_mempool_trace_cache_zc_get_bulk;
> >  };
> >
> >  INTERNAL {
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v8] mempool cache: add zero-copy get and put functions
  2023-02-12 23:15     ` Morten Brørup
@ 2023-02-13  4:29       ` Honnappa Nagarahalli
  2023-02-13  9:30         ` Morten Brørup
  2023-02-13  9:37         ` Olivier Matz
  0 siblings, 2 replies; 36+ messages in thread
From: Honnappa Nagarahalli @ 2023-02-13  4:29 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, andrew.rybchenko,
	Kamalakshitha Aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, nd

<snip>

> > > +/**
> > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > rte_mempool_do_generic_put().
> > > + *
> > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to be put in the mempool cache.
> > > + * @return
> > > + *   The pointer to where to put the objects in the mempool cache.
> > > + *   NULL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +static __rte_always_inline void **
> > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	void **cache_objs;
> > > +
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	if (n <= cache->flushthresh - cache->len) {
> > > +		/*
> > > +		 * The objects can be added to the cache without crossing
> > the
> > > +		 * flush threshold.
> > > +		 */
> > > +		cache_objs = &cache->objs[cache->len];
> > > +		cache->len += n;
> > > +	} else if (likely(n <= cache->flushthresh)) {
> > > +		/*
> > > +		 * The request itself fits into the cache.
> > > +		 * But first, the cache must be flushed to the backend, so
> > > +		 * adding the objects does not cross the flush threshold.
> > > +		 */
> > > +		cache_objs = &cache->objs[0];
> > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > >len);
> > This is a flush of the cache. It is probably worth having a counter
> > for this.
> 
> We somewhat already do. The put_common_pool_bulk counter is
> incremented in rte_mempool_ops_enqueue_bulk() [1].
> 
> [1]:
> https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.
> h#L824
> 
> This counter doesn't exactly count the number of times the cache was
> flushed, because it also counts bulk put transactions not going via the cache.
> 
> Thinking further about it, I agree that specific counters for cache flush and
> cache refill could be useful, and should be added. However, being this late, I
> would prefer postponing them for a separate patch.
Agree, can be in a separate patch, they never existed.

> 
> >
> > > +		cache->len = n;
> > > +	} else {
> > > +		/* The request itself is too big for the cache. */
> > This is possibly an error condition. Do we need to set rte_errno?
> 
> I considered this when I wrote the function, and concluded that this function
> only returns NULL as normal behavior, never because of failure. E.g. if a cache
> can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is correct
> behavior of the PMD to call this function and learn that the cache is too small
> for direct access; it is not a failure in the PMD nor in the cache.
This condition happens when there is a mismatch between the cache configuration and the behavior of the PMD. From this perspective I think this is an error. This could go unnoticed; I do not think this misconfiguration is reported anywhere.

> 
> If it could return NULL due to a variety of reasons, I would probably agree that
> we *need* to set rte_errno, so the application could determine the reason.
> 
> But since it can only return NULL due to one reason (which involves correct
> use of the function), I convinced myself that setting rte_errno would not
> convey any additional information than the NULL return value itself, and be a
> waste of performance in the fast path. If you disagree, then it should be set to
> EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a request
> too big for the cache.
> 
> > Do we need a counter here to capture that?
> 
> Good question. I don't know. It would indicate that a cache is smaller than
> expected by the users trying to access the cache directly.
> 
> And if we add such a counter, we should probably add a similar counter for
> the cache get function too.
Agree

> 
> But again, being this late, I would postpone such counters for a separate
> patch. And they should trigger more discussions about required/useful
> counters.
Agree, should be postponed to another patch.

> 
> For reference, the rte_mempool_debug_stats is cache aligned and currently
> holds 12 64-bit counters, so we can add 4 more - which is exactly the number
> discussed here - without changing its size. So this is not a barrier to adding
> those counters.
> 
> Furthermore, I suppose that we only want to increase the counter when the
> called through the mempool cache API, not when called indirectly through
> the mempool API. This would mean that the ordinary mempool functions
> cannot call the mempool cache functions, or the wrong counters would
> increase. So adding such counters is not completely trivial.
> 
> >
> > > +		return NULL;
> > > +	}
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > +
> > > +	return cache_objs;
> > > +}
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior
> > > notice.
> > > + *
> > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to be put in the mempool cache.
> > > + * @return
> > > + *   The pointer to where to put the objects in the mempool cache.
> > > + *   NULL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void **
> > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior
> > > notice.
> > > + *
> > > + * Zero-copy un-put objects in a mempool cache.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param n
> > > + *   The number of objects not put in the mempool cache after
> > calling
> > > + *   rte_mempool_cache_zc_put_bulk().
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void
> > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > +		unsigned int n)
Earlier there was a discussion on the API name.
IMO, we should keep the API names similar to those in the ring library. This would provide consistency across the libraries.
There were some concerns expressed about the PMD having to call 2 APIs. I do not think changing to 2 APIs will have any perf impact.

Also, what is the use case for the 'rewind' API?

> > > +{
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(n <= cache->len);
> > > +
> > > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > +
> > > +	cache->len -= n;
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior
> > > notice.
> > > + *
> > > + * Zero-copy get objects from a mempool cache backed by the
> > specified
> > > mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to be made available for extraction from
> > the
> > > mempool cache.
> > > + * @return
> > > + *   The pointer to the objects in the mempool cache.
> > > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> > objects.
> > > + *   With rte_errno set to the error code of the mempool dequeue
> > function,
> > > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void *
> > > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	unsigned int len, size;
> > > +
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > > +
> > > +	len = cache->len;
> > > +	size = cache->size;
> > > +
> > > +	if (n <= len) {
> > > +		/* The request can be satisfied from the cache as is. */
> > > +		len -= n;
> > > +	} else if (likely(n <= size)) {
> > > +		/*
> > > +		 * The request itself can be satisfied from the cache.
> > > +		 * But first, the cache must be filled from the backend;
> > > +		 * fetch size + requested - len objects.
> > > +		 */
> > > +		int ret;
> > > +
> > > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > >objs[len], size + n - len);
> > > +		if (unlikely(ret < 0)) {
> > > +			/*
> > > +			 * We are buffer constrained.
> > > +			 * Do not fill the cache, just satisfy the request.
> > > +			 */
> > > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > >objs[len], n - len);
> > > +			if (unlikely(ret < 0)) {
> > > +				/* Unable to satisfy the request. */
> > > +
> > > +				RTE_MEMPOOL_STAT_ADD(mp,
> > > get_fail_bulk, 1);
> > > +				RTE_MEMPOOL_STAT_ADD(mp,
> > > get_fail_objs, n);
> > > +
> > > +				rte_errno = -ret;
> > > +				return NULL;
> > > +			}
> > > +
> > > +			len = 0;
> > > +		} else {
> > > +			len = size;
> > > +		}
> > > +	} else {
> > > +		/* The request itself is too big for the cache. */
> > > +		rte_errno = EINVAL;
> > > +		return NULL;
> > > +	}
> > > +
> > > +	cache->len = len;
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > > +
> > > +	return &cache->objs[len];
> > > +}
> > > +
> > >  /**
> > >   * @internal Put several objects back in the mempool; used
> > internally.
> > >   * @param mp
> > > @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> > > rte_mempool *mp, void * const *obj_table,  {
> > >  	void **cache_objs;
> > >
> > > -	/* No cache provided */
> > > -	if (unlikely(cache == NULL))
> > > -		goto driver_enqueue;
> > > +	/* No cache provided? */
> > > +	if (unlikely(cache == NULL)) {
> > > +		/* Increment stats now, adding in mempool always succeeds.
> > > */
> > > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > >
> > > -	/* increment stat now, adding in mempool always success */
> > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > +		goto driver_enqueue;
> > > +	}
> > >
> > > -	/* The request itself is too big for the cache */
> > > -	if (unlikely(n > cache->flushthresh))
> > > -		goto driver_enqueue_stats_incremented;
> > > +	/* Prepare to add the objects to the cache. */
> > > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > >
> > > -	/*
> > > -	 * The cache follows the following algorithm:
> > > -	 *   1. If the objects cannot be added to the cache without
> > crossing
> > > -	 *      the flush threshold, flush the cache to the backend.
> > > -	 *   2. Add the objects to the cache.
> > > -	 */
> > > +	/* The request itself is too big for the cache? */
> > > +	if (unlikely(cache_objs == NULL)) {
> > > +		/* Increment stats now, adding in mempool always succeeds.
> > > */
> > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > >
> > > -	if (cache->len + n <= cache->flushthresh) {
> > > -		cache_objs = &cache->objs[cache->len];
> > > -		cache->len += n;
> > > -	} else {
> > > -		cache_objs = &cache->objs[0];
> > > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > >len);
> > > -		cache->len = n;
> > > +		goto driver_enqueue;
> > >  	}
> > >
> > >  	/* Add the objects to the cache. */ @@ -1399,13 +1585,7 @@
> > > rte_mempool_do_generic_put(struct rte_mempool *mp, void * const
> > > *obj_table,
> > >
> > >  driver_enqueue:
> > >
> > > -	/* increment stat now, adding in mempool always success */
> > > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > -
> > > -driver_enqueue_stats_incremented:
> > > -
> > > -	/* push objects to the backend */
> > > +	/* Push the objects to the backend. */
> > >  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> > >
> > > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> > > b/lib/mempool/rte_mempool_trace_fp.h
> > > index ed060e887c..14666457f7 100644
> > > --- a/lib/mempool/rte_mempool_trace_fp.h
> > > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> > >  	rte_trace_point_emit_ptr(mempool);
> > >  )
> > >
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_put_bulk,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_ptr(mempool);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_put_rewind,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_get_bulk,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_ptr(mempool);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> > > b67d7aace7..1383ae6db2 100644
> > > --- a/lib/mempool/version.map
> > > +++ b/lib/mempool/version.map
> > > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> > >  	__rte_mempool_trace_ops_alloc;
> > >  	__rte_mempool_trace_ops_free;
> > >  	__rte_mempool_trace_set_ops_byname;
> > > +
> > > +	# added in 23.03
> > > +	__rte_mempool_trace_cache_zc_put_bulk;
> > > +	__rte_mempool_trace_cache_zc_put_rewind;
> > > +	__rte_mempool_trace_cache_zc_get_bulk;
> > >  };
> > >
> > >  INTERNAL {
> > > --
> > > 2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v8] mempool cache: add zero-copy get and put functions
  2023-02-13  4:29       ` Honnappa Nagarahalli
@ 2023-02-13  9:30         ` Morten Brørup
  2023-02-13  9:37         ` Olivier Matz
  1 sibling, 0 replies; 36+ messages in thread
From: Morten Brørup @ 2023-02-13  9:30 UTC (permalink / raw)
  To: Honnappa Nagarahalli, olivier.matz, andrew.rybchenko,
	Kamalakshitha Aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, nd

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Monday, 13 February 2023 05.30
> 
> <snip>
> 
> > > > +/**
> > > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > > rte_mempool_do_generic_put().
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the
> specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool
> cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +static __rte_always_inline void **
> > > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	void **cache_objs;
> > > > +
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	if (n <= cache->flushthresh - cache->len) {
> > > > +		/*
> > > > +		 * The objects can be added to the cache without
> crossing
> > > the
> > > > +		 * flush threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[cache->len];
> > > > +		cache->len += n;
> > > > +	} else if (likely(n <= cache->flushthresh)) {
> > > > +		/*
> > > > +		 * The request itself fits into the cache.
> > > > +		 * But first, the cache must be flushed to the
> backend, so
> > > > +		 * adding the objects does not cross the flush
> threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[0];
> > > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > > >len);
> > > This is a flush of the cache. It is probably worth having a counter
> > > for this.
> >
> > We somewhat already do. The put_common_pool_bulk counter is
> > incremented in rte_mempool_ops_enqueue_bulk() [1].
> >
> > [1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L824
> >
> > This counter doesn't exactly count the number of times the cache was
> > flushed, because it also counts bulk put transactions not going via
> the cache.
> >
> > Thinking further about it, I agree that specific counters for cache
> flush and
> > cache refill could be useful, and should be added. However, being
> this late, I
> > would prefer postponing them for a separate patch.
> Agree, can be in a separate patch, they never existed.

OK. We have agreed to postpone the counter discussion to a separate patch. So let's resume the discussion there.

Such a patch will not make it for RC1, so I have put it on my TODO list for later.

> 
> >
> > >
> > > > +		cache->len = n;
> > > > +	} else {
> > > > +		/* The request itself is too big for the cache. */
> > > This is possibly an error condition. Do we need to set rte_errno?
> >
> > I considered this when I wrote the function, and concluded that this
> function
> > only returns NULL as normal behavior, never because of failure. E.g.
> if a cache
> > can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is
> correct
> > behavior of the PMD to call this function and learn that the cache is
> too small
> > for direct access; it is not a failure in the PMD nor in the cache.
> This condition happens when there is a mismatch between the cache
> configuration and the behavior of the PMD. From this perspective I
> think this is an error. This could go unnoticed, I do not think this
> misconfiguration is reported anywhere.

No strong objection from me.

In v9, I will also set rte_errno=EINVAL here.
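
I.e. the last branch of __rte_mempool_cache_zc_put_bulk() would become roughly (sketch of the planned v9 change):

	} else {
		/* The request itself is too big for the cache. */
		rte_errno = EINVAL;
		return NULL;
	}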

> 
> >
> > If it could return NULL due to a variety of reasons, I would probably
> agree that
> > we *need* to set rte_errno, so the application could determine the
> reason.
> >
> > But since it can only return NULL due to one reason (which involves
> correct
> > use of the function), I convinced myself that setting rte_errno would
> not
> > convey any additional information than the NULL return value itself,
> and be a
> > waste of performance in the fast path. If you disagree, then it
> should be set to
> > EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a
> request
> > too big for the cache.
> >
> > > Do we need a counter here to capture that?
> >
> > Good question. I don't know. It would indicate that a cache is
> smaller than
> > expected by the users trying to access the cache directly.
> >
> > And if we add such a counter, we should probably add a similar
> counter for
> > the cache get function too.
> Agree
> 
> >
> > But again, being this late, I would postpone such counters for a
> separate
> > patch. And they should trigger more discussions about required/useful
> > counters.
> Agree, should be postponed to another patch.

Agreed. Let's move the counter discussion to the counters patch, when provided.

> 
> >
> > For reference, the rte_mempool_debug_stats is cache aligned and
> currently
> > holds 12 64-bit counters, so we can add 4 more - which is exactly the
> number
> > discussed here - without changing its size. So this is not a barrier
> to adding
> > those counters.
> >
> > Furthermore, I suppose that we only want to increase the counter when
> the
> > called through the mempool cache API, not when called indirectly
> through
> > the mempool API. This would mean that the ordinary mempool functions
> > cannot call the mempool cache functions, or the wrong counters would
> > increase. So adding such counters is not completely trivial.
> >
> > >
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +
> > > > +	return cache_objs;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the
> specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool
> cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void **
> > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy un-put objects in a mempool cache.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param n
> > > > + *   The number of objects not put in the mempool cache after
> > > calling
> > > > + *   rte_mempool_cache_zc_put_bulk().
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void
> > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > +		unsigned int n)
> Earlier there was a discussion on the API name.

The discussion was here:

http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D875E8@smartserver.smartshare.dk/

> IMO, we should keep the API names similar to those in ring library.
> This would provide consistency across the libraries.
> There were some concerns expressed in PMD having to call 2 APIs. I do
> not think changing to 2 APIs will have any perf impact.

There is also the difference that the ring library implements locking with these APIs, whereas the mempool cache API does not need locking. The ring's _start() function is called to enter the critical section, and its _finish() function to leave it.

I am usually in favor of consistency, but I would argue this: for functions (in the mempool cache) that are lockless, it might be confusing if they have names (borrowed from the ring library) that imply locking. DPDK is already bad at documenting the thread safety of various functions; let's not make it worse by using confusing function names.

This is a democracy, so I am open to changing the API for consistency, if the community insists. Anyone interested, please indicate which API you prefer for the zero-copy mempool cache.

A. The unique API provided in this patch:
rte_mempool_cache_zc_get_bulk(), rte_mempool_cache_zc_put_bulk(), and the optional rte_mempool_cache_zc_put_rewind(); or

B. An API with _start and _finish, like in the ring library:
rte_mempool_cache_zc_get_bulk_start() and _finish(), and rte_mempool_cache_zc_put_bulk_start() and _finish().

I am in favor of A: Keep the unique names as provided in the patch.

Konstantin accepted A, so unless he says otherwise, I'll treat that as a vote for A.
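
For reference, usage of the API in option A could look roughly like the sketch below. The helper name and the fallback logic are purely illustrative (not from the patch); only the rte_mempool_cache_zc_get_bulk() call matches the proposed API.

#include <rte_mempool.h>
#include <rte_mbuf.h>
#include <rte_lcore.h>
#include <rte_errno.h>

/* Hypothetical helper: fill 'slots' with 'n' objects taken directly from the
 * lcore's mempool cache, falling back to the classic API when there is no cache. */
static inline int
rxq_refill_zc(struct rte_mempool *mp, struct rte_mbuf **slots, unsigned int n)
{
	struct rte_mempool_cache *cache =
			rte_mempool_default_cache(mp, rte_lcore_id());
	void **objs;
	unsigned int i;

	if (cache == NULL)
		return rte_mempool_get_bulk(mp, (void **)slots, n);

	objs = rte_mempool_cache_zc_get_bulk(cache, mp, n);
	if (objs == NULL)
		return -rte_errno; /* backend empty, or n too big for the cache */

	/* The object pointers are read directly out of the cache; no bounce buffer. */
	for (i = 0; i < n; i++)
		slots[i] = (struct rte_mbuf *)objs[i];

	return 0;
}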

> 
> Also, what is the use case for the 'rewind' API?

It is for use cases where the application/PMD is opportunistic and prepares to zero-copy a full burst, but, while zero-copying, realizes that there was not a full burst to zero-copy.

E.g. a PMD freeing mbufs after transmit: it hopes to "fast free" the entire burst, and prepares to zero-copy the full burst to the mempool; but while zero-copying the mbufs, a few of them turn out not to meet the "fast free" criteria, so they cannot be zero-copied, and the PMD frees those few mbufs normally instead. When done, the PMD must call rewind() to shrink the reserved zero-copy burst size accordingly.
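
A minimal sketch of that pattern, assuming a hypothetical TX completion helper and illustrative "fast free" criteria; only the zc put/rewind calls are from this patch:

#include <rte_mempool.h>
#include <rte_mbuf.h>
#include <rte_lcore.h>
#include <rte_branch_prediction.h>

/* Hypothetical helper: try to zero-copy the whole burst into the lcore cache,
 * free non-qualifying mbufs normally, then rewind by the number of slots that
 * were reserved but not used. */
static void
txq_free_bufs_zc(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int n)
{
	struct rte_mempool_cache *cache =
			rte_mempool_default_cache(mp, rte_lcore_id());
	void **cache_objs;
	unsigned int copied = 0;
	unsigned int i;

	/* Opportunistically reserve room in the cache for the full burst. */
	cache_objs = (cache != NULL) ?
			rte_mempool_cache_zc_put_bulk(cache, mp, n) : NULL;
	if (cache_objs == NULL) {
		/* No cache, or burst exceeds the flush threshold: free normally. */
		rte_pktmbuf_free_bulk(mbufs, n);
		return;
	}

	for (i = 0; i < n; i++) {
		struct rte_mbuf *m = rte_pktmbuf_prefree_seg(mbufs[i]);

		if (likely(m != NULL && m->pool == mp && m->next == NULL))
			cache_objs[copied++] = m;	/* zero-copied into the cache */
		else if (m != NULL)
			rte_mempool_put(m->pool, m);	/* does not qualify: put normally */
	}

	/* Fewer objects than reserved were stored; un-put the difference. */
	if (copied != n)
		rte_mempool_cache_zc_put_rewind(cache, n - copied);
}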


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8] mempool cache: add zero-copy get and put functions
  2023-02-13  4:29       ` Honnappa Nagarahalli
  2023-02-13  9:30         ` Morten Brørup
@ 2023-02-13  9:37         ` Olivier Matz
  2023-02-13 10:25           ` Morten Brørup
  1 sibling, 1 reply; 36+ messages in thread
From: Olivier Matz @ 2023-02-13  9:37 UTC (permalink / raw)
  To: Morten Brørup
  Cc: andrew.rybchenko, Kamalakshitha Aligeri, bruce.richardson,
	konstantin.ananyev, dev, nd, david.marchand,
	Honnappa Nagarahalli

Hello,

Thank you for this work, and sorry for the late feedback too.

On Mon, Feb 13, 2023 at 04:29:51AM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > > > +/**
> > > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > > rte_mempool_do_generic_put().
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +static __rte_always_inline void **
> > > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	void **cache_objs;
> > > > +
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	if (n <= cache->flushthresh - cache->len) {

The previous code was doing this test instead:

if (cache->len + n <= cache->flushthresh)

I know there is an invariant asserting that cache->len <= cache->flushthresh,
so there is no real issue, but I tend to say that it is good practice
to avoid subtractions on unsigned values to avoid the risk of wrapping.

I also think the previous test was a bit more readable.
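
For illustration, a tiny standalone program with made-up values that deliberately violate the invariant shows the difference between the two forms:

#include <stdio.h>

int main(void)
{
	/* Deliberately violate the invariant len <= flushthresh. */
	unsigned int flushthresh = 32, len = 40, n = 8;

	/* Subtraction form: 32 - 40 wraps to a huge unsigned value,
	 * so the request is wrongly accepted. */
	printf("n <= flushthresh - len : %s\n",
			n <= flushthresh - len ? "accepted" : "rejected");

	/* Addition form: 40 + 8 <= 32 is false, so the request is rejected. */
	printf("len + n <= flushthresh : %s\n",
			len + n <= flushthresh ? "accepted" : "rejected");

	return 0;
}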

> > > > +		/*
> > > > +		 * The objects can be added to the cache without crossing
> > > the
> > > > +		 * flush threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[cache->len];
> > > > +		cache->len += n;
> > > > +	} else if (likely(n <= cache->flushthresh)) {
> > > > +		/*
> > > > +		 * The request itself fits into the cache.
> > > > +		 * But first, the cache must be flushed to the backend, so
> > > > +		 * adding the objects does not cross the flush threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[0];
> > > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > > >len);
> > > This is a flush of the cache. It is probably worth having a counter
> > > for this.
> > 
> > We somewhat already do. The put_common_pool_bulk counter is
> > incremented in rte_mempool_ops_enqueue_bulk() [1].
> > 
> > [1]:
> > https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.
> > h#L824
> > 
> > This counter doesn't exactly count the number of times the cache was
> > flushed, because it also counts bulk put transactions not going via the cache.
> > 
> > Thinking further about it, I agree that specific counters for cache flush and
> > cache refill could be useful, and should be added. However, being this late, I
> > would prefer postponing them for a separate patch.
> Agree, can be in a separate patch, they never existed.
> 
> > 
> > >
> > > > +		cache->len = n;
> > > > +	} else {
> > > > +		/* The request itself is too big for the cache. */
> > > This is possibly an error condition. Do we need to set rte_errno?
> > 
> > I considered this when I wrote the function, and concluded that this function
> > only returns NULL as normal behavior, never because of failure. E.g. if a cache
> > can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is correct
> > behavior of the PMD to call this function and learn that the cache is too small
> > for direct access; it is not a failure in the PMD nor in the cache.
> This condition happens when there is a mismatch between the cache configuration and the behavior of the PMD. From this perspective, I think this is an error. This could go unnoticed; I do not think this misconfiguration is reported anywhere.
> 
> > 
> > If it could return NULL due to a variety of reasons, I would probably agree that
> > we *need* to set rte_errno, so the application could determine the reason.
> > 
> > But since it can only return NULL due to one reason (which involves correct
> > use of the function), I convinced myself that setting rte_errno would not
> > convey any additional information than the NULL return value itself, and be a
> > waste of performance in the fast path. If you disagree, then it should be set to
> > EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a request
> > too big for the cache.
> > 
> > > Do we need a counter here to capture that?
> > 
> > Good question. I don't know. It would indicate that a cache is smaller than
> > expected by the users trying to access the cache directly.
> > 
> > And if we add such a counter, we should probably add a similar counter for
> > the cache get function too.
> Agree
> 
> > 
> > But again, being this late, I would postpone such counters for a separate
> > patch. And they should trigger more discussions about required/useful
> > counters.
> Agree, should be postponed to another patch.
> 
> > 
> > For reference, the rte_mempool_debug_stats is cache aligned and currently
> > holds 12 64-bit counters, so we can add 4 more - which is exactly the number
> > discussed here - without changing its size. So this is not a barrier to adding
> > those counters.
> > 
> > Furthermore, I suppose that we only want to increase the counter when
> > called through the mempool cache API, not when called indirectly through
> > the mempool API. This would mean that the ordinary mempool functions
> > cannot call the mempool cache functions, or the wrong counters would
> > increase. So adding such counters is not completely trivial.
> > 
> > >
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +
> > > > +	return cache_objs;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > > mempool.

I think we should document the differences and advantages of using this
function over the standard version, explaining which copy is avoided,
why it is faster, ...

Also, we should say that once this function is called, the user has
to copy the objects to the cache.
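
For example, the documentation could include a minimal sketch along these lines; the helper name is hypothetical, and only the zc call itself is from the patch:

#include <rte_mempool.h>
#include <rte_memcpy.h>
#include <rte_lcore.h>

/* Hypothetical helper: return 'n' objects to 'mp' through its lcore cache.
 * The zc function only reserves room in the cache; the effective copy of
 * the object pointers below is done by the caller. */
static inline void
put_bulk_zc(struct rte_mempool *mp, void * const *objs, unsigned int n)
{
	struct rte_mempool_cache *cache =
			rte_mempool_default_cache(mp, rte_lcore_id());
	void **dst;

	if (cache != NULL &&
			(dst = rte_mempool_cache_zc_put_bulk(cache, mp, n)) != NULL) {
		rte_memcpy(dst, objs, sizeof(void *) * n); /* caller's copy */
		return;
	}

	/* No cache, or the request exceeds the flush threshold. */
	rte_mempool_put_bulk(mp, objs, n);
}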

> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void **
> > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy un-put objects in a mempool cache.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param n
> > > > + *   The number of objects not put in the mempool cache after
> > > calling
> > > > + *   rte_mempool_cache_zc_put_bulk().
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void
> > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > +		unsigned int n)
> Earlier there was a discussion on the API name.
> IMO, we should keep the API names similar to those in ring library. This would provide consistency across the libraries.
> There were some concerns expressed in PMD having to call 2 APIs. I do not think changing to 2 APIs will have any perf impact.

I'm not really convinced by the API names either. Again, sorry, I know this
comment arrives after the battle.

Your proposal is:

/* Zero-copy put objects in a mempool cache backed by the specified mempool. */
rte_mempool_cache_zc_put_bulk(cache, mp, n)

/* Zero-copy get objects from a mempool cache backed by the specified mempool. */
rte_mempool_cache_zc_get_bulk(cache, mp, n)

Here are some observations:

- This was said in the discussion previously, but the functions do not
  really get or put objects in the cache. Instead, they prepare the
  cache (filling it or flushing it if needed) and update its length so
  that the user can do the effective copy.

- The "_cache" is superfluous for me: these functions do not deal more
  with the cache than the non zero-copy version

- The order of the parameters is (cache, mp, n) while the other functions
  that take a mempool and a cache as parameters have the mp first (see
  _generic versions).

- The "_bulk" is indeed present on other functions, but not all (the generic
  version does not have it), I'm not sure it is absolutely required

What do you think about these API below?

rte_mempool_prepare_zc_put(mp, n, cache)
rte_mempool_prepare_zc_get(mp, n, cache)

> 
> Also, what is the use case for the 'rewind' API?

+1

I have the same feeling that rewind() is not required now. It can be
added later if we find a use-case.

In case we want to keep it, I think we need to better specify in the API
comments in which unique conditions the function can be called
(i.e. after a call to rte_mempool_prepare_zc_put() with the same number
of objects, given no other operations were done on the mempool in
between). A call outside of these conditions has undefined behavior.

> 
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(n <= cache->len);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > > +
> > > > +	cache->len -= n;
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy get objects from a mempool cache backed by the
> > > specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be made available for extraction from
> > > the
> > > > mempool cache.
> > > > + * @return
> > > > + *   The pointer to the objects in the mempool cache.
> > > > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> > > objects.
> > > > + *   With rte_errno set to the error code of the mempool dequeue
> > > function,
> > > > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void *
> > > > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	unsigned int len, size;
> > > > +
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > > > +
> > > > +	len = cache->len;
> > > > +	size = cache->size;
> > > > +
> > > > +	if (n <= len) {
> > > > +		/* The request can be satisfied from the cache as is. */
> > > > +		len -= n;
> > > > +	} else if (likely(n <= size)) {
> > > > +		/*
> > > > +		 * The request itself can be satisfied from the cache.
> > > > +		 * But first, the cache must be filled from the backend;
> > > > +		 * fetch size + requested - len objects.
> > > > +		 */
> > > > +		int ret;
> > > > +
> > > > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > > >objs[len], size + n - len);
> > > > +		if (unlikely(ret < 0)) {
> > > > +			/*
> > > > +			 * We are buffer constrained.
> > > > +			 * Do not fill the cache, just satisfy the request.
> > > > +			 */
> > > > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > > >objs[len], n - len);
> > > > +			if (unlikely(ret < 0)) {
> > > > +				/* Unable to satisfy the request. */
> > > > +
> > > > +				RTE_MEMPOOL_STAT_ADD(mp,
> > > > get_fail_bulk, 1);
> > > > +				RTE_MEMPOOL_STAT_ADD(mp,
> > > > get_fail_objs, n);
> > > > +
> > > > +				rte_errno = -ret;
> > > > +				return NULL;
> > > > +			}
> > > > +
> > > > +			len = 0;
> > > > +		} else {
> > > > +			len = size;
> > > > +		}
> > > > +	} else {
> > > > +		/* The request itself is too big for the cache. */
> > > > +		rte_errno = EINVAL;
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	cache->len = len;
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > > > +
> > > > +	return &cache->objs[len];
> > > > +}
> > > > +
> > > >  /**
> > > >   * @internal Put several objects back in the mempool; used
> > > internally.
> > > >   * @param mp
> > > > @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> > > > rte_mempool *mp, void * const *obj_table,  {
> > > >  	void **cache_objs;
> > > >
> > > > -	/* No cache provided */
> > > > -	if (unlikely(cache == NULL))
> > > > -		goto driver_enqueue;
> > > > +	/* No cache provided? */
> > > > +	if (unlikely(cache == NULL)) {
> > > > +		/* Increment stats now, adding in mempool always succeeds.
> > > > */
> > > > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > >
> > > > -	/* increment stat now, adding in mempool always success */
> > > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +		goto driver_enqueue;
> > > > +	}
> > > >
> > > > -	/* The request itself is too big for the cache */
> > > > -	if (unlikely(n > cache->flushthresh))
> > > > -		goto driver_enqueue_stats_incremented;
> > > > +	/* Prepare to add the objects to the cache. */
> > > > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > >
> > > > -	/*
> > > > -	 * The cache follows the following algorithm:
> > > > -	 *   1. If the objects cannot be added to the cache without
> > > crossing
> > > > -	 *      the flush threshold, flush the cache to the backend.
> > > > -	 *   2. Add the objects to the cache.
> > > > -	 */
> > > > +	/* The request itself is too big for the cache? */
> > > > +	if (unlikely(cache_objs == NULL)) {
> > > > +		/* Increment stats now, adding in mempool always succeeds.
> > > > */
> > > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > >
> > > > -	if (cache->len + n <= cache->flushthresh) {
> > > > -		cache_objs = &cache->objs[cache->len];
> > > > -		cache->len += n;
> > > > -	} else {
> > > > -		cache_objs = &cache->objs[0];
> > > > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > > >len);
> > > > -		cache->len = n;
> > > > +		goto driver_enqueue;
> > > >  	}
> > > >
> > > >  	/* Add the objects to the cache. */ @@ -1399,13 +1585,7 @@
> > > > rte_mempool_do_generic_put(struct rte_mempool *mp, void * const
> > > > *obj_table,
> > > >
> > > >  driver_enqueue:
> > > >
> > > > -	/* increment stat now, adding in mempool always success */
> > > > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > > -
> > > > -driver_enqueue_stats_incremented:
> > > > -
> > > > -	/* push objects to the backend */
> > > > +	/* Push the objects to the backend. */
> > > >  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> > > >
> > > > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> > > > b/lib/mempool/rte_mempool_trace_fp.h
> > > > index ed060e887c..14666457f7 100644
> > > > --- a/lib/mempool/rte_mempool_trace_fp.h
> > > > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > > > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> > > >  	rte_trace_point_emit_ptr(mempool);
> > > >  )
> > > >
> > > > +RTE_TRACE_POINT_FP(
> > > > +	rte_mempool_trace_cache_zc_put_bulk,
> > > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > > nb_objs),
> > > > +	rte_trace_point_emit_ptr(cache);
> > > > +	rte_trace_point_emit_ptr(mempool);
> > > > +	rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > > +RTE_TRACE_POINT_FP(
> > > > +	rte_mempool_trace_cache_zc_put_rewind,
> > > > +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > > > +	rte_trace_point_emit_ptr(cache);
> > > > +	rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > > +RTE_TRACE_POINT_FP(
> > > > +	rte_mempool_trace_cache_zc_get_bulk,
> > > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > > nb_objs),
> > > > +	rte_trace_point_emit_ptr(cache);
> > > > +	rte_trace_point_emit_ptr(mempool);
> > > > +	rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> > > > b67d7aace7..1383ae6db2 100644
> > > > --- a/lib/mempool/version.map
> > > > +++ b/lib/mempool/version.map
> > > > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> > > >  	__rte_mempool_trace_ops_alloc;
> > > >  	__rte_mempool_trace_ops_free;
> > > >  	__rte_mempool_trace_set_ops_byname;
> > > > +
> > > > +	# added in 23.03
> > > > +	__rte_mempool_trace_cache_zc_put_bulk;
> > > > +	__rte_mempool_trace_cache_zc_put_rewind;
> > > > +	__rte_mempool_trace_cache_zc_get_bulk;
> > > >  };
> > > >
> > > >  INTERNAL {
> > > > --
> > > > 2.17.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v8] mempool cache: add zero-copy get and put functions
  2023-02-13  9:37         ` Olivier Matz
@ 2023-02-13 10:25           ` Morten Brørup
  2023-02-14 14:16             ` Andrew Rybchenko
  0 siblings, 1 reply; 36+ messages in thread
From: Morten Brørup @ 2023-02-13 10:25 UTC (permalink / raw)
  To: Olivier Matz
  Cc: andrew.rybchenko, Kamalakshitha Aligeri, bruce.richardson,
	konstantin.ananyev, dev, nd, david.marchand,
	Honnappa Nagarahalli

> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, 13 February 2023 10.37
> 
> Hello,
> 
> Thank you for this work, and sorry for the late feedback too.

Better late than never. And it's a core library, so important to get it right!

> 
> On Mon, Feb 13, 2023 at 04:29:51AM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > > > +/**
> > > > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > > > rte_mempool_do_generic_put().
> > > > > + *
> > > > > + * Zero-copy put objects in a mempool cache backed by the
> specified
> > > > > mempool.
> > > > > + *
> > > > > + * @param cache
> > > > > + *   A pointer to the mempool cache.
> > > > > + * @param mp
> > > > > + *   A pointer to the mempool.
> > > > > + * @param n
> > > > > + *   The number of objects to be put in the mempool cache.
> > > > > + * @return
> > > > > + *   The pointer to where to put the objects in the mempool
> cache.
> > > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > > + *   exceeds the cache flush threshold.
> > > > > + */
> > > > > +static __rte_always_inline void **
> > > > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache
> *cache,
> > > > > +		struct rte_mempool *mp,
> > > > > +		unsigned int n)
> > > > > +{
> > > > > +	void **cache_objs;
> > > > > +
> > > > > +	RTE_ASSERT(cache != NULL);
> > > > > +	RTE_ASSERT(mp != NULL);
> > > > > +
> > > > > +	if (n <= cache->flushthresh - cache->len) {
> 
> The previous code was doing this test instead:
> 
> if (cache->len + n <= cache->flushthresh)
> 
> I know there is an invariant asserting that cache->len <= cache->flushthresh,
> so there is no real issue, but I tend to say that it is good practice
> to avoid subtractions on unsigned values to avoid the risk of wrapping.
> 
> I also think the previous test was a bit more readable.

I agree with you, but I didn't object to Andrew's recommendation of changing it to this, so I did.

I will change it back. Konstantin, I hope you don't mind. :-)

[...]

> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: This API may change, or be removed,
> without
> > > > prior
> > > > > notice.
> > > > > + *
> > > > > + * Zero-copy put objects in a mempool cache backed by the
> specified
> > > > > mempool.
> 
> I think we should document the differences and advantages of using this
> function over the standard version, explaining which copy is avoided,
> why it is faster, ...
> 
> Also, we should say that once this function is called, the user has
> to copy the objects to the cache.
> 

I agree; the function descriptions could be more verbose.

If we want to get this feature into DPDK now, we can postpone the description improvements to a later patch.

[...]

> > Earlier there was a discussion on the API name.
> > IMO, we should keep the API names similar to those in ring library.
> This would provide consistency across the libraries.
> > There were some concerns expressed in PMD having to call 2 APIs. I do
> not think changing to 2 APIs will have any perf impact.
> 
> I'm not really convinced by the API names too. Again, sorry, I know
> this
> comment arrives after the battle.
> 
> Your proposal is:
> 
> /* Zero-copy put objects in a mempool cache backed by the specified
> mempool. */
> rte_mempool_cache_zc_put_bulk(cache, mp, n)
> 
> /* Zero-copy get objects from a mempool cache backed by the specified
> mempool. */
> rte_mempool_cache_zc_get_bulk(cache, mp, n)
> 
> Here are some observations:
> 
> - This was said in the discussion previously, but the functions do not
>   really get or put objects in the cache. Instead, they prepare the
>   cache (filling it or flushing it if needed) and update its length so
>   that the user can do the effective copy.

Can be fixed by improving function descriptions.

> 
> - The "_cache" is superfluous for me: these functions do not deal more
>   with the cache than the non zero-copy version

I have been thinking of these as "mempool cache" APIs.

I don't mind getting rid of "_cache" in their names, if we agree that they are "mempool" functions, instead of "mempool cache" functions.

> 
> - The order of the parameters is (cache, mp, n) while the other
> functions
>   that take a mempool and a cache as parameters have the mp first (see
>   _generic versions).

The order of the parameters was due to considering these as "mempool cache" functions, so I followed the convention for an existing "mempool cache" function:

rte_mempool_cache_flush(struct rte_mempool_cache *cache,
		struct rte_mempool *mp);

If we instead consider them as simple "mempool" functions, I agree with you about the parameter ordering.

So, what does the community think... Are these "mempool cache" functions, or just "mempool" functions?

> 
> - The "_bulk" is indeed present on other functions, but not all (the
> generic
>   version does not have it), I'm not sure it is absolutely required

The mempool library offers both single-object and bulk functions, so the function names must include "_bulk".

> 
> What do you think about these API below?
> 
> rte_mempool_prepare_zc_put(mp, n, cache)
> rte_mempool_prepare_zc_get(mp, n, cache)

I initially used "prepare" in the names, but since we don't have accompanying "commit" functions, I decided against "prepare" to avoid confusion. (Any SQL developer will probably agree with me on this.)

> 
> >
> > Also, what is the use case for the 'rewind' API?
> 
> +1
> 
> I have the same feeling that rewind() is not required now. It can be
> added later if we find a use-case.
> 
> In case we want to keep it, I think we need to better specify in the
> API
> comments in which unique conditions the function can be called
> (i.e. after a call to rte_mempool_prepare_zc_put() with the same number
> of objects, given no other operations were done on the mempool in
> between). A call outside of these conditions has an undefined behavior.

Please refer to my answer to Honnappa on this topic.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v9] mempool cache: add zero-copy get and put functions
  2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
                   ` (8 preceding siblings ...)
  2023-02-09 14:58 ` [PATCH v8] " Morten Brørup
@ 2023-02-13 12:24 ` Morten Brørup
  2023-02-13 14:33   ` Kamalakshitha Aligeri
  9 siblings, 1 reply; 36+ messages in thread
From: Morten Brørup @ 2023-02-13 12:24 UTC (permalink / raw)
  To: olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	kamalakshitha.aligeri, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, Morten Brørup

Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

Bugzilla ID: 1052

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>

v9:
* Also set rte_errno in zero-copy put function, if returning NULL.
  (Honnappa)
* Revert v3 comparison to prevent overflow if n is really huge and len is
  non-zero. (Olivier)
v8:
* Actually include the rte_errno header file.
  Note to self: The changes only take effect on the disk after the file in
  the text editor has been saved.
v7:
* Fix typo in function description. (checkpatch)
* Zero-copy functions may set rte_errno; include rte_errno header file.
  (ci/loongarch-compilation)
v6:
* Improve description of the 'n' parameter to the zero-copy get function.
  (Konstantin, Bruce)
* The caches used for zero-copy may not be user-owned, so remove this word
  from the function descriptions. (Kamalakshitha)
v5:
* Bugfix: Compare zero-copy get request to the cache size instead of the
  flush threshold; otherwise refill could overflow the memory allocated
  for the cache. (Andrew)
* Split the zero-copy put function into an internal function doing the
  work, and a public function with trace.
* Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
  the internal zero-copy put function. (Andrew)
* Corrected the return type of rte_mempool_cache_zc_put_bulk() from void *
  to void **; it returns a pointer to an array of objects.
* Fix coding style: Add missing curly brackets. (Andrew)
v4:
* Fix checkpatch warnings.
v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero. (Andrew)
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; requiring correct application
  behaviour, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 239 +++++++++++++++++++++++++----
 lib/mempool/rte_mempool_trace_fp.h |  23 +++
 lib/mempool/version.map            |   5 +
 4 files changed, 247 insertions(+), 29 deletions(-)

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@ RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..94f895c329 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -42,6 +42,7 @@
 #include <rte_config.h>
 #include <rte_spinlock.h>
 #include <rte_debug.h>
+#include <rte_errno.h>
 #include <rte_lcore.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
@@ -1346,6 +1347,199 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+
+/**
+ * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL, with rte_errno set to EINVAL, if the request itself is too big
+ *   for the cache, i.e. exceeds the cache flush threshold.
+ */
+static __rte_always_inline void **
+__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	if (cache->len + n <= cache->flushthresh) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void **
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The number of objects not put in the mempool cache after calling
+ *   rte_mempool_cache_zc_put_bulk().
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(n <= cache->len);
+
+	rte_mempool_trace_cache_zc_put_rewind(cache, n);
+
+	cache->len -= n;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be made available for extraction from the mempool cache.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
+ *   With rte_errno set to the error code of the mempool dequeue function,
+ *   or EINVAL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len, size;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+	size = cache->size;
+
+	if (n <= len) {
+		/* The request can be satisfied from the cache as is. */
+		len -= n;
+	} else if (likely(n <= size)) {
+		/*
+		 * The request itself can be satisfied from the cache.
+		 * But first, the cache must be filled from the backend;
+		 * fetch size + requested - len objects.
+		 */
+		int ret;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else {
+			len = size;
+		}
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
@@ -1364,32 +1558,25 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	/* No cache provided? */
+	if (unlikely(cache == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+		goto driver_enqueue;
+	}
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
+	/* Prepare to add the objects to the cache. */
+	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
 
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
+	/* The request itself is too big for the cache? */
+	if (unlikely(cache_objs == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	if (cache->len + n <= cache->flushthresh) {
-		cache_objs = &cache->objs[cache->len];
-		cache->len += n;
-	} else {
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
+		goto driver_enqueue;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1586,7 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
-
-driver_enqueue_stats_incremented:
-
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }
 
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..14666457f7 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_rewind,
+	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..1383ae6db2 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,11 @@ EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_rewind;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v9] mempool cache: add zero-copy get and put functions
  2023-02-13 12:24 ` [PATCH v9] " Morten Brørup
@ 2023-02-13 14:33   ` Kamalakshitha Aligeri
  0 siblings, 0 replies; 36+ messages in thread
From: Kamalakshitha Aligeri @ 2023-02-13 14:33 UTC (permalink / raw)
  To: Morten Brørup, olivier.matz, andrew.rybchenko,
	Honnappa Nagarahalli, bruce.richardson, konstantin.ananyev, dev
  Cc: nd, david.marchand, nd

Patch looks good to me

Acked-by: Kamalakshitha Aligeri <Kamalakshitha.aligeri@arm.com>

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Monday, February 13, 2023 4:25 AM
> To: olivier.matz@6wind.com; andrew.rybchenko@oktetlabs.ru; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Kamalakshitha Aligeri
> <Kamalakshitha.Aligeri@arm.com>; bruce.richardson@intel.com;
> konstantin.ananyev@huawei.com; dev@dpdk.org
> Cc: nd <nd@arm.com>; david.marchand@redhat.com; Morten Brørup
> <mb@smartsharesystems.com>
> Subject: [PATCH v9] mempool cache: add zero-copy get and put functions
> 
> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> Bugzilla ID: 1052
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> 
> v9:
> * Also set rte_errno in zero-copy put function, if returning NULL.
>   (Honnappa)
> * Revert v3 comparison to prevent overflow if n is really huge and len is
>   non-zero. (Olivier)
> v8:
> * Actually include the rte_errno header file.
>   Note to self: The changes only take effect on the disk after the file in
>   the text editor has been saved.
> v7:
> * Fix typo in function description. (checkpatch)
> * Zero-copy functions may set rte_errno; include rte_errno header file.
>   (ci/loongarch-compilation)
> v6:
> * Improve description of the 'n' parameter to the zero-copy get function.
>   (Konstantin, Bruce)
> * The caches used for zero-copy may not be user-owned, so remove this
> word
>   from the function descriptions. (Kamalakshitha)
> v5:
> * Bugfix: Compare zero-copy get request to the cache size instead of the
>   flush threshold; otherwise refill could overflow the memory allocated
>   for the cache. (Andrew)
> * Split the zero-copy put function into an internal function doing the
>   work, and a public function with trace.
> * Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
>   the internal zero-copy put function. (Andrew)
> * Corrected the return type of rte_mempool_cache_zc_put_bulk() from
> void *
>   to void **; it returns a pointer to an array of objects.
> * Fix coding style: Add missing curly brackets. (Andrew)
> v4:
> * Fix checkpatch warnings.
> v3:
> * Bugfix: Respect the cache size; compare to the flush threshold instead
>   of RTE_MEMPOOL_CACHE_MAX_SIZE.
> * Added 'rewind' function for incomplete 'put' operations. (Konstantin)
> * Replace RTE_ASSERTs with runtime checks of the request size.
>   Instead of failing, return NULL if the request is too big. (Konstantin)
> * Modified comparison to prevent overflow if n is really huge and len is
>   non-zero. (Andrew)
> * Updated the comments in the code.
> v2:
> * Fix checkpatch warnings.
> * Fix missing registration of trace points.
> * The functions are inline, so they don't go into the map file.
> v1 changes from the RFC:
> * Removed run-time parameter checks. (Honnappa)
>   This is a hot fast path function; requiring correct application
>   behaviour, i.e. function parameters must be valid.
> * Added RTE_ASSERT for parameters instead.
>   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> * Removed fallback when 'cache' parameter is not set. (Honnappa)
> * Chose the simple get function; i.e. do not move the existing objects in
>   the cache to the top of the new stack, just leave them at the bottom.
> * Renamed the functions. Other suggestions are welcome, of course. ;-)
> * Updated the function descriptions.
> * Added the functions to trace_fp and version.map.
> ---
>  lib/mempool/mempool_trace_points.c |   9 ++
>  lib/mempool/rte_mempool.h          | 239 +++++++++++++++++++++++++----
>  lib/mempool/rte_mempool_trace_fp.h |  23 +++
>  lib/mempool/version.map            |   5 +
>  4 files changed, 247 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/mempool/mempool_trace_points.c
> b/lib/mempool/mempool_trace_points.c
> index 4ad76deb34..83d353a764 100644
> --- a/lib/mempool/mempool_trace_points.c
> +++ b/lib/mempool/mempool_trace_points.c
> @@ -77,3 +77,12 @@
> RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> 
>  RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
>  	lib.mempool.set.ops.byname)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> +	lib.mempool.cache.zc.put.bulk)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> +	lib.mempool.cache.zc.put.rewind)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> +	lib.mempool.cache.zc.get.bulk)
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..94f895c329 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -42,6 +42,7 @@
>  #include <rte_config.h>
>  #include <rte_spinlock.h>
>  #include <rte_debug.h>
> +#include <rte_errno.h>
>  #include <rte_lcore.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_ring.h>
> @@ -1346,6 +1347,199 @@ rte_mempool_cache_flush(struct
> rte_mempool_cache *cache,
>  	cache->len = 0;
>  }
> 
> +
> +/**
> + * @internal used by rte_mempool_cache_zc_put_bulk() and
> rte_mempool_do_generic_put().
> + *
> + * Zero-copy put objects in a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL, with rte_errno set to EINVAL, if the request itself is too big
> + *   for the cache, i.e. exceeds the cache flush threshold.
> + */
> +static __rte_always_inline void **
> +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	void **cache_objs;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	if (cache->len + n <= cache->flushthresh) {
> +		/*
> +		 * The objects can be added to the cache without crossing
> the
> +		 * flush threshold.
> +		 */
> +		cache_objs = &cache->objs[cache->len];
> +		cache->len += n;
> +	} else if (likely(n <= cache->flushthresh)) {
> +		/*
> +		 * The request itself fits into the cache.
> +		 * But first, the cache must be flushed to the backend, so
> +		 * adding the objects does not cross the flush threshold.
> +		 */
> +		cache_objs = &cache->objs[0];
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> >len);
> +		cache->len = n;
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +
> +	return cache_objs;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy put objects in a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void **
> +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy un-put objects in a mempool cache.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param n
> + *   The number of objects not put in the mempool cache after calling
> + *   rte_mempool_cache_zc_put_bulk().
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(n <= cache->len);
> +
> +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> +
> +	cache->len -= n;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy get objects from a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be made available for extraction from the
> mempool cache.
> + * @return
> + *   The pointer to the objects in the mempool cache.
> + *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
> + *   With rte_errno set to the error code of the mempool dequeue function,
> + *   or EINVAL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void *
> +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	unsigned int len, size;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> +
> +	len = cache->len;
> +	size = cache->size;
> +
> +	if (n <= len) {
> +		/* The request can be satisfied from the cache as is. */
> +		len -= n;
> +	} else if (likely(n <= size)) {
> +		/*
> +		 * The request itself can be satisfied from the cache.
> +		 * But first, the cache must be filled from the backend;
> +		 * fetch size + requested - len objects.
> +		 */
> +		int ret;
> +
> +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> >objs[len], size + n - len);
> +		if (unlikely(ret < 0)) {
> +			/*
> +			 * We are buffer constrained.
> +			 * Do not fill the cache, just satisfy the request.
> +			 */
> +			ret = rte_mempool_ops_dequeue_bulk(mp,
> &cache->objs[len], n - len);
> +			if (unlikely(ret < 0)) {
> +				/* Unable to satisfy the request. */
> +
> +				RTE_MEMPOOL_STAT_ADD(mp,
> get_fail_bulk, 1);
> +				RTE_MEMPOOL_STAT_ADD(mp,
> get_fail_objs, n);
> +
> +				rte_errno = -ret;
> +				return NULL;
> +			}
> +
> +			len = 0;
> +		} else {
> +			len = size;
> +		}
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	cache->len = len;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
> +	return &cache->objs[len];
> +}
> +
>  /**
>   * @internal Put several objects back in the mempool; used internally.
>   * @param mp
> @@ -1364,32 +1558,25 @@ rte_mempool_do_generic_put(struct
> rte_mempool *mp, void * const *obj_table,  {
>  	void **cache_objs;
> 
> -	/* No cache provided */
> -	if (unlikely(cache == NULL))
> -		goto driver_enqueue;
> +	/* No cache provided? */
> +	if (unlikely(cache == NULL)) {
> +		/* Increment stats now, adding in mempool always succeeds.
> */
> +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> 
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +		goto driver_enqueue;
> +	}
> 
> -	/* The request itself is too big for the cache */
> -	if (unlikely(n > cache->flushthresh))
> -		goto driver_enqueue_stats_incremented;
> +	/* Prepare to add the objects to the cache. */
> +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> 
> -	/*
> -	 * The cache follows the following algorithm:
> -	 *   1. If the objects cannot be added to the cache without crossing
> -	 *      the flush threshold, flush the cache to the backend.
> -	 *   2. Add the objects to the cache.
> -	 */
> +	/* The request itself is too big for the cache? */
> +	if (unlikely(cache_objs == NULL)) {
> +		/* Increment stats now, adding in mempool always succeeds.
> */
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> 
> -	if (cache->len + n <= cache->flushthresh) {
> -		cache_objs = &cache->objs[cache->len];
> -		cache->len += n;
> -	} else {
> -		cache_objs = &cache->objs[0];
> -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> >len);
> -		cache->len = n;
> +		goto driver_enqueue;
>  	}
> 
>  	/* Add the objects to the cache. */
> @@ -1399,13 +1586,7 @@ rte_mempool_do_generic_put(struct
> rte_mempool *mp, void * const *obj_table,
> 
>  driver_enqueue:
> 
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> -
> -driver_enqueue_stats_incremented:
> -
> -	/* push objects to the backend */
> +	/* Push the objects to the backend. */
>  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
>  }
> 
> diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
> index ed060e887c..14666457f7 100644
> --- a/lib/mempool/rte_mempool_trace_fp.h
> +++ b/lib/mempool/rte_mempool_trace_fp.h
> @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
>  	rte_trace_point_emit_ptr(mempool);
>  )
> 
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_rewind,
> +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_get_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/mempool/version.map b/lib/mempool/version.map
> index b67d7aace7..1383ae6db2 100644
> --- a/lib/mempool/version.map
> +++ b/lib/mempool/version.map
> @@ -63,6 +63,11 @@ EXPERIMENTAL {
>  	__rte_mempool_trace_ops_alloc;
>  	__rte_mempool_trace_ops_free;
>  	__rte_mempool_trace_set_ops_byname;
> +
> +	# added in 23.03
> +	__rte_mempool_trace_cache_zc_put_bulk;
> +	__rte_mempool_trace_cache_zc_put_rewind;
> +	__rte_mempool_trace_cache_zc_get_bulk;
>  };
> 
>  INTERNAL {
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8] mempool cache: add zero-copy get and put functions
  2023-02-13 10:25           ` Morten Brørup
@ 2023-02-14 14:16             ` Andrew Rybchenko
  0 siblings, 0 replies; 36+ messages in thread
From: Andrew Rybchenko @ 2023-02-14 14:16 UTC (permalink / raw)
  To: Morten Brørup, Olivier Matz
  Cc: Kamalakshitha Aligeri, bruce.richardson, konstantin.ananyev, dev,
	nd, david.marchand, Honnappa Nagarahalli

On 2/13/23 13:25, Morten Brørup wrote:
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> Sent: Monday, 13 February 2023 10.37
>>
>> Hello,
>>
>> Thank you for this work, and sorry for the late feedback too.
> 
> Better late than never. And it's a core library, so important to get it right!
> 
>>
>> On Mon, Feb 13, 2023 at 04:29:51AM +0000, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>>> +/**
>>>>>> + * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
>>>>>> + *
>>>>>> + * Zero-copy put objects in a mempool cache backed by the specified mempool.
>>>>>> + *
>>>>>> + * @param cache
>>>>>> + *   A pointer to the mempool cache.
>>>>>> + * @param mp
>>>>>> + *   A pointer to the mempool.
>>>>>> + * @param n
>>>>>> + *   The number of objects to be put in the mempool cache.
>>>>>> + * @return
>>>>>> + *   The pointer to where to put the objects in the mempool cache.
>>>>>> + *   NULL if the request itself is too big for the cache, i.e.
>>>>>> + *   exceeds the cache flush threshold.
>>>>>> + */
>>>>>> +static __rte_always_inline void **
>>>>>> +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
>>>>>> +		struct rte_mempool *mp,
>>>>>> +		unsigned int n)
>>>>>> +{
>>>>>> +	void **cache_objs;
>>>>>> +
>>>>>> +	RTE_ASSERT(cache != NULL);
>>>>>> +	RTE_ASSERT(mp != NULL);
>>>>>> +
>>>>>> +	if (n <= cache->flushthresh - cache->len) {
>>
>> The previous code was doing this test instead:
>>
>> if (cache->len + n <= cache->flushthresh)
>>
>> I know there is an invariant asserting that cache->len <= cache->flushthresh,
>> so there is no real issue, but I tend to say that it is good practice
>> to avoid subtractions on unsigned values, to avoid the risk of wrapping.
>>
>> I also think the previous test was a bit more readable.
> 
> I agree with you, but I didn't object to Andrew's recommendation of changing it to this, so I did.
> 
> I will change it back. Konstantin, I hope you don't mind. :-)

I suggested using the subtraction here to ensure that an extremely
big 'n' value is handled correctly (with the addition, such a value
would cause an overflow).
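
To make the wrap-around concern concrete, here is a small standalone
illustration (plain C, not DPDK code) of why the addition form can
misbehave for a huge 'n' while the subtraction form cannot, given the
invariant len <= flushthresh:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t len = 100, flushthresh = 512;
	uint32_t n = UINT32_MAX - 50;	/* absurdly large request */

	/* Addition form: len + n wraps around to a small value,
	 * so the check wrongly reports that the request fits. */
	printf("addition form:    %s\n",
	       (len + n <= flushthresh) ? "fits (wrong)" : "too big");

	/* Subtraction form: flushthresh - len cannot wrap, because of
	 * the invariant len <= flushthresh, so the check stays correct. */
	printf("subtraction form: %s\n",
	       (n <= flushthresh - len) ? "fits" : "too big (correct)");

	return 0;
}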

> 
> [...]
> 
>>>>>> +/**
>>>>>> + * @warning
>>>>>> + * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
>>>>>> + *
>>>>>> + * Zero-copy put objects in a mempool cache backed by the specified mempool.
>>
>> I think we should document the differences and advantages of using this
>> function over the standard version, explaining which copy is avoided,
>> why it is faster, ...
>>
>> Also, we should say that once this function is called, the user has
>> to copy the objects to the cache.
>>
> 
> I agree, the function descriptions could be more verbose.
> 
> If we want to get this feature into DPDK now, we can postpone the description improvements to a later patch.

No strong opinion, but I'd wait for the description improvements.
It is very important to have a good description from the very
beginning.
I'll try to find time this week to help, but can't promise.
Maybe it is already late...
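
As an illustration of what such a description could spell out, below is
a minimal, hypothetical usage sketch of the zero-copy put pattern. Only
the rte_mempool_cache_zc_put_bulk() call follows the API proposed in
this patch; the helper itself and its fallback path are assumptions:

#include <rte_branch_prediction.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

/* Hypothetical helper: return 'n' objects to 'mp' through the lcore's
 * default cache, copying the pointers only once (into the cache itself). */
static inline void
put_objs_zc(struct rte_mempool *mp, void * const *objs, unsigned int n)
{
	struct rte_mempool_cache *cache;
	void **cache_objs;
	unsigned int i;

	/* Assumes the mempool was created with a per-lcore cache. */
	cache = rte_mempool_default_cache(mp, rte_lcore_id());

	/* Reserve room for 'n' pointers directly inside the cache. */
	cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	if (unlikely(cache_objs == NULL)) {
		/* Request exceeds the flush threshold: use the regular put. */
		rte_mempool_put_bulk(mp, objs, n);
		return;
	}

	/* The caller does the one and only copy: pointers into the cache. */
	for (i = 0; i < n; i++)
		cache_objs[i] = objs[i];
}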

> 
> [...]
> 
>>> Earlier there was a discussion on the API name.
>>> IMO, we should keep the API names similar to those in the ring library.
>>> This would provide consistency across the libraries.
>>> There were some concerns expressed about the PMD having to call 2 APIs.
>>> I do not think changing to 2 APIs will have any perf impact.
>>
>> I'm not really convinced by the API names too. Again, sorry, I know
>> this
>> comment arrives after the battle.
>>
>> Your proposal is:
>>
>> /* Zero-copy put objects in a mempool cache backed by the specified
>> mempool. */
>> rte_mempool_cache_zc_put_bulk(cache, mp, n)
>>
>> /* Zero-copy get objects from a mempool cache backed by the specified
>> mempool. */
>> rte_mempool_cache_zc_get_bulk(cache, mp, n)
>>
>> Here are some observations:
>>
>> - This was said in the discussion previously, but the functions do not
>>    really get or put objects in the cache. Instead, they prepare the
>>    cache (filling it or flushing it if needed) and update its length so
>>    that the user can do the effective copy.
> 
> Can be fixed by improving function descriptions.
> 
>>
>> - The "_cache" is superfluous for me: these functions do not deal more
>>    with the cache than the non zero-copy version
> 
> I have been thinking of these as "mempool cache" APIs.
> 
> I don't mind getting rid of "_cache" in their names, if we agree that they are "mempool" functions, instead of "mempool cache" functions.
> 
>>
>> - The order of the parameters is (cache, mp, n) while the other functions
>>    that take a mempool and a cache as parameters have the mp first (see
>>    _generic versions).
> 
> The order of the parameters was due to considering these as "mempool cache" functions, so I followed the convention for an existing "mempool cache" function:
> 
> rte_mempool_cache_flush(struct rte_mempool_cache *cache,
> 		struct rte_mempool *mp);
> 
> If we instead consider them as simple "mempool" functions, I agree with you about the parameter ordering.
> 
> So, what does the community think... Are these "mempool cache" functions, or just "mempool" functions?

Since 'cache' is mandatory here (it cannot be NULL), I agree
that it is a 'mempool cache' API, not a 'mempool' API.

> 
>>
>> - The "_bulk" is indeed present on other functions, but not all (the
>> generic
>>    version does not have it), I'm not sure it is absolutely required
> 
> The mempool library offers both single-object and bulk functions, so the function names must include "_bulk".

I have no strong opinion here. Yes, "bulk" is nice for
consistency, but IMHO it is not strictly required, since it
makes the function name longer and there is no value in a
single-object version of these functions.

> 
>>
>> What do you think about these API below?
>>
>> rte_mempool_prepare_zc_put(mp, n, cache)
>> rte_mempool_prepare_zc_get(mp, n, cache)
> 
> I initially used "prepare" in the names, but since we don't have accompanying "commit" functions, I decided against "prepare" to avoid confusion. (Any SQL developer will probably agree with me on this.)

prepare -> reserve?

However, in the case of get we really do get: when the function
returns, the corresponding objects have already been obtained from
the mempool. Yes, in the case of put we need to copy the object
pointers into the provided space, but we don't need to call any
mempool (cache) API to commit it. Plus, API naming symmetry.
So I'd *not* add prepare/reserve to the function names.
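
For the get side, the same point can be shown with a minimal,
hypothetical sketch. Only the rte_mempool_cache_zc_get_bulk() call
follows the API proposed in this patch; the refill helper itself is an
assumption:

#include <rte_branch_prediction.h>
#include <rte_errno.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

/* Hypothetical RX-descriptor refill: read 'n' object pointers directly
 * out of the mempool cache, with no intermediate array. */
static inline int
refill_rx_ring_zc(struct rte_mempool *mp, void **hw_ring, unsigned int n)
{
	struct rte_mempool_cache *cache;
	void **cache_objs;
	unsigned int i;

	/* Assumes the mempool was created with a per-lcore cache. */
	cache = rte_mempool_default_cache(mp, rte_lcore_id());

	cache_objs = rte_mempool_cache_zc_get_bulk(cache, mp, n);
	if (unlikely(cache_objs == NULL))
		return -rte_errno;	/* request too big, or pool exhausted */

	/* When the call returns, the objects are already ours; the only
	 * remaining work is consuming the pointers from the cache. */
	for (i = 0; i < n; i++)
		hw_ring[i] = cache_objs[i];

	return 0;
}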

> 
>>
>>>
>>> Also, what is the use case for the 'rewind' API?
>>
>> +1
>>
>> I have the same feeling that rewind() is not required now. It can be
>> added later if we find a use-case.
>>
>> In case we want to keep it, I think we need to better specify in the API
>> comments under which exact conditions the function can be called
>> (i.e. after a call to rte_mempool_prepare_zc_put() with the same number
>> of objects, given no other operations were done on the mempool in
>> between). A call outside of these conditions has undefined behavior.
> 
> Please refer to my answer to Honnappa on this topic.
> 
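
For reference, one shape such a rewind use case could take is sketched
below as a hypothetical TX-completion helper. Only the
rte_mempool_cache_zc_put_bulk() and rte_mempool_cache_zc_put_rewind()
calls follow the signatures in this patch; everything else is an
illustrative assumption:

#include <rte_branch_prediction.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Hypothetical TX-completion path: reserve cache space for up to 'n'
 * pointers, store only the mbufs that really belong here, and rewind
 * the reservation for the slots that end up unused. */
static inline void
tx_free_bufs_zc(struct rte_mempool *mp, struct rte_mbuf **pkts, unsigned int n)
{
	struct rte_mempool_cache *cache;
	void **cache_objs;
	unsigned int i, stored = 0;

	/* Assumes the mempool was created with a per-lcore cache. */
	cache = rte_mempool_default_cache(mp, rte_lcore_id());

	/* Best case: all 'n' mbufs go straight back into this cache. */
	cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	if (unlikely(cache_objs == NULL)) {
		rte_pktmbuf_free_bulk(pkts, n);	/* too big for the cache */
		return;
	}

	for (i = 0; i < n; i++) {
		struct rte_mbuf *m = rte_pktmbuf_prefree_seg(pkts[i]);

		if (likely(m != NULL && m->pool == mp))
			cache_objs[stored++] = m;
		else if (m != NULL)
			rte_mempool_put(m->pool, m);	/* belongs to another pool */
		/* m == NULL: still referenced elsewhere, nothing to return. */
	}

	/* Hand back the reserved-but-unused cache slots. */
	if (stored != n)
		rte_mempool_cache_zc_put_rewind(cache, n - stored);
}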


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2023-02-14 14:16 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-05 13:19 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
2022-11-07  9:19 ` Bruce Richardson
2022-11-07 14:32   ` Morten Brørup
2022-11-15 16:18 ` [PATCH] mempool cache: add zero-copy get and put functions Morten Brørup
2022-11-16 18:04 ` [PATCH v2] " Morten Brørup
2022-11-29 20:54   ` Kamalakshitha Aligeri
2022-11-30 10:21     ` Morten Brørup
2022-12-22 15:57   ` Konstantin Ananyev
2022-12-22 17:55     ` Morten Brørup
2022-12-23 16:58       ` Konstantin Ananyev
2022-12-24 12:17         ` Morten Brørup
2022-12-24 11:49 ` [PATCH v3] " Morten Brørup
2022-12-24 11:55 ` [PATCH v4] " Morten Brørup
2022-12-27  9:24   ` Andrew Rybchenko
2022-12-27 10:31     ` Morten Brørup
2022-12-27 15:17 ` [PATCH v5] " Morten Brørup
2023-01-22 20:34   ` Konstantin Ananyev
2023-01-22 21:17     ` Morten Brørup
2023-01-23 11:53       ` Konstantin Ananyev
2023-01-23 12:23         ` Morten Brørup
2023-01-23 12:52           ` Konstantin Ananyev
2023-01-23 14:30           ` Bruce Richardson
2023-01-24  1:53             ` Kamalakshitha Aligeri
2023-02-09 14:39 ` [PATCH v6] " Morten Brørup
2023-02-09 14:52 ` [PATCH v7] " Morten Brørup
2023-02-09 14:58 ` [PATCH v8] " Morten Brørup
2023-02-10  8:35   ` fengchengwen
2023-02-12 19:56   ` Honnappa Nagarahalli
2023-02-12 23:15     ` Morten Brørup
2023-02-13  4:29       ` Honnappa Nagarahalli
2023-02-13  9:30         ` Morten Brørup
2023-02-13  9:37         ` Olivier Matz
2023-02-13 10:25           ` Morten Brørup
2023-02-14 14:16             ` Andrew Rybchenko
2023-02-13 12:24 ` [PATCH v9] " Morten Brørup
2023-02-13 14:33   ` Kamalakshitha Aligeri
