DPDK patches and discussions
* RE: [RFC]: mempool: zero-copy cache get bulk
@ 2022-11-06  7:12 Morten Brørup
  2022-11-13 18:31 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 5+ messages in thread
From: Morten Brørup @ 2022-11-06  7:12 UTC (permalink / raw)
  To: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli
  Cc: Kamalakshitha Aligeri, nd

> From: Morten Brørup
> Sent: Saturday, 5 November 2022 14.19
> 
> Zero-copy access to the mempool cache is beneficial for PMD
> performance, and must be provided by the mempool library to fix [Bug
> 1052] without a performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> 
> This RFC offers two conceptual variants of zero-copy get:
> 1. A simple version.
> 2. A version where existing (hot) objects in the cache are moved to the
> top of the cache before new objects from the backend driver are pulled
> in.
> 
> I would like some early feedback. Also, which variant do you prefer?
> 
> Notes:
> * Allowing the 'cache' parameter to be NULL, and getting it from the
> mempool instead, was inspired by rte_mempool_cache_flush().

"instead" -> "in this case"

> * Asserting that the 'mp' parameter is not NULL is not done by other
> functions, so I omitted it here too.
> 
> NB: Please ignore formatting. Also, this code has not even been compile
> tested.

And I just spotted an error: the rte_memcpy() length field must be multiplied by sizeof(void*).

> 
> 1. Simple version:
> 
> /**
>  * Get objects from a mempool via zero-copy access to a user-owned
> mempool cache.
>  *
>  * @param cache
>  *   A pointer to the mempool cache.
>  * @param mp
>  *   A pointer to the mempool.
>  * @param n
>  *   The number of objects to prefetch into the mempool cache.
>  * @return
>  *   The pointer to the objects in the mempool cache.
>  *   NULL on error
>  *   with rte_errno set appropriately.
>  */
> static __rte_always_inline void *
> rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
>         struct rte_mempool *mp,
>         unsigned int n)
> {
>     unsigned int len;
>     int ret;
> 
>     if (cache == NULL)
>         cache = rte_mempool_default_cache(mp, rte_lcore_id());
>     if (cache == NULL) {
>         rte_errno = EINVAL;
>         goto fail;
>     }
> 
>     rte_mempool_trace_cache_get_bulk(cache, mp, n);
> 
>     len = cache->len;
> 
>     if (unlikely(n > len)) {
>         unsigned int size;
> 
>         if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
>             rte_errno = EINVAL;
>             goto fail;
>         }
> 
>         /* Fill the cache from the backend; fetch size + requested -
> len objects. */
>         size = cache->size;
> 
>         ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size
> + n - len);
>         if (unlikely(ret < 0)) {
>             /*
>              * We are buffer constrained.
>              * Do not fill the cache, just satisfy the request.
>              */
>             ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n
> - len);
>             if (unlikely(ret < 0)) {
>                 rte_errno = -ret;
>                 goto fail;
>             }
> 
>             len = 0;
>         } else
>             len = size;
>     } else
>         len -= n;
> 
>     cache->len = len;
> 
>     RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
>     RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> 
>     return &cache->objs[len];
> 
> fail:
> 
>     RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
>     RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> 
>     return NULL;
> }
> 
> 
> 2. Advanced version:
> 
> /**
>  * Get objects from a mempool via zero-copy access to a user-owned
> mempool cache.
>  *
>  * @param cache
>  *   A pointer to the mempool cache.
>  * @param mp
>  *   A pointer to the mempool.
>  * @param n
>  *   The number of objects to prefetch into the mempool cache.
>  * @return
>  *   The pointer to the objects in the mempool cache.
>  *   NULL on error
>  *   with rte_errno set appropriately.
>  */
> static __rte_always_inline void *
> rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
>         struct rte_mempool *mp,
>         unsigned int n)
> {
>     unsigned int len;
>     int ret;
> 
>     if (cache == NULL)
>         cache = rte_mempool_default_cache(mp, rte_lcore_id());
>     if (cache == NULL) {
>         rte_errno = EINVAL;
>         goto fail;
>     }
> 
>     rte_mempool_trace_cache_get_bulk(cache, mp, n);
> 
>     len = cache->len;
> 
>     if (unlikely(n > len)) {
>         unsigned int size;
> 
>         if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
>             rte_errno = EINVAL;
>             goto fail;
>         }
> 
>         /* Fill the cache from the backend; fetch size + requested -
> len objects. */
>         size = cache->size;
> 
>         if (likely(size + n >= 2 * len)) {
>             /*
>              * No overlap when copying (dst >= len): size + n - len >=
> len.
>              * Move (i.e. copy) the existing objects in the cache to
> the
>              * coming top of the cache, to make room for new objects
> below.
>              */
>             rte_memcpy(&cache->objs[size + n - len], &cache->objs[0],
> len);

Length is bytes, not number of objects, so that should be:

rte_memcpy(&cache->objs[size + n - len], &cache->objs[0], len * sizeof(void*));

> 
>             /* Fill the cache below the existing objects in the cache.
> */
>             ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[0],
> size + n - len);
>             if (unlikely(ret < 0)) {
>                 goto constrained;
>             } else
>                 len = size;
>         } else {
>             /* Fill the cache on top of any objects in it. */
>             ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> size + n - len);
>             if (unlikely(ret < 0)) {
> 
> constrained:
>                 /*
>                  * We are buffer constrained.
>                  * Do not fill the cache, just satisfy the request.
>                  */
>                 ret = rte_mempool_ops_dequeue_bulk(mp,
>                     &cache->objs[len], n - len);
>                 if (unlikely(ret < 0)) {
>                     rte_errno = -ret;
>                     goto fail;
>                 }
> 
>                 len = 0;
>             } else
>                 len = size;
>         }
>     } else
>         len -= n;
> 
>     cache->len = len;
> 
>     RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
>     RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> 
>     return &cache->objs[len];
> 
> fail:
> 
>     RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
>     RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> 
>     return NULL;
> }
> 
> 
> Med venlig hilsen / Kind regards,
> -Morten Brørup



* RE: [RFC]: mempool: zero-copy cache get bulk
  2022-11-06  7:12 [RFC]: mempool: zero-copy cache get bulk Morten Brørup
@ 2022-11-13 18:31 ` Honnappa Nagarahalli
  0 siblings, 0 replies; 5+ messages in thread
From: Honnappa Nagarahalli @ 2022-11-13 18:31 UTC (permalink / raw)
  To: Morten Brørup, dev, olivier.matz, andrew.rybchenko
  Cc: Kamalakshitha Aligeri, nd, nd

<snip>

> 
> > From: Morten Brørup
> > Sent: Saturday, 5 November 2022 14.19
> >
> > Zero-copy access to the mempool cache is beneficial for PMD
> > performance, and must be provided by the mempool library to fix [Bug
> > 1052] without a performance regression.
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> >
> >
> > This RFC offers two conceptual variants of zero-copy get:
> > 1. A simple version.
A few comments inline for this.

> > 2. A version where existing (hot) objects in the cache are moved to
> > the top of the cache before new objects from the backend driver are
> > pulled in.
I think there is no compelling use case for this. We could come up with theoretical use cases, but maybe we should avoid complicating the code until someone hits the problem.

In the run-to-completion model, if this situation occurs, the configuration of the cache can be changed to avoid it.

In the pipeline model, we will not hit this condition, as the refill threshold can be a multiple of the cache size.

> >
> > I would like some early feedback. Also, which variant do you prefer?
> >
> > Notes:
> > * Allowing the 'cache' parameter to be NULL, and getting it from the
> > mempool instead, was inspired by rte_mempool_cache_flush().
> 
> "instead" -> "in this case"
> 
> > * Asserting that the 'mp' parameter is not NULL is not done by other
> > functions, so I omitted it here too.
> >
> > NB: Please ignore formatting. Also, this code has not even been
> > compile tested.
> 
> And I just spotted an error: the rte_memcpy() length field must be multiplied
> by sizeof(void*).
> 
> >
> > 1. Simple version:
> >
> > /**
> >  * Get objects from a mempool via zero-copy access to a user-owned
> > mempool cache.
> >  *
> >  * @param cache
> >  *   A pointer to the mempool cache.
> >  * @param mp
> >  *   A pointer to the mempool.
> >  * @param n
> >  *   The number of objects to prefetch into the mempool cache.
> >  * @return
> >  *   The pointer to the objects in the mempool cache.
> >  *   NULL on error
> >  *   with rte_errno set appropriately.
> >  */
> > static __rte_always_inline void *
> > rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
> >         struct rte_mempool *mp,
> >         unsigned int n)
> > {
> >     unsigned int len;
> >     int ret;
> >
> >     if (cache == NULL)
> >         cache = rte_mempool_default_cache(mp, rte_lcore_id());
I do not think we should support this.

> >     if (cache == NULL) {
> >         rte_errno = EINVAL;
> >         goto fail;
> >     }
This could be under debug flags. This is a data plane function, so we should avoid additional checks.
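
For example, the check could be compiled out of release builds, along these lines (sketch only; assuming the existing RTE_LIBRTE_MEMPOOL_DEBUG build flag is an acceptable guard):

#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
    if (cache == NULL) {
        rte_errno = EINVAL;
        goto fail;
    }
#endif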

> >
> >     rte_mempool_trace_cache_get_bulk(cache, mp, n);
> >
> >     len = cache->len;
> >
> >     if (unlikely(n > len)) {
> >         unsigned int size;
> >
> >         if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> >             rte_errno = EINVAL;
> >             goto fail;
> >         }
Same here: this being a data plane function, we should avoid these checks. This check catches something that is fundamentally incorrect in the application.

> >
> >         /* Fill the cache from the backend; fetch size + requested -
> > len objects. */
> >         size = cache->size;
> >
> >         ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size
> > + n - len);
> >         if (unlikely(ret < 0)) {
> >             /*
> >              * We are buffer constrained.
> >              * Do not fill the cache, just satisfy the request.
> >              */
> >             ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> >                 n - len);
> >             if (unlikely(ret < 0)) {
> >                 rte_errno = -ret;
> >                 goto fail;
> >             }
> >
> >             len = 0;
> >         } else
> >             len = size;
> >     } else
> >         len -= n;
> >
> >     cache->len = len;
> >
> >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> >
> >     return &cache->objs[len];
> >
> > fail:
> >
> >     RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> >     RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> >
> >     return NULL;
> > }
> >
> >
> > 2. Advanced version:
> >
> > /**
> >  * Get objects from a mempool via zero-copy access to a user-owned
> > mempool cache.
> >  *
> >  * @param cache
> >  *   A pointer to the mempool cache.
> >  * @param mp
> >  *   A pointer to the mempool.
> >  * @param n
> >  *   The number of objects to prefetch into the mempool cache.
> >  * @return
> >  *   The pointer to the objects in the mempool cache.
> >  *   NULL on error
> >  *   with rte_errno set appropriately.
> >  */
> > static __rte_always_inline void *
> > rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
> >         struct rte_mempool *mp,
> >         unsigned int n)
> > {
> >     unsigned int len;
> >     int ret;
> >
> >     if (cache == NULL)
> >         cache = rte_mempool_default_cache(mp, rte_lcore_id());
> >     if (cache == NULL) {
> >         rte_errno = EINVAL;
> >         goto fail;
> >     }
> >
> >     rte_mempool_trace_cache_get_bulk(cache, mp, n);
> >
> >     len = cache->len;
> >
> >     if (unlikely(n > len)) {
> >         unsigned int size;
> >
> >         if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> >             rte_errno = EINVAL;
> >             goto fail;
> >         }
> >
> >         /* Fill the cache from the backend; fetch size + requested -
> > len objects. */
> >         size = cache->size;
> >
> >         if (likely(size + n >= 2 * len)) {
> >             /*
> >              * No overlap when copying (dst >= len): size + n - len >=
> > len.
> >              * Move (i.e. copy) the existing objects in the cache to
> > the
> >              * coming top of the cache, to make room for new objects
> > below.
> >              */
> >             rte_memcpy(&cache->objs[size + n - len], &cache->objs[0],
> > len);
> 
> Length is bytes, not number of objects, so that should be:
> 
> rte_memcpy(&cache->objs[size + n - len], &cache->objs[0], len *
> sizeof(void*));
> 
> >
> >             /* Fill the cache below the existing objects in the cache.
> > */
> >             ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[0],
> > size + n - len);
> >             if (unlikely(ret < 0)) {
> >                 goto constrained;
> >             } else
> >                 len = size;
> >         } else {
> >             /* Fill the cache on top of any objects in it. */
> >             ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> > size + n - len);
> >             if (unlikely(ret < 0)) {
> >
> > constrained:
> >                 /*
> >                  * We are buffer constrained.
> >                  * Do not fill the cache, just satisfy the request.
> >                  */
> >                 ret = rte_mempool_ops_dequeue_bulk(mp,
> >                     &cache->objs[len], n - len);
> >                 if (unlikely(ret < 0)) {
> >                     rte_errno = -ret;
> >                     goto fail;
> >                 }
> >
> >                 len = 0;
> >             } else
> >                 len = size;
> >         }
> >     } else
> >         len -= n;
> >
> >     cache->len = len;
> >
> >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> >     RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> >
> >     return &cache->objs[len];
> >
> > fail:
> >
> >     RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> >     RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> >
> >     return NULL;
> > }
> >
> >
> > Med venlig hilsen / Kind regards,
> > -Morten Brørup



* RE: [RFC]: mempool: zero-copy cache get bulk
  2022-11-07  9:19 ` Bruce Richardson
@ 2022-11-07 14:32   ` Morten Brørup
  0 siblings, 0 replies; 5+ messages in thread
From: Morten Brørup @ 2022-11-07 14:32 UTC (permalink / raw)
  To: Bruce Richardson, Kamalakshitha Aligeri
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli, nd

+ Akshitha, apparently working on similar patches

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Monday, 7 November 2022 10.19
> 
> On Sat, Nov 05, 2022 at 02:19:13PM +0100, Morten Brørup wrote:
> > Zero-copy access to the mempool cache is beneficial for PMD
> performance, and must be provided by the mempool library to fix [Bug
> 1052] without a performance regression.
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> >
> >
> > This RFC offers two conceptual variants of zero-copy get:
> > 1. A simple version.
> > 2. A version where existing (hot) objects in the cache are moved to
> the top of the cache before new objects from the backend driver are
> pulled in.
> >
> > I would like some early feedback. Also, which variant do you prefer?
> >
> > Notes:
> > * Allowing the 'cache' parameter to be NULL, and getting it from the
> mempool instead, was inspired by rte_mempool_cache_flush().
> > * Asserting that the 'mp' parameter is not NULL is not done by other
> functions, so I omitted it here too.
> >
> > NB: Please ignore formatting. Also, this code has not even been
> compile tested.
> >
> >
> > PS: No promises, but I expect to offer an RFC for zero-copy put too.
> :-)
> >
> 
> Thanks for this work, I think it's good to have. The existing functions
> could probably be reworked to use this new code too, right, since the
> copy
> at the end would be all that is needed to complete the implementation?

Only for the likely case, where the request can be fulfilled entirely from the cache.

Not for the corner case, where only some of the objects are in the cache, so the cache needs to be refilled from the backing store.

E.g. 32 objects are requested, and 8 objects are in the cache. (Those 8 objects are assumed to be hot, as opposed to the cold objects pulled in from the backing store, and were given preferential treatment with commit [a2833ecc5ea4adcbc3b77e7aeac2a6fd945da6a0].)

[a2833ecc5ea4adcbc3b77e7aeac2a6fd945da6a0]: http://git.dpdk.org/dpdk/commit/lib/mempool/rte_mempool.h?id=a2833ecc5ea4adcbc3b77e7aeac2a6fd945da6a0

The existing function copies the 8 existing objects directly to the final destination, then refills the cache from the backing store, and then copies the remaining 24 objects directly to the final destination.

Variant 2 in this RFC handles this corner case by moving the 8 objects in the cache to the new top of the cache, and then refilling the cache from the backing store. It can only move those 8 objects around in the cache if there is room for them. (The 32 returned objects are, ordered from top to bottom of the stack: 8 hot and 24 new.)
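
With example numbers (cache->size = 32, n = 32, len = 8), variant 2 does the following:

1. Move objs[0..7] (the hot objects) to objs[56..63]; the destination index size + n - len = 56 is >= len = 8, so the copy does not overlap.
2. Dequeue size + n - len = 56 new objects from the backing store into objs[0..55].
3. Set cache->len = size = 32 and return &objs[32]: the caller gets objs[32..63], i.e. 24 new objects with the 8 hot objects on top of the stack.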

In other words: If we replaced the existing function with this function plus copying at the end, the corner case would perform additional copying (moving the objects around in the stack), whereas the existing function copies each object only once.

While I usually agree 100 % about avoiding code duplication, I think the difference in behavior between the existing and the new functions warrants two separate implementations.


Please also note: The cache is a stack, so when accessing the cache directly, objects should be retrieved in reverse order. (This should be mentioned in the function description!) The existing function reverses the order of the objects when returning them, so the application can use them in normal order.
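
E.g. if the existing function were reworked on top of this one, the final copy would have to reverse the order, something like this (untested sketch; obj_table is the application's destination array):

    void **cache_objs = rte_mempool_cache_get_bulk(cache, mp, n);

    if (unlikely(cache_objs == NULL))
        return -rte_errno;
    /* The cache is a stack: copy in reverse order, hottest object first. */
    for (unsigned int i = 0; i < n; i++)
        obj_table[i] = cache_objs[n - 1 - i];
    return 0;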

> 
> Only real comment I have on this version is that I am not sure about
> the
> naming. I think having "cache_get_bulk" doesn't really make it very
> clear
> what the function does, that is removes items from the cache without
> copying them elsewhere. How about:
> 
> - rte_mempool_cache_pop?
> - rte_mempool_cache_reserve?
> 
> I would tend to prefer the former, since the latter implies that there
> needs to be a follow-up call to clear the reservation. On the other
> hand,
> reserve does give the correct impression that the elements are still
> there
> in the mempool cache.
> 
> Others may have better suggestions, since, as we know, naming things is
> hard! :)

- rte_mempool_cache_prefetch_bulk?
- rte_mempool_cache_get_bulk_promise?

When I came up with the function name rte_mempool_cache_put_bulk_promise for the sister function [1], I thought along the same lines as you, Bruce. It is important that the function name doesn't imply that there is a follow-up function to indicate that the transaction has been completed. (Before working on that, I assumed that a "prepare" and "commit" pair of functions were required, but the function turned out to be much simpler than anticipated.)

[1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87489@smartserver.smartshare.dk/#t

The mempool library offers single-object functions, so _bulk should be part of the function name, to indicate that the function operates on more than one object.

> 
> Overall, though, I think this is very good to have.
> /Bruce



* Re: [RFC]: mempool: zero-copy cache get bulk
  2022-11-05 13:19 Morten Brørup
@ 2022-11-07  9:19 ` Bruce Richardson
  2022-11-07 14:32   ` Morten Brørup
  0 siblings, 1 reply; 5+ messages in thread
From: Bruce Richardson @ 2022-11-07  9:19 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli

On Sat, Nov 05, 2022 at 02:19:13PM +0100, Morten Brørup wrote:
> Zero-copy access to the mempool cache is beneficial for PMD performance, and must be provided by the mempool library to fix [Bug 1052] without a performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> 
> This RFC offers two conceptual variants of zero-copy get:
> 1. A simple version.
> 2. A version where existing (hot) objects in the cache are moved to the top of the cache before new objects from the backend driver are pulled in.
> 
> I would like some early feedback. Also, which variant do you prefer?
> 
> Notes:
> * Allowing the 'cache' parameter to be NULL, and getting it from the mempool instead, was inspired by rte_mempool_cache_flush().
> * Asserting that the 'mp' parameter is not NULL is not done by other functions, so I omitted it here too.
> 
> NB: Please ignore formatting. Also, this code has not even been compile tested.
> 
> 
> PS: No promises, but I expect to offer an RFC for zero-copy put too. :-)
> 

Thanks for this work, I think it's good to have. The existing functions
could probably be reworked to use this new code too, right, since the copy
at the end would be all that is needed to complete the implementation?

Only real comment I have on this version is that I am not sure about the
naming. I think having "cache_get_bulk" doesn't really make it very clear
what the function does, that is removes items from the cache without
copying them elsewhere. How about:

- rte_mempool_cache_pop?
- rte_mempool_cache_reserve?

I would tend to prefer the former, since the latter implies that there
needs to be a follow-up call to clear the reservation. On the other hand,
reserve does give the correct impression that the elements are still there
in the mempool cache.

Others may have better suggestions, since, as we know, naming things is
hard! :)

Overall, though, I think this is very good to have.
/Bruce


* [RFC]: mempool: zero-copy cache get bulk
@ 2022-11-05 13:19 Morten Brørup
  2022-11-07  9:19 ` Bruce Richardson
  0 siblings, 1 reply; 5+ messages in thread
From: Morten Brørup @ 2022-11-05 13:19 UTC (permalink / raw)
  To: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli

Zero-copy access to the mempool cache is beneficial for PMD performance, and must be provided by the mempool library to fix [Bug 1052] without a performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052


This RFC offers two conceptual variants of zero-copy get:
1. A simple version.
2. A version where existing (hot) objects in the cache are moved to the top of the cache before new objects from the backend driver are pulled in.

I would like some early feedback. Also, which variant do you prefer?

Notes:
* Allowing the 'cache' parameter to be NULL, and getting it from the mempool instead, was inspired by rte_mempool_cache_flush().
* Asserting that the 'mp' parameter is not NULL is not done by other functions, so I omitted it here too.

NB: Please ignore formatting. Also, this code has not even been compile tested.


PS: No promises, but I expect to offer an RFC for zero-copy put too. :-)


1. Simple version:

/**
 * Get objects from a mempool via zero-copy access to a user-owned mempool cache.
 *
 * @param cache
 *   A pointer to the mempool cache.
 * @param mp
 *   A pointer to the mempool.
 * @param n
 *   The number of objects to prefetch into the mempool cache.
 * @return
 *   The pointer to the objects in the mempool cache.
 *   NULL on error
 *   with rte_errno set appropriately.
 */
static __rte_always_inline void *
rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
        struct rte_mempool *mp,
        unsigned int n)
{
    unsigned int len;
    int ret;

    if (cache == NULL)
        cache = rte_mempool_default_cache(mp, rte_lcore_id());
    if (cache == NULL) {
        rte_errno = EINVAL;
        goto fail;
    }

    rte_mempool_trace_cache_get_bulk(cache, mp, n);

    len = cache->len;

    if (unlikely(n > len)) {
        unsigned int size;

        if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
            rte_errno = EINVAL;
            goto fail;
        }

        /* Fill the cache from the backend; fetch size + requested - len objects. */
        size = cache->size;

        ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
        if (unlikely(ret < 0)) {
            /*
             * We are buffer constrained.
             * Do not fill the cache, just satisfy the request.
             */
            ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
            if (unlikely(ret < 0)) {
                rte_errno = -ret;
                goto fail;
            }

            len = 0;
        } else
            len = size;
    } else
        len -= n;

    cache->len = len;

    RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
    RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);

    return &cache->objs[len];

fail:

    RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
    RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);

    return NULL;
}


2. Advanced version:

/**
 * Get objects from a mempool via zero-copy access to a user-owned mempool cache.
 *
 * @param cache
 *   A pointer to the mempool cache.
 * @param mp
 *   A pointer to the mempool.
 * @param n
 *   The number of objects to prefetch into the mempool cache.
 * @return
 *   The pointer to the objects in the mempool cache.
 *   NULL on error
 *   with rte_errno set appropriately.
 */
static __rte_always_inline void *
rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
        struct rte_mempool *mp,
        unsigned int n)
{
    unsigned int len;
    int ret;

    if (cache == NULL)
        cache = rte_mempool_default_cache(mp, rte_lcore_id());
    if (cache == NULL) {
        rte_errno = EINVAL;
        goto fail;
    }

    rte_mempool_trace_cache_get_bulk(cache, mp, n);

    len = cache->len;

    if (unlikely(n > len)) {
        unsigned int size;

        if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
            rte_errno = EINVAL;
            goto fail;
        }

        /* Fill the cache from the backend; fetch size + requested - len objects. */
        size = cache->size;

        if (likely(size + n >= 2 * len)) {
            /*
             * No overlap when copying (dst >= len): size + n - len >= len.
             * Move (i.e. copy) the existing objects in the cache to the
             * coming top of the cache, to make room for new objects below.
             */
            rte_memcpy(&cache->objs[size + n - len], &cache->objs[0], len);

            /* Fill the cache below the existing objects in the cache. */
            ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[0], size + n - len);
            if (unlikely(ret < 0)) {
                goto constrained;
            } else
                len = size;
        } else {
            /* Fill the cache on top of any objects in it. */
            ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
            if (unlikely(ret < 0)) {

constrained:
                /*
                 * We are buffer constrained.
                 * Do not fill the cache, just satisfy the request.
                 */
                ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
                if (unlikely(ret < 0)) {
                    rte_errno = -ret;
                    goto fail;
                }

                len = 0;
            } else
                len = size;
        }
    } else
        len -= n;

    cache->len = len;

    RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
    RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);

    return &cache->objs[len];

fail:

    RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
    RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);

    return NULL;
}
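

For illustration, usage in a hypothetical RX descriptor refill path could look like this (also untested; struct rxq, sw_ring and nb_refill are made-up names):

static uint16_t
refill_sketch(struct rxq *rxq, struct rte_mempool *mp,
        struct rte_mempool_cache *cache, uint16_t nb_refill)
{
    void **objs;
    uint16_t i;

    objs = rte_mempool_cache_get_bulk(cache, mp, nb_refill);
    if (unlikely(objs == NULL))
        return 0; /* rte_errno is set by the callee */

    /* The cache is a stack; objs[nb_refill - 1] is the hottest object. */
    for (i = 0; i < nb_refill; i++)
        rxq->sw_ring[i] = (struct rte_mbuf *)objs[i];

    return nb_refill;
}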


Med venlig hilsen / Kind regards,
-Morten Brørup

