From: Morten Brørup
To: dev@dpdk.org
Cc: Kamalakshitha Aligeri, nd
Subject: RE: [RFC]: mempool: zero-copy cache get bulk
Date: Sun, 6 Nov 2022 08:12:25 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D8748B@smartserver.smartshare.dk>

> From: Morten Brørup
> Sent: Saturday, 5 November 2022 14.19
>
> Zero-copy access to the mempool cache is beneficial for PMD
> performance, and must be provided by the mempool library to fix [Bug
> 1052] without a performance regression.
>
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
>
> This RFC offers two conceptual variants of zero-copy get:
> 1. A simple version.
> 2. A version where existing (hot) objects in the cache are moved to the
> top of the cache before new objects from the backend driver are pulled
> in.
>
> I would like some early feedback. Also, which variant do you prefer?
>
> Notes:
> * Allowing the 'cache' parameter to be NULL, and getting it from the
> mempool instead, was inspired by rte_mempool_cache_flush().

"instead" -> "in this case"

> * Asserting that the 'mp' parameter is not NULL is not done by other
> functions, so I omitted it here too.
>
> NB: Please ignore formatting. Also, this code has not even been compile
> tested.

And I just spotted an error: the rte_memcpy() length field must be
multiplied by sizeof(void *).

> 1. Simple version:
>
> /**
>  * Get objects from a mempool via zero-copy access to a user-owned
>  * mempool cache.
>  *
>  * @param cache
>  *   A pointer to the mempool cache.
>  * @param mp
>  *   A pointer to the mempool.
>  * @param n
>  *   The number of objects to prefetch into the mempool cache.
>  * @return
>  *   The pointer to the objects in the mempool cache.
>  *   NULL on error, with rte_errno set appropriately.
>  */
> static __rte_always_inline void *
> rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
> 		struct rte_mempool *mp,
> 		unsigned int n)
> {
> 	unsigned int len;
> 	int ret;
>
> 	if (cache == NULL)
> 		cache = rte_mempool_default_cache(mp, rte_lcore_id());
> 	if (cache == NULL) {
> 		rte_errno = EINVAL;
> 		goto fail;
> 	}
>
> 	rte_mempool_trace_cache_get_bulk(cache, mp, n);
>
> 	len = cache->len;
>
> 	if (unlikely(n > len)) {
> 		unsigned int size;
>
> 		if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> 			rte_errno = EINVAL;
> 			goto fail;
> 		}
>
> 		/* Fill the cache from the backend; fetch size + requested - len objects. */
> 		size = cache->size;
>
> 		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> 				size + n - len);
> 		if (unlikely(ret < 0)) {
> 			/*
> 			 * We are buffer constrained.
> 			 * Do not fill the cache, just satisfy the request.
> 			 */
> 			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> 					n - len);
> 			if (unlikely(ret < 0)) {
> 				rte_errno = -ret;
> 				goto fail;
> 			}
>
> 			len = 0;
> 		} else
> 			len = size;
> 	} else
> 		len -= n;
>
> 	cache->len = len;
>
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>
> 	return &cache->objs[len];
>
> fail:
> 	RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> 	RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
>
> 	return NULL;
> }
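
For reviewers' context, here is an (untested) sketch of how a caller,
e.g. a PMD RX refill routine, might consume the returned pointer; the
'rxq' and 'sw_ring' names are purely illustrative, not taken from any
real driver:

	void **objs;
	unsigned int i;

	/* Get zero-copy access to n objects in the mempool cache. */
	objs = rte_mempool_cache_get_bulk(cache, mp, n);
	if (objs == NULL)
		return; /* rte_errno holds the cause of the failure. */

	/* Use the objects in place; no copy to an intermediate array. */
	for (i = 0; i < n; i++)
		rxq->sw_ring[i].mbuf = (struct rte_mbuf *)objs[i];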
>
> 2. Advanced version:
>
> /**
>  * Get objects from a mempool via zero-copy access to a user-owned
>  * mempool cache.
>  *
>  * @param cache
>  *   A pointer to the mempool cache.
>  * @param mp
>  *   A pointer to the mempool.
>  * @param n
>  *   The number of objects to prefetch into the mempool cache.
>  * @return
>  *   The pointer to the objects in the mempool cache.
>  *   NULL on error, with rte_errno set appropriately.
>  */
> static __rte_always_inline void *
> rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
> 		struct rte_mempool *mp,
> 		unsigned int n)
> {
> 	unsigned int len;
> 	int ret;
>
> 	if (cache == NULL)
> 		cache = rte_mempool_default_cache(mp, rte_lcore_id());
> 	if (cache == NULL) {
> 		rte_errno = EINVAL;
> 		goto fail;
> 	}
>
> 	rte_mempool_trace_cache_get_bulk(cache, mp, n);
>
> 	len = cache->len;
>
> 	if (unlikely(n > len)) {
> 		unsigned int size;
>
> 		if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> 			rte_errno = EINVAL;
> 			goto fail;
> 		}
>
> 		/* Fill the cache from the backend; fetch size + requested - len objects. */
> 		size = cache->size;
>
> 		if (likely(size + n >= 2 * len)) {
> 			/*
> 			 * No overlap when copying (dst >= len): size + n - len >= len.
> 			 * Move (i.e. copy) the existing objects in the cache to the
> 			 * coming top of the cache, to make room for new objects below.
> 			 */
> 			rte_memcpy(&cache->objs[size + n - len], &cache->objs[0], len);

Length is bytes, not number of objects, so that should be:

rte_memcpy(&cache->objs[size + n - len], &cache->objs[0],
		len * sizeof(void *));

>
> 			/* Fill the cache below the existing objects in the cache. */
> 			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[0],
> 					size + n - len);
> 			if (unlikely(ret < 0)) {
> 				goto constrained;
> 			} else
> 				len = size;
> 		} else {
> 			/* Fill the cache on top of any objects in it. */
> 			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> 					size + n - len);
> 			if (unlikely(ret < 0)) {
>
> constrained:
> 				/*
> 				 * We are buffer constrained.
> 				 * Do not fill the cache, just satisfy the request.
> 				 */
> 				ret = rte_mempool_ops_dequeue_bulk(mp,
> 						&cache->objs[len], n - len);
> 				if (unlikely(ret < 0)) {
> 					rte_errno = -ret;
> 					goto fail;
> 				}
>
> 				len = 0;
> 			} else
> 				len = size;
> 		}
> 	} else
> 		len -= n;
>
> 	cache->len = len;
>
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>
> 	return &cache->objs[len];
>
> fail:
> 	RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> 	RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
>
> 	return NULL;
> }
>
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
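
PS: To put concrete (made-up) numbers on the no-overlap check in
variant 2: with cache->size = 512, len = 8 hot objects in the cache and
a request for n = 32 objects, the copy destination starts at index
size + n - len = 536, above the end of the source range [0, 8), so the
two rte_memcpy() regions cannot overlap:

	/* Hypothetical numbers, for illustration only. */
	unsigned int size = 512, len = 8, n = 32;

	/*
	 * size + n >= 2 * len  <=>  size + n - len >= len:
	 * the destination starts at or above the end of the source.
	 */
	assert(size + n - len >= len);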