From: Morten Brørup
To: dev@dpdk.org
Cc: Kamalakshitha Aligeri, nd
Subject: RE: [RFC]: mempool: zero-copy cache get bulk
Date: Sun, 6 Nov 2022 08:12:25 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D8748B@smartserver.smartshare.dk>

> From: Morten Brørup
> Sent: Saturday, 5 November 2022 14.19
>
> Zero-copy access to the mempool cache is beneficial for PMD
> performance, and must be provided by the mempool library to fix [Bug
> 1052] without a performance regression.
>
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
>
> This RFC offers two conceptual variants of zero-copy get:
> 1. A simple version.
> 2. A version where existing (hot) objects in the cache are moved to the
> top of the cache before new objects from the backend driver are pulled
> in.
>
> I would like some early feedback. Also, which variant do you prefer?
>
> Notes:
> * Allowing the 'cache' parameter to be NULL, and getting it from the
> mempool instead, was inspired by rte_mempool_cache_flush().

"instead" -> "in this case"

> * Asserting that the 'mp' parameter is not NULL is not done by other
> functions, so I omitted it here too.
>
> NB: Please ignore formatting. Also, this code has not even been compile
> tested.

And I just spotted an error: the rte_memcpy() length field must be
multiplied by sizeof(void *).

> 1. Simple version:
>
> /**
>  * Get objects from a mempool via zero-copy access to a user-owned
>  * mempool cache.
>  *
>  * @param cache
>  *   A pointer to the mempool cache.
>  * @param mp
>  *   A pointer to the mempool.
>  * @param n
>  *   The number of objects to prefetch into the mempool cache.
>  * @return
>  *   The pointer to the objects in the mempool cache.
>  *   NULL on error, with rte_errno set appropriately.
>  */
> static __rte_always_inline void *
> rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
> 		struct rte_mempool *mp,
> 		unsigned int n)
> {
> 	unsigned int len;
> 	int ret;
>
> 	if (cache == NULL)
> 		cache = rte_mempool_default_cache(mp, rte_lcore_id());
> 	if (cache == NULL) {
> 		rte_errno = EINVAL;
> 		goto fail;
> 	}
>
> 	rte_mempool_trace_cache_get_bulk(cache, mp, n);
>
> 	len = cache->len;
>
> 	if (unlikely(n > len)) {
> 		unsigned int size;
>
> 		if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> 			rte_errno = EINVAL;
> 			goto fail;
> 		}
>
> 		/* Fill the cache from the backend; fetch size + requested - len objects. */
> 		size = cache->size;
>
> 		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> 				size + n - len);
> 		if (unlikely(ret < 0)) {
> 			/*
> 			 * We are buffer constrained.
> 			 * Do not fill the cache, just satisfy the request.
> 			 */
> 			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> 					n - len);
> 			if (unlikely(ret < 0)) {
> 				rte_errno = -ret;
> 				goto fail;
> 			}
>
> 			len = 0;
> 		} else
> 			len = size;
> 	} else
> 		len -= n;
>
> 	cache->len = len;
>
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>
> 	return &cache->objs[len];
>
> fail:
> 	RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> 	RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
>
> 	return NULL;
> }
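
For reviewers' context, here is an (untested) sketch of how a caller,
e.g. a PMD RX refill routine, might consume the returned pointer; the
'rxq' and 'sw_ring' names are purely illustrative, not taken from any
real driver:

	void **objs;
	unsigned int i;

	/* Get zero-copy access to n objects in the mempool cache. */
	objs = rte_mempool_cache_get_bulk(cache, mp, n);
	if (objs == NULL)
		return; /* rte_errno holds the cause of the failure. */

	/* Use the objects in place; no copy to an intermediate array. */
	for (i = 0; i < n; i++)
		rxq->sw_ring[i].mbuf = (struct rte_mbuf *)objs[i];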
>
> 2. Advanced version:
>
> /**
>  * Get objects from a mempool via zero-copy access to a user-owned
>  * mempool cache.
>  *
>  * @param cache
>  *   A pointer to the mempool cache.
>  * @param mp
>  *   A pointer to the mempool.
>  * @param n
>  *   The number of objects to prefetch into the mempool cache.
>  * @return
>  *   The pointer to the objects in the mempool cache.
>  *   NULL on error, with rte_errno set appropriately.
>  */
> static __rte_always_inline void *
> rte_mempool_cache_get_bulk(struct rte_mempool_cache *cache,
> 		struct rte_mempool *mp,
> 		unsigned int n)
> {
> 	unsigned int len;
> 	int ret;
>
> 	if (cache == NULL)
> 		cache = rte_mempool_default_cache(mp, rte_lcore_id());
> 	if (cache == NULL) {
> 		rte_errno = EINVAL;
> 		goto fail;
> 	}
>
> 	rte_mempool_trace_cache_get_bulk(cache, mp, n);
>
> 	len = cache->len;
>
> 	if (unlikely(n > len)) {
> 		unsigned int size;
>
> 		if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE)) {
> 			rte_errno = EINVAL;
> 			goto fail;
> 		}
>
> 		/* Fill the cache from the backend; fetch size + requested - len objects. */
> 		size = cache->size;
>
> 		if (likely(size + n >= 2 * len)) {
> 			/*
> 			 * No overlap when copying (dst >= len): size + n - len >= len.
> 			 * Move (i.e. copy) the existing objects in the cache to the
> 			 * coming top of the cache, to make room for new objects below.
> 			 */
> 			rte_memcpy(&cache->objs[size + n - len], &cache->objs[0], len);

Length is bytes, not number of objects, so that should be:

rte_memcpy(&cache->objs[size + n - len], &cache->objs[0],
		len * sizeof(void *));

>
> 			/* Fill the cache below the existing objects in the cache. */
> 			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[0],
> 					size + n - len);
> 			if (unlikely(ret < 0)) {
> 				goto constrained;
> 			} else
> 				len = size;
> 		} else {
> 			/* Fill the cache on top of any objects in it. */
> 			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len],
> 					size + n - len);
> 			if (unlikely(ret < 0)) {
>
> constrained:
> 				/*
> 				 * We are buffer constrained.
> 				 * Do not fill the cache, just satisfy the request.
> 				 */
> 				ret = rte_mempool_ops_dequeue_bulk(mp,
> 						&cache->objs[len], n - len);
> 				if (unlikely(ret < 0)) {
> 					rte_errno = -ret;
> 					goto fail;
> 				}
>
> 				len = 0;
> 			} else
> 				len = size;
> 		}
> 	} else
> 		len -= n;
>
> 	cache->len = len;
>
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>
> 	return &cache->objs[len];
>
> fail:
> 	RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> 	RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
>
> 	return NULL;
> }
>
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
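
PS: To put concrete (made-up) numbers on the no-overlap check in
variant 2: with cache->size = 512, len = 8 hot objects in the cache and
a request for n = 32 objects, the copy destination starts at index
size + n - len = 536, above the end of the source range [0, 8), so the
two rte_memcpy() regions cannot overlap:

	/* Hypothetical numbers, for illustration only. */
	unsigned int size = 512, len = 8, n = 32;

	/*
	 * size + n >= 2 * len  <=>  size + n - len >= len:
	 * the destination starts at or above the end of the source.
	 */
	assert(size + n - len >= len);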