From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 0B9B843A1B; Wed, 7 Feb 2024 03:24:56 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8E1534069F; Wed, 7 Feb 2024 03:24:55 +0100 (CET) Received: from dkmailrelay1.smartsharesystems.com (smartserver.smartsharesystems.com [77.243.40.215]) by mails.dpdk.org (Postfix) with ESMTP id AE77F402EA for ; Wed, 7 Feb 2024 03:24:54 +0100 (CET) Received: from smartserver.smartsharesystems.com (smartserver.smartsharesys.local [192.168.4.10]) by dkmailrelay1.smartsharesystems.com (Postfix) with ESMTP id AD21A206C4; Wed, 7 Feb 2024 03:24:53 +0100 (CET) Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: RE: [EXT] [PATCH] cryptodev: add a simple mapping cache to speed up ops pool create X-MimeOLE: Produced By Microsoft Exchange V6.5 Date: Wed, 7 Feb 2024 03:24:50 +0100 Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F1F3@smartserver.smartshare.dk> In-Reply-To: A X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [EXT] [PATCH] cryptodev: add a simple mapping cache to speed up ops pool create Thread-Index: AQHaSvZkAbcKG4dnWUKRNlBy6tcvdrD9eKlwgADAE9A= References: <20240119164122.11829-1-andrew.boyer@amd.com> A From: =?iso-8859-1?Q?Morten_Br=F8rup?= To: "Akhil Goyal" , "Andrew Boyer" , , "Andrew Rybchenko" Cc: , "Fan Zhang" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org > From: Akhil Goyal [mailto:gakhil@marvell.com] > Sent: Tuesday, 6 February 2024 15.25 >=20 > > Cache the most recent VA -> PA mapping found so that we can skip > > most of the system calls. With 4K pages this reduces pool create > > time by about 90%. > > > > Signed-off-by: Andrew Boyer >=20 > I believe there should be a generic solution for this in mempool > if it is not there already. > Here, you are adding cache in mempool priv > which does not seem to be a correct place. > This optimization would be needed across all types of mempools. > Adding more people for comments. >=20 >=20 > > --- > > lib/cryptodev/rte_crypto.h | 5 +++++ > > lib/cryptodev/rte_cryptodev.c | 23 ++++++++++++++++++++++- > > 2 files changed, 27 insertions(+), 1 deletion(-) > > > > diff --git a/lib/cryptodev/rte_crypto.h b/lib/cryptodev/rte_crypto.h > > index dbc2700da5..ee6aa1e40e 100644 > > --- a/lib/cryptodev/rte_crypto.h > > +++ b/lib/cryptodev/rte_crypto.h > > @@ -220,6 +220,11 @@ struct rte_crypto_op_pool_private { > > /**< Crypto op pool type operation. */ > > uint16_t priv_size; > > /**< Size of private area in each crypto operation. */ > > + > > + unsigned long vp_cache; > > + /* Virtual page address of previous op. */ > > + rte_iova_t iovp_cache; > > + /* I/O virtual page address of previous op. */ > > }; > > > > > > diff --git a/lib/cryptodev/rte_cryptodev.c > b/lib/cryptodev/rte_cryptodev.c > > index b233c0ecd7..d596f85a57 100644 > > --- a/lib/cryptodev/rte_cryptodev.c > > +++ b/lib/cryptodev/rte_cryptodev.c > > @@ -10,6 +10,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -2568,12 +2569,32 @@ rte_crypto_op_init(struct rte_mempool > *mempool, > > { > > struct rte_crypto_op *op =3D _op_data; > > enum rte_crypto_op_type type =3D *(enum rte_crypto_op_type > > *)opaque_arg; > > + struct rte_crypto_op_pool_private *priv; > > + unsigned long virt_addr =3D (unsigned long)(uintptr_t)_op_data; > > +#ifdef RTE_EXEC_ENV_WINDOWS > > + unsigned long page_mask =3D 4095; > > +#else > > + unsigned long page_mask =3D sysconf(_SC_PAGESIZE) - 1; > > +#endif > > + unsigned long virt_page =3D virt_addr & ~page_mask; > > > > memset(_op_data, 0, mempool->elt_size); > > > > __rte_crypto_op_reset(op, type); > > > > - op->phys_addr =3D rte_mem_virt2iova(_op_data); This optimization is for rte_mem_virt2iova(_op_data) being slow. If I'm not mistaken, _op_data is an object in a mempool, where the = mempool object headers have already been initialized. In this case, it could simply be optimized as this: - op->phys_addr =3D rte_mem_virt2iova(_op_data); + op->phys_addr =3D rte_mempool_virt2iova(_op_data); Now going down a rat hole... If the above is true, I wonder if struct rte_crypto_op is only = instantiated as objects in mempools... If it is, the op->mempool and = op->phys_addr fields are fundamentally redundant, and can be retrieved = from the mempool object header instead: op->phys_addr =3D=3D=3D rte_mempool_virt2iova(op) op->mempool =3D=3D=3D rte_mempool_from_obj(op) Having these shadow variables in struct rte_crypto_op may provide = performance benefits when RTE_MEMPOOL_F_NO_CACHE_ALIGN is not set on the = mempool, the mempool object header is in the preceding cache line of the = mempool object (containing the struct rte_crypto_op op). A better solution than these shadow variables would be to introduce = another mempool flag to cache align the mempool object header, but let = the mempool object itself directly follow the mempool object header, so = the mempool object header and the mempool object itself (if small = enough) reside in the same cache line. This would also use less memory. > > + priv =3D (struct rte_crypto_op_pool_private *) > > + rte_mempool_get_priv(mempool); > > + > > + if (virt_page =3D=3D priv->vp_cache) { > > + op->phys_addr =3D priv->iovp_cache + (virt_addr & page_mask); > > + } else { > > + op->phys_addr =3D rte_mem_virt2iova(_op_data); > > + > > + /* Update cached values */ > > + priv->vp_cache =3D virt_page; > > + priv->iovp_cache =3D op->phys_addr & ~page_mask; > > + } > > + > > op->mempool =3D mempool; > > } > > > > -- > > 2.17.1