On Feb 6, 2024, at 9:24 PM, Morten Brørup <mb@smartsharesystems.com> wrote:


From: Akhil Goyal [mailto:gakhil@marvell.com]
Sent: Tuesday, 6 February 2024 15.25

Cache the most recent VA -> PA mapping found so that we can skip
most of the system calls. With 4K pages this reduces pool create
time by about 90%.

Signed-off-by: Andrew Boyer <andrew.boyer@amd.com>

I believe there should be a generic solution for this in mempool,
if there is not one already.
Here, you are adding a cache to the mempool private data,
which does not seem to be the correct place.
This optimization would be needed across all types of mempools.
Adding more people for comments.


---
lib/cryptodev/rte_crypto.h    |  5 +++++
lib/cryptodev/rte_cryptodev.c | 23 ++++++++++++++++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/lib/cryptodev/rte_crypto.h b/lib/cryptodev/rte_crypto.h
index dbc2700da5..ee6aa1e40e 100644
--- a/lib/cryptodev/rte_crypto.h
+++ b/lib/cryptodev/rte_crypto.h
@@ -220,6 +220,11 @@ struct rte_crypto_op_pool_private {
   /**< Crypto op pool type operation. */
   uint16_t priv_size;
   /**< Size of private area in each crypto operation. */
+
+   unsigned long vp_cache;
+   /* Virtual page address of previous op. */
+   rte_iova_t iovp_cache;
+   /* I/O virtual page address of previous op. */
};


diff --git a/lib/cryptodev/rte_cryptodev.c b/lib/cryptodev/rte_cryptodev.c
index b233c0ecd7..d596f85a57 100644
--- a/lib/cryptodev/rte_cryptodev.c
+++ b/lib/cryptodev/rte_cryptodev.c
@@ -10,6 +10,7 @@
#include <errno.h>
#include <stdint.h>
#include <inttypes.h>
+#include <unistd.h>

#include <rte_log.h>
#include <rte_debug.h>
@@ -2568,12 +2569,32 @@ rte_crypto_op_init(struct rte_mempool *mempool,
{
   struct rte_crypto_op *op = _op_data;
   enum rte_crypto_op_type type = *(enum rte_crypto_op_type *)opaque_arg;
+   struct rte_crypto_op_pool_private *priv;
+   unsigned long virt_addr = (unsigned long)(uintptr_t)_op_data;
+#ifdef RTE_EXEC_ENV_WINDOWS
+   unsigned long page_mask = 4095;
+#else
+   unsigned long page_mask = sysconf(_SC_PAGESIZE) - 1;
+#endif
+   unsigned long virt_page = virt_addr & ~page_mask;

   memset(_op_data, 0, mempool->elt_size);

   __rte_crypto_op_reset(op, type);

-   op->phys_addr = rte_mem_virt2iova(_op_data);

This optimization is for rte_mem_virt2iova(_op_data) being slow.
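
For readers following along, here is a minimal standalone sketch of the "cache the most recent VA -> PA page mapping" idea from the commit message. The rest of the hunk is not quoted here, so this is not the patch's code. The names slow_virt2phys(), cached_virt2phys() and struct page_cache are hypothetical, and the "slow" translation is faked with a fixed offset purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how often the (hypothetical) slow path is taken. */
static unsigned long slow_lookups;

/* Hypothetical stand-in for a slow VA -> PA lookup, e.g. one needing a system call.
 * The fixed offset is a fake translation used only for this illustration. */
static uint64_t slow_virt2phys(uintptr_t va)
{
	slow_lookups++;
	return (uint64_t)va + 0x100000000ULL;
}

/* One-entry cache holding the most recent page translation. */
struct page_cache {
	uintptr_t virt_page; /* virtual page address of the last lookup */
	uint64_t phys_page;  /* physical page address it mapped to */
	int valid;           /* nonzero once the entry has been filled */
};

/* Translate va, taking the slow path only when the page differs from the cached one. */
static uint64_t cached_virt2phys(struct page_cache *c, uintptr_t va,
				 uintptr_t page_size)
{
	uintptr_t page_mask = page_size - 1;
	uintptr_t virt_page = va & ~page_mask;

	/* The masking above only works for power-of-two page sizes. */
	assert(page_size != 0 && (page_size & page_mask) == 0);

	if (!c->valid || c->virt_page != virt_page) {
		c->phys_page = slow_virt2phys(virt_page);
		c->virt_page = virt_page;
		c->valid = 1;
	}
	/* Same page: reuse the cached page address and add the in-page offset. */
	return c->phys_page + (va & page_mask);
}
```

With small objects packed back to back, most consecutive lookups fall within the same page, so only the first object on each page pays for the slow lookup.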

If I'm not mistaken, _op_data is an object in a mempool, where the mempool object headers have already been initialized.

In this case, it could simply be optimized as this:
-       op->phys_addr = rte_mem_virt2iova(_op_data);
+       op->phys_addr = rte_mempool_virt2iova(_op_data);
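
Why this is cheap: as I understand it, rte_mempool_virt2iova() does not translate anything at call time. It reads an IOVA that was stored in the object header, which sits directly in front of the element, when the mempool was populated. The standalone sketch below shows that pattern only. struct obj_hdr, obj_slot_init() and obj_virt2iova() are hypothetical names, not DPDK API, and the stored address is a made-up value.

```c
#include <stdint.h>

/* Hypothetical per-object header, stored directly in front of each object. */
struct obj_hdr {
	uint64_t iova; /* I/O address of the object, recorded once at populate time */
};

/* Fill in the header of one header+object slot and return the object pointer.
 * fake_iova stands in for whatever the real translation returned at populate time. */
static void *obj_slot_init(void *slot, uint64_t fake_iova)
{
	struct obj_hdr *hdr = slot;

	hdr->iova = fake_iova;
	return (char *)slot + sizeof(*hdr);
}

/* Recover the I/O address by stepping back from the object to its header.
 * No lookup and no system call: just pointer arithmetic and one load. */
static uint64_t obj_virt2iova(const void *obj)
{
	const struct obj_hdr *hdr =
		(const struct obj_hdr *)((const char *)obj - sizeof(*hdr));

	return hdr->iova;
}
```

The cost of the translation is paid once per object when the pool is built, so an object init callback can read the stored value without keeping a cache of its own.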


That certainly is shorter! Thanks, I was not aware of this function.

-Andrew