DPDK patches and discussions
* [RFC PATCH 0/5] crypto/mlx5: support AES-GCM
@ 2023-04-18  9:23 Suanming Mou
  2023-04-18  9:23 ` [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability Suanming Mou
                   ` (7 more replies)
  0 siblings, 8 replies; 54+ messages in thread
From: Suanming Mou @ 2023-04-18  9:23 UTC (permalink / raw)
  To: matan; +Cc: rasland, mkashani, dev

AES-GCM provides both authenticated encryption and the ability to check
the integrity and authentication of additional authenticated data (AAD)
that is sent in the clear.

The crypto operations are performed with a crypto WQE. If the input
buffers (AAD, mbuf, digest) are not contiguous, the FW requires a UMR
WQE to build a contiguous address space for the crypto WQE. The UMR
WQE and the crypto WQE are handled in two different QPs.

The QP for the UMR operation contains two types of WQE, UMR and
SEND_EN. The WQEs are built dynamically according to the crypto
operation buffer addresses. A crypto operation with non-contiguous
buffers gets its own UMR WQE, while an operation with contiguous
buffers does not need one. Once all the operations' WQEs in the
enqueue burst have been built, and if any UMR WQEs were built, an
additional SEND_EN WQE is appended as the final WQE of the burst in
the UMR QP. The purpose of that SEND_EN WQE is to trigger the crypto
QP processing once the UMR-prepared input memory address space
buffers are ready.

The QP for crypto operations contains only crypto WQEs, and the QP
WQEs are built as fixed at QP setup time. The QP processing is
triggered either by a doorbell ring or by the SEND_EN WQE from the
UMR QP.
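
As a rough illustration of the flow described above, here is a minimal
sketch of the enqueue-side dispatch; the helper functions are
hypothetical placeholders, only the control flow mirrors the design
(one UMR WQE per non-contiguous operation, one SEND_EN WQE per burst
that used UMR):

	static uint16_t
	gcm_enqueue_burst_sketch(struct mlx5_crypto_qp *qp,
				 struct rte_crypto_op **ops, uint16_t nb_ops)
	{
		bool umr_used = false;
		uint16_t i;

		for (i = 0; i < nb_ops; i++) {
			if (!input_is_contiguous(ops[i])) {
				/* Hypothetical: build a UMR WQE presenting
				 * AAD + payload + digest as one contiguous
				 * address space for this operation.
				 */
				build_umr_wqe(qp, ops[i]);
				umr_used = true;
			}
			/* Crypto WQE body is mostly fixed at QP setup;
			 * only per-operation fields are filled here.
			 */
			fill_crypto_wqe(qp, ops[i]);
		}
		if (umr_used) {
			/* SEND_EN as the last WQE of the burst lets the
			 * leader UMR QP trigger the follower crypto QP.
			 */
			build_send_en_wqe(qp);
			ring_umr_qp_doorbell(qp);
		} else {
			ring_crypto_qp_doorbell(qp);
		}
		return nb_ops;
	}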

Suanming Mou (5):
  crypto/mlx5: add AES-GCM capability
  crypto/mlx5: add AES-GCM encryption key
  crypto/mlx5: add AES-GCM session configure
  crypto/mlx5: add queue pair setup
  crypto/mlx5: add enqueue and dequeue operations

 doc/guides/nics/mlx5.rst              |   8 +
 drivers/common/mlx5/mlx5_devx_cmds.c  |  29 +-
 drivers/common/mlx5/mlx5_devx_cmds.h  |  18 +
 drivers/common/mlx5/mlx5_prm.h        |  62 +-
 drivers/crypto/mlx5/meson.build       |   1 +
 drivers/crypto/mlx5/mlx5_crypto.c     |  64 +-
 drivers/crypto/mlx5/mlx5_crypto.h     |  57 +-
 drivers/crypto/mlx5/mlx5_crypto_dek.c | 157 +++--
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 803 ++++++++++++++++++++++++++
 9 files changed, 1139 insertions(+), 60 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c

-- 
2.25.1



* [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
  2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
@ 2023-04-18  9:23 ` Suanming Mou
  2023-05-17  7:37   ` [EXT] " Akhil Goyal
  2023-04-18  9:23 ` [RFC PATCH 2/5] crypto/mlx5: add AES-GCM encryption key Suanming Mou
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 54+ messages in thread
From: Suanming Mou @ 2023-04-18  9:23 UTC (permalink / raw)
  To: matan; +Cc: rasland, mkashani, dev

AES-GCM provides both authenticated encryption and the ability to check
the integrity and authentication of additional authenticated data (AAD)
that is sent in the clear.

This commit adds the AES-GCM capability query and check. A new devarg
"algo" is added to select whether the crypto PMD is initialized for
AES-GCM (algo=1) or AES-XTS (algo=0, the default).
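
For example (illustrative only, with a placeholder PCI address), the
PMD can be probed in AES-GCM mode by appending the new devarg to the
existing "class=crypto" device arguments on the EAL command line:

	-a <PCI_BDF>,class=crypto,algo=1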

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 doc/guides/nics/mlx5.rst              |   8 +++
 drivers/common/mlx5/mlx5_devx_cmds.c  |  17 +++++
 drivers/common/mlx5/mlx5_devx_cmds.h  |  14 ++++
 drivers/common/mlx5/mlx5_prm.h        |  19 ++++-
 drivers/crypto/mlx5/meson.build       |   1 +
 drivers/crypto/mlx5/mlx5_crypto.c     |  30 +++++++-
 drivers/crypto/mlx5/mlx5_crypto.h     |   5 ++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 100 ++++++++++++++++++++++++++
 8 files changed, 189 insertions(+), 5 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9d111ed436..5eb2150613 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1270,6 +1270,14 @@ for an additional list of options shared with other mlx5 drivers.
 
   Set to zero by default.
 
+- ``algo`` parameter [int]
+
+  - 0. AES-XTS crypto.
+
+  - 1. AES-GCM crypto.
+
+  Set to zero (AES-XTS) by default.
+
 Supported NICs
 --------------
 
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 96d3e3e373..592a7cffdb 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1197,6 +1197,23 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		attr->crypto_wrapped_import_method = !!(MLX5_GET(crypto_caps,
 						hcattr, wrapped_import_method)
 						& 1 << 2);
+		attr->sw_wrapped_dek = MLX5_GET(crypto_caps, hcattr, sw_wrapped_dek_key_purpose) ?
+			MLX5_GET(crypto_caps, hcattr, sw_wrapped_dek_new) : 0;
+		attr->crypto_mmo.crypto_mmo_qp = MLX5_GET(crypto_caps, hcattr, crypto_mmo_qp);
+		attr->crypto_mmo.gcm_256_encrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_256_encrypt);
+		attr->crypto_mmo.gcm_128_encrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_128_encrypt);
+		attr->crypto_mmo.gcm_256_decrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_256_decrypt);
+		attr->crypto_mmo.gcm_128_decrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_128_decrypt);
+		attr->crypto_mmo.gcm_auth_tag_128 =
+			MLX5_GET(crypto_caps, hcattr, gcm_auth_tag_128);
+		attr->crypto_mmo.gcm_auth_tag_96 =
+			MLX5_GET(crypto_caps, hcattr, gcm_auth_tag_96);
+		attr->crypto_mmo.log_crypto_mmo_max_size =
+			MLX5_GET(crypto_caps, hcattr, log_crypto_mmo_max_size);
 	}
 	if (hca_cap_2_sup) {
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index e006a04d68..d640482346 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -153,6 +153,18 @@ struct mlx5_hca_ipsec_attr {
 	struct mlx5_hca_ipsec_reformat_attr reformat_fdb;
 };
 
+__extension__
+struct mlx5_hca_crypto_mmo_attr {
+	uint32_t crypto_mmo_qp:1;
+	uint32_t gcm_256_encrypt:1;
+	uint32_t gcm_128_encrypt:1;
+	uint32_t gcm_256_decrypt:1;
+	uint32_t gcm_128_decrypt:1;
+	uint32_t gcm_auth_tag_128:1;
+	uint32_t gcm_auth_tag_96:1;
+	uint32_t log_crypto_mmo_max_size:6;
+};
+
 /* ISO C restricts enumerator values to range of 'int' */
 __extension__
 enum {
@@ -266,6 +278,7 @@ struct mlx5_hca_attr {
 	uint32_t import_kek:1; /* General obj type IMPORT_KEK supported. */
 	uint32_t credential:1; /* General obj type CREDENTIAL supported. */
 	uint32_t crypto_login:1; /* General obj type CRYPTO_LOGIN supported. */
+	uint32_t sw_wrapped_dek:16; /* DEKs wrapped by SW are supported */
 	uint32_t regexp_num_of_engines;
 	uint32_t log_max_ft_sampler_num:8;
 	uint32_t inner_ipv4_ihl:1;
@@ -281,6 +294,7 @@ struct mlx5_hca_attr {
 	struct mlx5_hca_flow_attr flow;
 	struct mlx5_hca_flex_attr flex;
 	struct mlx5_hca_ipsec_attr ipsec;
+	struct mlx5_hca_crypto_mmo_attr crypto_mmo;
 	int log_max_qp_sz;
 	int log_max_cq_sz;
 	int log_max_qp;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 31db082c50..a3b85f514e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -4654,7 +4654,9 @@ struct mlx5_ifc_crypto_caps_bits {
 	u8 synchronize_dek[0x1];
 	u8 int_kek_manual[0x1];
 	u8 int_kek_auto[0x1];
-	u8 reserved_at_6[0x12];
+	u8 reserved_at_6[0xd];
+	u8 sw_wrapped_dek_key_purpose[0x1];
+	u8 reserved_at_14[0x4];
 	u8 wrapped_import_method[0x8];
 	u8 reserved_at_20[0x3];
 	u8 log_dek_max_alloc[0x5];
@@ -4671,8 +4673,19 @@ struct mlx5_ifc_crypto_caps_bits {
 	u8 log_dek_granularity[0x5];
 	u8 reserved_at_68[0x3];
 	u8 log_max_num_int_kek[0x5];
-	u8 reserved_at_70[0x10];
-	u8 reserved_at_80[0x780];
+	u8 sw_wrapped_dek_new[0x10];
+	u8 reserved_at_80[0x80];
+	u8 crypto_mmo_qp[0x1];
+	u8 crypto_aes_gcm_256_encrypt[0x1];
+	u8 crypto_aes_gcm_128_encrypt[0x1];
+	u8 crypto_aes_gcm_256_decrypt[0x1];
+	u8 crypto_aes_gcm_128_decrypt[0x1];
+	u8 gcm_auth_tag_128[0x1];
+	u8 gcm_auth_tag_96[0x1];
+	u8 reserved_at_107[0x3];
+	u8 log_crypto_mmo_max_size[0x6];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x6d0];
 };
 
 struct mlx5_ifc_crypto_commissioning_register_bits {
diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index a830a4c7b9..930a31c795 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -15,6 +15,7 @@ endif
 
 sources = files(
         'mlx5_crypto.c',
+	'mlx5_crypto_gcm.c',
         'mlx5_crypto_dek.c',
 )
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 9dec1cfbe0..6963d8a9c9 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -23,6 +23,13 @@
 #define MLX5_CRYPTO_MAX_QPS 128
 #define MLX5_CRYPTO_MAX_SEGS 56
 
+enum mlx5_crypto_pmd_support_algo {
+	MLX5_CRYPTO_PMD_SUPPORT_ALGO_NULL,
+	MLX5_CRYPTO_PMD_SUPPORT_ALGO_AES_XTS,
+	MLX5_CRYPTO_PMD_SUPPORT_ALGO_AES_GCM,
+	MLX5_CRYPTO_PMD_SUPPORT_ALGO_MAX,
+};
+
 #define MLX5_CRYPTO_FEATURE_FLAGS(wrapped_mode) \
 	(RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO | RTE_CRYPTODEV_FF_HW_ACCELERATED | \
 	 RTE_CRYPTODEV_FF_IN_PLACE_SGL | RTE_CRYPTODEV_FF_OOP_SGL_IN_SGL_OUT | \
@@ -102,7 +109,7 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 		dev_info->driver_id = mlx5_crypto_driver_id;
 		dev_info->feature_flags =
 			MLX5_CRYPTO_FEATURE_FLAGS(priv->is_wrapped_mode);
-		dev_info->capabilities = mlx5_crypto_caps;
+		dev_info->capabilities = priv->caps;
 		dev_info->max_nb_queue_pairs = MLX5_CRYPTO_MAX_QPS;
 		dev_info->min_mbuf_headroom_req = 0;
 		dev_info->min_mbuf_tailroom_req = 0;
@@ -749,6 +756,14 @@ mlx5_crypto_args_check_handler(const char *key, const char *val, void *opaque)
 		attr->credential_pointer = (uint32_t)tmp;
 	} else if (strcmp(key, "keytag") == 0) {
 		devarg_prms->keytag = tmp;
+	} else if (strcmp(key, "algo") == 0) {
+		if (tmp == 1) {
+			devarg_prms->is_aes_gcm = 1;
+		} else if (tmp > 1) {
+			DRV_LOG(ERR, "Invalid algo.");
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
 	}
 	return 0;
 }
@@ -765,6 +780,7 @@ mlx5_crypto_parse_devargs(struct mlx5_kvargs_ctrl *mkvlist,
 		"keytag",
 		"max_segs_num",
 		"wcs_file",
+		"algo",
 		NULL,
 	};
 
@@ -895,7 +911,9 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	if (!cdev->config.hca_attr.crypto || !cdev->config.hca_attr.aes_xts) {
+	if (!cdev->config.hca_attr.crypto ||
+	   (!cdev->config.hca_attr.aes_xts &&
+	    !cdev->config.hca_attr.crypto_mmo.crypto_mmo_qp)) {
 		DRV_LOG(ERR, "Not enough capabilities to support crypto "
 			"operations, maybe old FW/OFED version?");
 		rte_errno = ENOTSUP;
@@ -924,6 +942,14 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 	priv->cdev = cdev;
 	priv->crypto_dev = crypto_dev;
 	priv->is_wrapped_mode = wrapped_mode;
+	priv->caps = mlx5_crypto_caps;
+	/* Init and override AES-GCM configuration. */
+	if (devarg_prms.is_aes_gcm) {
+		ret = mlx5_crypto_gcm_init(priv);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to init AES-GCM crypto.");
+		}
+	}
 	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
 		return -1;
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index a2771b3dab..80c2cab0dd 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -31,6 +31,7 @@ struct mlx5_crypto_priv {
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
 	struct mlx5_hlist *dek_hlist; /* Dek hash list. */
+	const struct rte_cryptodev_capabilities *caps;
 	struct rte_cryptodev_config dev_config;
 	struct mlx5_devx_obj *login_obj;
 	uint64_t keytag;
@@ -68,6 +69,7 @@ struct mlx5_crypto_devarg_params {
 	struct mlx5_devx_crypto_login_attr login_attr;
 	uint64_t keytag;
 	uint32_t max_segs_num;
+	uint32_t is_aes_gcm:1;
 };
 
 int
@@ -84,4 +86,7 @@ mlx5_crypto_dek_setup(struct mlx5_crypto_priv *priv);
 void
 mlx5_crypto_dek_unset(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
new file mode 100644
index 0000000000..d60ac379cf
--- /dev/null
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_malloc.h>
+#include <rte_mempool.h>
+#include <rte_eal_paging.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+#include <bus_pci_driver.h>
+#include <rte_memory.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_common_os.h>
+
+#include "mlx5_crypto_utils.h"
+#include "mlx5_crypto.h"
+
+static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
+	{
+		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+	},
+	{
+		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+	}
+};
+
+static int
+mlx5_crypto_generate_gcm_cap(struct mlx5_hca_crypto_mmo_attr *mmo_attr,
+			     struct rte_cryptodev_capabilities *cap)
+{
+	/* Init key size. */
+	if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt &&
+		mmo_attr->gcm_256_encrypt && mmo_attr->gcm_256_decrypt) {
+		cap->sym.aead.key_size.min = 16;
+		cap->sym.aead.key_size.max = 32;
+		cap->sym.aead.key_size.increment = 16;
+	} else if (mmo_attr->gcm_256_encrypt && mmo_attr->gcm_256_decrypt) {
+		cap->sym.aead.key_size.min = 32;
+		cap->sym.aead.key_size.max = 32;
+		cap->sym.aead.key_size.increment = 0;
+	} else if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt) {
+		cap->sym.aead.key_size.min = 16;
+		cap->sym.aead.key_size.max = 16;
+		cap->sym.aead.key_size.increment = 0;
+	} else {
+		DRV_LOG(ERR, "No available AES-GCM encryption/decryption supported.");
+		return -1;
+	}
+	/* Init tag size. */
+	if (mmo_attr->gcm_auth_tag_96 && mmo_attr->gcm_auth_tag_128) {
+		cap->sym.aead.digest_size.min = 8;
+		cap->sym.aead.digest_size.max = 16;
+		cap->sym.aead.digest_size.increment = 8;
+	} else if (mmo_attr->gcm_auth_tag_128) {
+		cap->sym.aead.digest_size.min = 8;
+		cap->sym.aead.digest_size.max = 8;
+		cap->sym.aead.digest_size.increment = 0;
+	} else if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt) {
+		cap->sym.aead.digest_size.min = 16;
+		cap->sym.aead.digest_size.max = 16;
+		cap->sym.aead.digest_size.increment = 0;
+	} else {
+		DRV_LOG(ERR, "No available AES-GCM tag size supported.");
+		return -1;
+	}
+	/* Init AAD size. */
+	cap->sym.aead.aad_size.min = 0;
+	cap->sym.aead.aad_size.max = UINT16_MAX;
+	cap->sym.aead.aad_size.increment = 1;
+	/* Init IV size. */
+	cap->sym.aead.iv_size.min = 12;
+	cap->sym.aead.iv_size.max = 12;
+	cap->sym.aead.iv_size.increment = 0;
+	/* Init left items. */
+	cap->op = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
+	cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_AEAD;
+	cap->sym.aead.algo = RTE_CRYPTO_AEAD_AES_GCM;
+	return 0;
+}
+
+int
+mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
+{
+	struct mlx5_common_device *cdev = priv->cdev;
+	int ret;
+
+	/* Generate GCM capability. */
+	ret = mlx5_crypto_generate_gcm_cap(&cdev->config.hca_attr.crypto_mmo,
+					   mlx5_crypto_gcm_caps);
+	if (ret) {
+		DRV_LOG(ERR, "No enough AES-GCM cap.");
+		return -1;
+	}
+	priv->caps = mlx5_crypto_gcm_caps;
+	return 0;
+}
+
-- 
2.25.1



* [RFC PATCH 2/5] crypto/mlx5: add AES-GCM encryption key
  2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
  2023-04-18  9:23 ` [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability Suanming Mou
@ 2023-04-18  9:23 ` Suanming Mou
  2023-04-18  9:23 ` [RFC PATCH 3/5] crypto/mlx5: add AES-GCM session configure Suanming Mou
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-04-18  9:23 UTC (permalink / raw)
  To: matan; +Cc: rasland, mkashani, dev

The crypto device requires a DEK (data encryption key) object for
data encryption/decryption operations.

This commit adds AES-GCM DEK object management support.
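
The key handling difference from the existing AES-XTS DEK is small; a
sketch distilled from mlx5_crypto_dek_create_aes_gcm() below (no extra
functionality beyond this patch) is:

	/* AES-GCM DEKs carry the plain AEAD key, no keytag appended. */
	dek_attr.key_purpose = MLX5_CRYPTO_KEY_PURPOSE_GCM;
	dek_attr.key_size = aead->key.length == 16 ?
			    MLX5_CRYPTO_KEY_SIZE_128b : MLX5_CRYPTO_KEY_SIZE_256b;
	memcpy(dek_attr.key, aead->key.data, aead->key.length);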

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c  |   6 +-
 drivers/common/mlx5/mlx5_devx_cmds.h  |   1 +
 drivers/common/mlx5/mlx5_prm.h        |   6 +-
 drivers/crypto/mlx5/mlx5_crypto.c     |   2 +-
 drivers/crypto/mlx5/mlx5_crypto.h     |   3 +-
 drivers/crypto/mlx5/mlx5_crypto_dek.c | 157 ++++++++++++++++++++------
 drivers/crypto/mlx5/mlx5_crypto_gcm.c |   2 +
 7 files changed, 137 insertions(+), 40 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 592a7cffdb..8b51a75cc8 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -3166,10 +3166,14 @@ mlx5_devx_cmd_create_dek_obj(void *ctx, struct mlx5_devx_dek_attr *attr)
 	ptr = MLX5_ADDR_OF(create_dek_in, in, dek);
 	MLX5_SET(dek, ptr, key_size, attr->key_size);
 	MLX5_SET(dek, ptr, has_keytag, attr->has_keytag);
+	MLX5_SET(dek, ptr, sw_wrapped, attr->sw_wrapped);
 	MLX5_SET(dek, ptr, key_purpose, attr->key_purpose);
 	MLX5_SET(dek, ptr, pd, attr->pd);
 	MLX5_SET64(dek, ptr, opaque, attr->opaque);
-	key_addr = MLX5_ADDR_OF(dek, ptr, key);
+	if (attr->sw_wrapped)
+		key_addr = MLX5_ADDR_OF(dek, ptr, sw_wrapped_dek);
+	else
+		key_addr = MLX5_ADDR_OF(dek, ptr, key);
 	memcpy(key_addr, (void *)(attr->key), MLX5_CRYPTO_KEY_MAX_SIZE);
 	dek_obj->obj = mlx5_glue->devx_obj_create(ctx, in, sizeof(in),
 						  out, sizeof(out));
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index d640482346..79502cda08 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -664,6 +664,7 @@ struct mlx5_devx_dek_attr {
 	uint32_t key_size:4;
 	uint32_t has_keytag:1;
 	uint32_t key_purpose:4;
+	uint32_t sw_wrapped:1;
 	uint32_t pd:24;
 	uint64_t opaque;
 	uint8_t key[MLX5_CRYPTO_KEY_MAX_SIZE];
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index a3b85f514e..9728be24dd 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -3736,7 +3736,8 @@ enum {
 struct mlx5_ifc_dek_bits {
 	u8 modify_field_select[0x40];
 	u8 state[0x8];
-	u8 reserved_at_48[0xc];
+	u8 sw_wrapped[0x1];
+	u8 reserved_at_49[0xb];
 	u8 key_size[0x4];
 	u8 has_keytag[0x1];
 	u8 reserved_at_59[0x3];
@@ -3747,7 +3748,8 @@ struct mlx5_ifc_dek_bits {
 	u8 opaque[0x40];
 	u8 reserved_at_1c0[0x40];
 	u8 key[0x400];
-	u8 reserved_at_600[0x200];
+	u8 sw_wrapped_dek[0x400];
+	u8 reserved_at_a00[0x300];
 };
 
 struct mlx5_ifc_create_dek_in_bits {
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 6963d8a9c9..66c9f94346 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -196,7 +196,7 @@ mlx5_crypto_sym_session_configure(struct rte_cryptodev *dev,
 		return -ENOTSUP;
 	}
 	cipher = &xform->cipher;
-	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, xform);
 	if (sess_private_data->dek == NULL) {
 		DRV_LOG(ERR, "Failed to prepare dek.");
 		return -ENOMEM;
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 80c2cab0dd..11352f9409 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -40,6 +40,7 @@ struct mlx5_crypto_priv {
 	uint16_t umr_wqe_stride;
 	uint16_t max_rdmar_ds;
 	uint32_t is_wrapped_mode:1;
+	uint32_t is_gcm_dek_wrap:1;
 };
 
 struct mlx5_crypto_qp {
@@ -78,7 +79,7 @@ mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 
 struct mlx5_crypto_dek *
 mlx5_crypto_dek_prepare(struct mlx5_crypto_priv *priv,
-			struct rte_crypto_cipher_xform *cipher);
+			struct rte_crypto_sym_xform *xform);
 
 int
 mlx5_crypto_dek_setup(struct mlx5_crypto_priv *priv);
diff --git a/drivers/crypto/mlx5/mlx5_crypto_dek.c b/drivers/crypto/mlx5/mlx5_crypto_dek.c
index 7339ef2bd9..ba6dab52f7 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_dek.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_dek.c
@@ -14,10 +14,29 @@
 #include "mlx5_crypto.h"
 
 struct mlx5_crypto_dek_ctx {
-	struct rte_crypto_cipher_xform *cipher;
+	struct rte_crypto_sym_xform *xform;
 	struct mlx5_crypto_priv *priv;
 };
 
+static int
+mlx5_crypto_dek_get_key(struct rte_crypto_sym_xform *xform,
+			const uint8_t **key,
+			uint16_t *key_len)
+{
+	if (xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+		*key = xform->cipher.key.data;
+		*key_len = xform->cipher.key.length;
+	} else if (xform->type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+		*key = xform->aead.key.data;
+		*key_len = xform->aead.key.length;
+	} else {
+		DRV_LOG(ERR, "Xform dek type not supported.");
+		rte_errno = -EINVAL;
+		return -1;
+	}
+	return 0;
+}
+
 int
 mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 			struct mlx5_crypto_dek *dek)
@@ -27,19 +46,22 @@ mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 
 struct mlx5_crypto_dek *
 mlx5_crypto_dek_prepare(struct mlx5_crypto_priv *priv,
-			struct rte_crypto_cipher_xform *cipher)
+			struct rte_crypto_sym_xform *xform)
 {
+	const uint8_t *key;
+	uint16_t key_len;
 	struct mlx5_hlist *dek_hlist = priv->dek_hlist;
 	struct mlx5_crypto_dek_ctx dek_ctx = {
-		.cipher = cipher,
+		.xform = xform,
 		.priv = priv,
 	};
-	struct rte_crypto_cipher_xform *cipher_ctx = cipher;
-	uint64_t key64 = __rte_raw_cksum(cipher_ctx->key.data,
-					 cipher_ctx->key.length, 0);
-	struct mlx5_list_entry *entry = mlx5_hlist_register(dek_hlist,
-							     key64, &dek_ctx);
+	uint64_t key64;
+	struct mlx5_list_entry *entry;
 
+	if (mlx5_crypto_dek_get_key(xform, &key, &key_len))
+		return NULL;
+	key64 = __rte_raw_cksum(key, key_len, 0);
+	entry = mlx5_hlist_register(dek_hlist, key64, &dek_ctx);
 	return entry == NULL ? NULL :
 			     container_of(entry, struct mlx5_crypto_dek, entry);
 }
@@ -76,76 +98,141 @@ mlx5_crypto_dek_match_cb(void *tool_ctx __rte_unused,
 			 struct mlx5_list_entry *entry, void *cb_ctx)
 {
 	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
-	struct rte_crypto_cipher_xform *cipher_ctx = ctx->cipher;
+	struct rte_crypto_sym_xform *xform = ctx->xform;
 	struct mlx5_crypto_dek *dek =
 			container_of(entry, typeof(*dek), entry);
 	uint32_t key_len = dek->size;
+	uint16_t xkey_len;
+	const uint8_t *key;
 
-	if (key_len != cipher_ctx->key.length)
+	if (mlx5_crypto_dek_get_key(xform, &key, &xkey_len))
+		return -1;
+	if (key_len != xkey_len)
 		return -1;
-	return memcmp(cipher_ctx->key.data, dek->data, cipher_ctx->key.length);
+	return memcmp(key, dek->data, xkey_len);
 }
 
-static struct mlx5_list_entry *
-mlx5_crypto_dek_create_cb(void *tool_ctx __rte_unused, void *cb_ctx)
+static int
+mlx5_crypto_dek_create_aes_xts(struct mlx5_crypto_dek *dek,
+		struct mlx5_devx_dek_attr *dek_attr,
+		void *cb_ctx)
 {
 	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
-	struct rte_crypto_cipher_xform *cipher_ctx = ctx->cipher;
-	struct mlx5_crypto_dek *dek = rte_zmalloc(__func__, sizeof(*dek),
-						  RTE_CACHE_LINE_SIZE);
-	struct mlx5_devx_dek_attr dek_attr = {
-		.pd = ctx->priv->cdev->pdn,
-		.key_purpose = MLX5_CRYPTO_KEY_PURPOSE_AES_XTS,
-		.has_keytag = 1,
-	};
+	struct rte_crypto_cipher_xform *cipher_ctx = &ctx->xform->cipher;
 	bool is_wrapped = ctx->priv->is_wrapped_mode;
 
-	if (dek == NULL) {
-		DRV_LOG(ERR, "Failed to allocate dek memory.");
-		return NULL;
+	if (cipher_ctx->algo != RTE_CRYPTO_CIPHER_AES_XTS) {
+		DRV_LOG(ERR, "Only AES-XTS algo supported.");
+		return -EINVAL;
 	}
+	dek_attr->key_purpose = MLX5_CRYPTO_KEY_PURPOSE_AES_XTS;
+	dek_attr->has_keytag = 1;
 	if (is_wrapped) {
 		switch (cipher_ctx->key.length) {
 		case 48:
 			dek->size = 48;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
 			break;
 		case 80:
 			dek->size = 80;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
 			break;
 		default:
 			DRV_LOG(ERR, "Wrapped key size not supported.");
-			return NULL;
+			return -EINVAL;
 		}
 	} else {
 		switch (cipher_ctx->key.length) {
 		case 32:
 			dek->size = 40;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
 			break;
 		case 64:
 			dek->size = 72;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
 			break;
 		default:
 			DRV_LOG(ERR, "Key size not supported.");
-			return NULL;
+			return -EINVAL;
 		}
-		memcpy(&dek_attr.key[cipher_ctx->key.length],
+		memcpy(&dek_attr->key[cipher_ctx->key.length],
 						&ctx->priv->keytag, 8);
 	}
-	memcpy(&dek_attr.key, cipher_ctx->key.data, cipher_ctx->key.length);
+	memcpy(&dek_attr->key, cipher_ctx->key.data, cipher_ctx->key.length);
+	memcpy(&dek->data, cipher_ctx->key.data, cipher_ctx->key.length);
+	return 0;
+}
+
+static int
+mlx5_crypto_dek_create_aes_gcm(struct mlx5_crypto_dek *dek,
+		struct mlx5_devx_dek_attr *dek_attr,
+		void *cb_ctx)
+{
+	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
+	struct rte_crypto_aead_xform *aead_ctx = &ctx->xform->aead;
+
+	if (aead_ctx->algo != RTE_CRYPTO_AEAD_AES_GCM) {
+		DRV_LOG(ERR, "Only AES-GCM algo supported.");
+		return -EINVAL;
+	}
+	dek_attr->key_purpose = MLX5_CRYPTO_KEY_PURPOSE_GCM;
+	switch (aead_ctx->key.length) {
+	case 16:
+		dek->size = 16;
+		dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+		break;
+	case 32:
+		dek->size = 32;
+		dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+		break;
+	default:
+		DRV_LOG(ERR, "Wrapped key size not supported.");
+		return -EINVAL;
+	}
+#ifdef MLX5_DEK_WRAP
+	if (ctx->priv->is_gcm_dek_wrap)
+		dek_attr->sw_wrapped = 1;
+#endif
+	memcpy(&dek_attr->key, aead_ctx->key.data, aead_ctx->key.length);
+	memcpy(&dek->data, aead_ctx->key.data, aead_ctx->key.length);
+	return 0;
+}
+
+static struct mlx5_list_entry *
+mlx5_crypto_dek_create_cb(void *tool_ctx __rte_unused, void *cb_ctx)
+{
+	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
+	struct rte_crypto_sym_xform *xform = ctx->xform;
+	struct mlx5_crypto_dek *dek = rte_zmalloc(__func__, sizeof(*dek),
+						  RTE_CACHE_LINE_SIZE);
+	struct mlx5_devx_dek_attr dek_attr = {
+		.pd = ctx->priv->cdev->pdn,
+	};
+	int ret = -1;
+
+	if (dek == NULL) {
+		DRV_LOG(ERR, "Failed to allocate dek memory.");
+		return NULL;
+	}
+	if (xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER)
+		ret = mlx5_crypto_dek_create_aes_xts(dek, &dek_attr, cb_ctx);
+	else if (xform->type == RTE_CRYPTO_SYM_XFORM_AEAD)
+		ret = mlx5_crypto_dek_create_aes_gcm(dek, &dek_attr, cb_ctx);
+	if (ret)
+		goto fail;
 	dek->obj = mlx5_devx_cmd_create_dek_obj(ctx->priv->cdev->ctx,
 						&dek_attr);
 	if (dek->obj == NULL) {
-		rte_free(dek);
-		return NULL;
+		DRV_LOG(ERR, "Failed to create dek obj.");
+		goto fail;
 	}
-	memcpy(&dek->data, cipher_ctx->key.data, cipher_ctx->key.length);
 	return &dek->entry;
+fail:
+	rte_free(dek);
+	return NULL;
 }
 
+
 static void
 mlx5_crypto_dek_remove_cb(void *tool_ctx __rte_unused,
 			  struct mlx5_list_entry *entry)
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index d60ac379cf..c7fd86d7b9 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -95,6 +95,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 		return -1;
 	}
 	priv->caps = mlx5_crypto_gcm_caps;
+	priv->is_gcm_dek_wrap = !!(cdev->config.hca_attr.sw_wrapped_dek &
+				(1 << MLX5_CRYPTO_KEY_PURPOSE_GCM));
 	return 0;
 }
 
-- 
2.25.1



* [RFC PATCH 3/5] crypto/mlx5: add AES-GCM session configure
  2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
  2023-04-18  9:23 ` [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability Suanming Mou
  2023-04-18  9:23 ` [RFC PATCH 2/5] crypto/mlx5: add AES-GCM encryption key Suanming Mou
@ 2023-04-18  9:23 ` Suanming Mou
  2023-04-18  9:23 ` [RFC PATCH 4/5] crypto/mlx5: add queue pair setup Suanming Mou
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-04-18  9:23 UTC (permalink / raw)
  To: matan; +Cc: rasland, mkashani, dev

Sessions are used in symmetric transformations in order to prepare
objects and data for the packet processing stage.

The AES-GCM session includes the IV, AAD, digest (tag), DEK and
operation mode information.
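
For reference, the application-side AEAD transform consumed by this
session configure path could look as follows (a sketch; the key buffer,
sizes, IV offset convention, dev_id and session mempool are
illustrative, and the usual rte_cryptodev headers are assumed):

	struct rte_crypto_sym_xform xform = {
		.type = RTE_CRYPTO_SYM_XFORM_AEAD,
		.next = NULL, /* chained transforms are not supported */
		.aead = {
			.op = RTE_CRYPTO_AEAD_OP_ENCRYPT,
			.algo = RTE_CRYPTO_AEAD_AES_GCM,
			/* key_data: 16-byte application key buffer. */
			.key = { .data = key_data, .length = 16 },
			.iv = {
				/* IV stored right after the sym op. */
				.offset = sizeof(struct rte_crypto_op) +
					  sizeof(struct rte_crypto_sym_op),
				.length = 12,
			},
			.digest_length = 16,
			.aad_length = 16,
		},
	};
	void *sess = rte_cryptodev_sym_session_create(dev_id, &xform,
						      session_pool);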

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h        | 12 +++++++
 drivers/crypto/mlx5/mlx5_crypto.c     | 15 ---------
 drivers/crypto/mlx5/mlx5_crypto.h     | 35 ++++++++++++++++++++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 46 +++++++++++++++++++++++++++
 4 files changed, 93 insertions(+), 15 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 9728be24dd..25ff66ee7e 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -528,11 +528,23 @@ enum {
 	MLX5_BLOCK_SIZE_4048B	= 0x6,
 };
 
+enum {
+	MLX5_ENCRYPTION_TYPE_AES_GCM = 0x3,
+};
+
+enum {
+	MLX5_CRYPTO_OP_TYPE_ENCRYPTION = 0x0,
+	MLX5_CRYPTO_OP_TYPE_DECRYPTION = 0x1,
+};
+
 #define MLX5_BSF_SIZE_OFFSET		30
 #define MLX5_BSF_P_TYPE_OFFSET		24
 #define MLX5_ENCRYPTION_ORDER_OFFSET	16
 #define MLX5_BLOCK_SIZE_OFFSET		24
 
+#define MLX5_CRYPTO_MMO_TYPE_OFFSET 24
+#define MLX5_CRYPTO_MMO_OP_OFFSET 20
+
 struct mlx5_wqe_umr_bsf_seg {
 	/*
 	 * bs_bpt_eo_es contains:
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 66c9f94346..8946f13e5e 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -83,21 +83,6 @@ static const struct rte_driver mlx5_drv = {
 
 static struct cryptodev_driver mlx5_cryptodev_driver;
 
-struct mlx5_crypto_session {
-	uint32_t bs_bpt_eo_es;
-	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
-	 * saved in big endian format.
-	 */
-	uint32_t bsp_res;
-	/**< crypto_block_size_pointer and reserved 24 bits saved in big
-	 * endian format.
-	 */
-	uint32_t iv_offset:16;
-	/**< Starting point for Initialisation Vector. */
-	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
-	uint32_t dek_id; /**< DEK ID */
-} __rte_packed;
-
 static void
 mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 			  struct rte_cryptodev_info *dev_info)
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 11352f9409..c34a860404 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -73,6 +73,41 @@ struct mlx5_crypto_devarg_params {
 	uint32_t is_aes_gcm:1;
 };
 
+struct mlx5_crypto_session {
+	union {
+		/**< AES-XTS configuration. */
+		struct {
+			uint32_t bs_bpt_eo_es;
+			/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
+			 * saved in big endian format.
+			 */
+			uint32_t bsp_res;
+			/**< crypto_block_size_pointer and reserved 24 bits saved in big
+			 * endian format.
+			 */
+		};
+		/**< AES-GCM configuration. */
+		struct {
+			uint32_t mmo_ctrl;
+			/**< Crypto control fields with algo type and op type in big
+			 * endian format.
+			 */
+			uint16_t tag_len;
+			/**< AES-GCM crypto digest size in bytes. */
+			uint16_t aad_len;
+			/**< The length of the additional authenticated data (AAD) in bytes. */
+			uint32_t op_type;
+			/**< Operation type. */
+		};
+	};
+	uint32_t iv_offset:16;
+	/**< Starting point for Initialisation Vector. */
+	uint32_t iv_len;
+	/**< Initialisation Vector length. */
+	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
+	uint32_t dek_id; /**< DEK ID */
+} __rte_packed;
+
 int
 mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 			struct mlx5_crypto_dek *dek);
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index c7fd86d7b9..6c2c759fba 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -81,12 +81,58 @@ mlx5_crypto_generate_gcm_cap(struct mlx5_hca_crypto_mmo_attr *mmo_attr,
 	return 0;
 }
 
+static int
+mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
+				  struct rte_crypto_sym_xform *xform,
+				  struct rte_cryptodev_sym_session *session)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_session *sess_private_data = CRYPTODEV_GET_SYM_SESS_PRIV(session);
+	struct rte_crypto_aead_xform *aead = &xform->aead;
+	uint32_t op_type;
+
+	if (unlikely(xform->next != NULL)) {
+		DRV_LOG(ERR, "Xform next is not supported.");
+		return -ENOTSUP;
+	}
+	if (aead->algo != RTE_CRYPTO_AEAD_AES_GCM) {
+		DRV_LOG(ERR, "Only AES-GCM algorithm is supported.");
+		return -ENOTSUP;
+	}
+	if (aead->op == RTE_CRYPTO_AEAD_OP_ENCRYPT)
+		op_type = MLX5_CRYPTO_OP_TYPE_ENCRYPTION;
+	else
+		op_type = MLX5_CRYPTO_OP_TYPE_DECRYPTION;
+	sess_private_data->op_type = op_type;
+	sess_private_data->mmo_ctrl = rte_cpu_to_be_32
+			(op_type << MLX5_CRYPTO_MMO_OP_OFFSET |
+			 MLX5_ENCRYPTION_TYPE_AES_GCM << MLX5_CRYPTO_MMO_TYPE_OFFSET);
+	sess_private_data->aad_len = aead->aad_length;
+	sess_private_data->tag_len = aead->digest_length;
+	sess_private_data->iv_offset = aead->iv.offset;
+	sess_private_data->iv_len = aead->iv.length;
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, xform);
+	if (sess_private_data->dek == NULL) {
+		DRV_LOG(ERR, "Failed to prepare dek.");
+		return -ENOMEM;
+	}
+	sess_private_data->dek_id =
+			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
+					 0xffffff);
+	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
+	return 0;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
 	struct mlx5_common_device *cdev = priv->cdev;
+	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
+	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
 	int ret;
 
+	/* Override AES-GCM specified ops. */
+	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
 	/* Generate GCM capability. */
 	ret = mlx5_crypto_generate_gcm_cap(&cdev->config.hca_attr.crypto_mmo,
 					   mlx5_crypto_gcm_caps);
-- 
2.25.1



* [RFC PATCH 4/5] crypto/mlx5: add queue pair setup
  2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
                   ` (2 preceding siblings ...)
  2023-04-18  9:23 ` [RFC PATCH 3/5] crypto/mlx5: add AES-GCM session configure Suanming Mou
@ 2023-04-18  9:23 ` Suanming Mou
  2023-04-18  9:23 ` [RFC PATCH 5/5] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-04-18  9:23 UTC (permalink / raw)
  To: matan; +Cc: rasland, mkashani, dev

The crypto queue pair handles the encryption/decryption operations.

Since the AES-GCM AEAD API provides the AAD, mbuf and digest
separately, while the low-level FW accepts the data only in a single
contiguous memory region, two internal QPs are created per AES-GCM
queue pair: one to organize the memory into a contiguous region when
it is not, and the other for the crypto operation itself.

If the buffers are found to be implicitly contiguous, the buffer is
sent directly to the crypto QP for encryption/decryption. If not, the
buffers are first handled by the UMR QP, which converts them into one
contiguous buffer. The well-organized "new" buffer can then be handled
by the crypto QP.

The crypto QP is initialized as a follower and the UMR QP as the
leader. Whenever a crypto operation's input buffer requires address
space conversion by the UMR QP, the crypto QP processing is triggered
by the UMR QP. Otherwise, the crypto QP doorbell is rung directly.

The existing max_segs_num devarg is used to define how many segments
a chained mbuf may contain, the same as for AES-XTS before.
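
For clarity, "implicitly contiguous" roughly means: in-place operation,
single-segment mbuf, AAD ending exactly where the payload starts, and
payload ending exactly where the digest starts. A sketch of that check
(mirroring the helper added in the next patch; aad_len is taken from
the session there):

	static bool
	gcm_input_is_contiguous(struct rte_crypto_op *op, uint16_t aad_len)
	{
		struct rte_mbuf *m_src = op->sym->m_src;
		void *aad = op->sym->aead.aad.data;
		void *tag = op->sym->aead.digest.data;
		void *pkt = rte_pktmbuf_mtod_offset(m_src, void *,
						    op->sym->aead.data.offset);

		if (op->sym->m_dst != NULL && op->sym->m_dst != m_src)
			return false; /* out-of-place, AAD cannot match */
		if (m_src->nb_segs > 1)
			return false; /* chained mbuf */
		return RTE_PTR_ADD(aad, aad_len) == pkt &&
		       RTE_PTR_ADD(pkt, op->sym->aead.data.length) == tag;
	}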

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c  |   6 +
 drivers/common/mlx5/mlx5_devx_cmds.h  |   3 +
 drivers/common/mlx5/mlx5_prm.h        |  24 +++
 drivers/crypto/mlx5/mlx5_crypto.c     |  17 ++
 drivers/crypto/mlx5/mlx5_crypto.h     |  12 ++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 254 ++++++++++++++++++++++++++
 6 files changed, 316 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 8b51a75cc8..6be02c0a65 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2563,6 +2563,12 @@ mlx5_devx_cmd_create_qp(void *ctx,
 				 attr->dbr_umem_valid);
 			MLX5_SET(qpc, qpc, dbr_umem_id, attr->dbr_umem_id);
 		}
+		if (attr->cd_master)
+			MLX5_SET(qpc, qpc, cd_master, attr->cd_master);
+		if (attr->cd_slave_send)
+			MLX5_SET(qpc, qpc, cd_slave_send, attr->cd_slave_send);
+		if (attr->cd_slave_recv)
+			MLX5_SET(qpc, qpc, cd_slave_receive, attr->cd_slave_recv);
 		MLX5_SET64(qpc, qpc, dbr_addr, attr->dbr_address);
 		MLX5_SET64(create_qp_in, in, wq_umem_offset,
 			   attr->wq_umem_offset);
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 79502cda08..e68aa077d7 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -590,6 +590,9 @@ struct mlx5_devx_qp_attr {
 	uint64_t wq_umem_offset;
 	uint32_t user_index:24;
 	uint32_t mmo:1;
+	uint32_t cd_master:1;
+	uint32_t cd_slave_send:1;
+	uint32_t cd_slave_recv:1;
 };
 
 struct mlx5_devx_virtio_q_couners_attr {
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 25ff66ee7e..c8d73a8456 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -594,6 +594,17 @@ struct mlx5_rdma_write_wqe {
 	struct mlx5_wqe_dseg dseg[];
 } __rte_packed;
 
+struct mlx5_wqe_send_en_seg {
+	uint32_t reserve[2];
+	uint32_t sqnpc;
+	uint32_t qpn;
+} __rte_packed;
+
+struct mlx5_wqe_send_en_wqe {
+	struct mlx5_wqe_cseg ctr;
+	struct mlx5_wqe_send_en_seg sseg;
+} __rte_packed;
+
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
@@ -668,6 +679,19 @@ union mlx5_gga_compress_opaque {
 	uint32_t data[64];
 };
 
+union mlx5_gga_crypto_opaque {
+	struct {
+		uint32_t syndrome;
+		uint32_t reserved0[2];
+		struct {
+			uint32_t iv[3];
+			uint32_t tag_size;
+			uint32_t aad_size;
+		} cp __rte_packed;
+	} __rte_packed;
+	uint8_t data[64];
+};
+
 struct mlx5_ifc_regexp_mmo_control_bits {
 	uint8_t reserved_at_31[0x2];
 	uint8_t le[0x1];
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 8946f13e5e..f2e5b25c15 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -849,12 +849,27 @@ mlx5_crypto_max_segs_num(uint16_t max_wqe_size)
 	return max_segs_cap;
 }
 
+static __rte_always_inline int
+mlx5_crypto_configure_gcm_wqe_size(struct mlx5_crypto_priv *priv)
+{
+	uint32_t send_en_wqe_size;
+
+	priv->umr_wqe_size = RTE_ALIGN(sizeof(struct mlx5_umr_wqe) + sizeof(struct mlx5_wqe_dseg),
+		MLX5_SEND_WQE_BB);
+	send_en_wqe_size = RTE_ALIGN(sizeof(struct mlx5_wqe_send_en_wqe), MLX5_SEND_WQE_BB);
+	priv->umr_wqe_stride = priv->umr_wqe_size / MLX5_SEND_WQE_BB;
+	priv->wqe_set_size = priv->umr_wqe_size + send_en_wqe_size;
+	return 0;
+}
+
 static int
 mlx5_crypto_configure_wqe_size(struct mlx5_crypto_priv *priv,
 				uint16_t max_wqe_size, uint32_t max_segs_num)
 {
 	uint32_t rdmw_wqe_size, umr_wqe_size;
 
+	if (priv->is_gcm_dek_wrap)
+		return mlx5_crypto_configure_gcm_wqe_size(priv);
 	mlx5_crypto_get_wqe_sizes(max_segs_num, &umr_wqe_size,
 					&rdmw_wqe_size);
 	priv->wqe_set_size = rdmw_wqe_size + umr_wqe_size;
@@ -927,12 +942,14 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 	priv->cdev = cdev;
 	priv->crypto_dev = crypto_dev;
 	priv->is_wrapped_mode = wrapped_mode;
+	priv->max_segs_num = devarg_prms.max_segs_num;
 	priv->caps = mlx5_crypto_caps;
 	/* Init and override AES-GCM configuration. */
 	if (devarg_prms.is_aes_gcm) {
 		ret = mlx5_crypto_gcm_init(priv);
 		if (ret) {
 			DRV_LOG(ERR, "Failed to init AES-GCM crypto.");
+			return -ENOTSUP;
 		}
 	}
 	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index c34a860404..9945891ea8 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -47,15 +47,27 @@ struct mlx5_crypto_qp {
 	struct mlx5_crypto_priv *priv;
 	struct mlx5_devx_cq cq_obj;
 	struct mlx5_devx_qp qp_obj;
+	struct mlx5_devx_cq umr_cq_obj;
+	struct mlx5_devx_qp umr_qp_obj;
 	struct rte_cryptodev_stats stats;
 	struct rte_crypto_op **ops;
 	struct mlx5_devx_obj **mkey; /* WQE's indirect mekys. */
+	struct mlx5_klm *klm_array;
 	struct mlx5_mr_ctrl mr_ctrl;
+	struct mlx5_pmd_mr opaque_mr;
+	struct mlx5_pmd_mr klm_mr;
+	/* Crypto QP. */
 	uint8_t *wqe;
 	uint16_t entries_n;
 	uint16_t pi;
 	uint16_t ci;
 	uint16_t db_pi;
+	/* UMR QP. */
+	uint8_t *umr_wqe;
+	uint16_t umr_wqbbs;
+	uint16_t umr_pi;
+	uint16_t umr_ci;
+	uint32_t umr_errors;
 };
 
 struct mlx5_crypto_dek {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 6c2c759fba..b67f22c591 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -123,6 +123,257 @@ mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 	return 0;
 }
 
+static void
+mlx5_crypto_gcm_indirect_mkeys_release(struct mlx5_crypto_qp *qp, uint16_t n)
+{
+	uint16_t i;
+
+	for (i = 0; i < n; i++)
+		if (qp->mkey[i])
+			claim_zero(mlx5_devx_cmd_destroy(qp->mkey[i]));
+}
+
+static int
+mlx5_crypto_gcm_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
+				  struct mlx5_crypto_qp *qp)
+{
+	uint32_t i;
+	struct mlx5_devx_mkey_attr attr = {
+		.pd = priv->cdev->pdn,
+		.umr_en = 1,
+		.set_remote_rw = 1,
+		.klm_num = priv->max_segs_num,
+	};
+
+	for (i = 0; i < qp->entries_n; i++) {
+		attr.klm_array = (struct mlx5_klm *)&qp->klm_array[i * priv->max_segs_num];
+		qp->mkey[i] = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &attr);
+		if (!qp->mkey[i])
+			goto error;
+	}
+	return 0;
+error:
+	DRV_LOG(ERR, "Failed to allocate gcm indirect mkey.");
+	mlx5_crypto_gcm_indirect_mkeys_release(qp, i);
+	return -1;
+}
+
+static int
+mlx5_crypto_gcm_qp_release(struct rte_cryptodev *dev, uint16_t qp_id)
+{
+	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
+
+	if (qp->umr_qp_obj.qp != NULL)
+		mlx5_devx_qp_destroy(&qp->umr_qp_obj);
+	if (qp->umr_cq_obj.cq != NULL)
+		mlx5_devx_cq_destroy(&qp->umr_cq_obj);
+	if (qp->qp_obj.qp != NULL)
+		mlx5_devx_qp_destroy(&qp->qp_obj);
+	if (qp->cq_obj.cq != NULL)
+		mlx5_devx_cq_destroy(&qp->cq_obj);
+	if (qp->opaque_mr.obj != NULL) {
+		void *opaq = qp->opaque_mr.addr;
+
+		mlx5_common_verbs_dereg_mr(&qp->opaque_mr);
+		rte_free(opaq);
+	}
+	mlx5_crypto_gcm_indirect_mkeys_release(qp, qp->entries_n);
+	if (qp->klm_mr.obj != NULL) {
+		void *klm = qp->klm_mr.addr;
+
+		mlx5_common_verbs_dereg_mr(&qp->klm_mr);
+		rte_free(klm);
+	}
+	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
+	rte_free(qp);
+	dev->data->queue_pairs[qp_id] = NULL;
+	return 0;
+}
+
+static void
+mlx5_crypto_gcm_init_qp(struct mlx5_crypto_qp *qp)
+{
+	volatile struct mlx5_gga_wqe *restrict wqe =
+				    (volatile struct mlx5_gga_wqe *)qp->qp_obj.wqes;
+	volatile union mlx5_gga_crypto_opaque *opaq = qp->opaque_mr.addr;
+	const uint32_t sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | 4u);
+	const uint32_t flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+					MLX5_COMP_MODE_OFFSET);
+	const uint32_t opaq_lkey = rte_cpu_to_be_32(qp->opaque_mr.lkey);
+	int i;
+
+	/* All the next fields state should stay constant. */
+	for (i = 0; i < qp->entries_n; ++i, ++wqe) {
+		wqe->sq_ds = sq_ds;
+		wqe->flags = flags;
+		wqe->opaque_lkey = opaq_lkey;
+		wqe->opaque_vaddr = rte_cpu_to_be_64((uint64_t)(uintptr_t)&opaq[i]);
+	}
+}
+
+static inline int
+mlx5_crypto_gcm_umr_qp_setup(struct rte_cryptodev *dev, struct mlx5_crypto_qp *qp,
+			     uint16_t log_nb_desc, int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_qp_attr attr = {0};
+	uint32_t ret;
+	uint32_t log_wqbb_n;
+	struct mlx5_devx_cq_attr cq_attr = {
+		.use_first_only = 1,
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+	};
+	size_t klm_size = priv->max_segs_num * sizeof(struct mlx5_klm);
+	void *klm_array;
+
+	klm_array = rte_calloc(__func__, (size_t)qp->entries_n, klm_size, 64);
+	if (klm_array == NULL) {
+		DRV_LOG(ERR, "Failed to allocate klm memory.");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	if (mlx5_common_verbs_reg_mr(priv->cdev->pd, klm_array,
+				     qp->entries_n * klm_size,
+				     &qp->klm_mr) != 0) {
+		rte_free(klm_array);
+		DRV_LOG(ERR, "Failed to register klm MR.");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	qp->klm_array = (struct mlx5_klm *)qp->klm_mr.addr;
+	if (mlx5_devx_cq_create(priv->cdev->ctx, &qp->umr_cq_obj, log_nb_desc,
+				&cq_attr, socket_id) != 0) {
+		DRV_LOG(ERR, "Failed to create UMR CQ.");
+		return -1;
+	}
+	/* Set UMR + SEND_EN WQE as maximum same with crypto. */
+	log_wqbb_n = rte_log2_u32(qp->entries_n *
+			(priv->wqe_set_size / MLX5_SEND_WQE_BB));
+	attr.pd = priv->cdev->pdn;
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
+	attr.cqn = qp->umr_cq_obj.cq->id;
+	attr.num_of_receive_wqes = 0;
+	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
+	attr.ts_format =
+		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
+	attr.cd_master = 1;
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->umr_qp_obj,
+				  attr.num_of_send_wqbbs * MLX5_SEND_WQE_BB,
+				  &attr, socket_id);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create UMR QP.");
+		return -1;
+	}
+	if (mlx5_devx_qp2rts(&qp->umr_qp_obj, qp->umr_qp_obj.qp->id)) {
+		DRV_LOG(ERR, "Failed to change UMR QP state to RTS.");
+		return -1;
+	}
+	/* Save the UMR WQEBBS for checking the WQE boundary. */
+	qp->umr_wqbbs = attr.num_of_send_wqbbs;
+	return 0;
+}
+
+static int
+mlx5_crypto_gcm_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
+			 const struct rte_cryptodev_qp_conf *qp_conf,
+			 int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_hca_attr *attr = &priv->cdev->config.hca_attr;
+	struct mlx5_crypto_qp *qp;
+	struct mlx5_devx_cq_attr cq_attr = {
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+	};
+	struct mlx5_devx_qp_attr qp_attr = {
+		.pd = priv->cdev->pdn,
+		.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+		.user_index = qp_id,
+	};
+	uint32_t log_ops_n = rte_log2_u32(qp_conf->nb_descriptors);
+	uint32_t entries = RTE_BIT32(log_ops_n);
+	uint32_t alloc_size = sizeof(*qp);
+	void *opaq_buf;
+	int ret;
+
+	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
+	alloc_size += (sizeof(struct rte_crypto_op *) +
+		       sizeof(struct mlx5_devx_obj *)) * entries;
+	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
+				socket_id);
+	if (qp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate qp memory.");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	qp->priv = priv;
+	qp->entries_n = entries;
+	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
+				  priv->dev_config.socket_id)) {
+		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
+			(uint32_t)qp_id);
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	opaq_buf = rte_calloc(__func__, (size_t)entries,
+			      sizeof(union mlx5_gga_crypto_opaque),
+			      sizeof(union mlx5_gga_crypto_opaque));
+	if (opaq_buf == NULL) {
+		DRV_LOG(ERR, "Failed to allocate opaque memory.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	if (mlx5_common_verbs_reg_mr(priv->cdev->pd, opaq_buf, entries *
+				     sizeof(union mlx5_gga_crypto_opaque),
+				     &qp->opaque_mr) != 0) {
+		rte_free(opaq_buf);
+		DRV_LOG(ERR, "Failed to register opaque MR.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	ret = mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj, log_ops_n,
+				  &cq_attr, socket_id);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create CQ.");
+		goto err;
+	}
+	qp_attr.cqn = qp->cq_obj.cq->id;
+	qp_attr.ts_format = mlx5_ts_format_conv(attr->qp_ts_format);
+	qp_attr.num_of_receive_wqes = 0;
+	qp_attr.num_of_send_wqbbs = entries;
+	qp_attr.mmo = attr->crypto_mmo.crypto_mmo_qp;
+	/* Set MMO QP as follower as the input data may depend on UMR. */
+	qp_attr.cd_slave_send = 1;
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
+				  qp_attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
+				  &qp_attr, socket_id);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create QP.");
+		goto err;
+	}
+	mlx5_crypto_gcm_init_qp(qp);
+	ret = mlx5_devx_qp2rts(&qp->qp_obj, 0);
+	if (ret)
+		goto err;
+	qp->ops = (struct rte_crypto_op **)(qp + 1);
+	qp->mkey = (struct mlx5_devx_obj **)(qp->ops + entries);
+	if (mlx5_crypto_gcm_umr_qp_setup(dev, qp, log_ops_n, socket_id)) {
+		DRV_LOG(ERR, "Failed to setup UMR QP.");
+		goto err;
+	}
+	DRV_LOG(INFO, "QP %u: SQN=0x%X CQN=0x%X entries num = %u",
+		(uint32_t)qp_id, qp->qp_obj.qp->id, qp->cq_obj.cq->id, entries);
+	if (mlx5_crypto_gcm_indirect_mkeys_prepare(priv, qp)) {
+		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	dev->data->queue_pairs[qp_id] = qp;
+	return 0;
+err:
+	mlx5_crypto_gcm_qp_release(dev, qp_id);
+	return -1;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
@@ -133,6 +384,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
+	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
+	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
 	/* Generate GCM capability. */
 	ret = mlx5_crypto_generate_gcm_cap(&cdev->config.hca_attr.crypto_mmo,
 					   mlx5_crypto_gcm_caps);
@@ -140,6 +393,7 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 		DRV_LOG(ERR, "No enough AES-GCM cap.");
 		return -1;
 	}
+	priv->max_segs_num = rte_align32pow2((priv->max_segs_num + 2) * 2);
 	priv->caps = mlx5_crypto_gcm_caps;
 	priv->is_gcm_dek_wrap = !!(cdev->config.hca_attr.sw_wrapped_dek &
 				(1 << MLX5_CRYPTO_KEY_PURPOSE_GCM));
-- 
2.25.1



* [RFC PATCH 5/5] crypto/mlx5: add enqueue and dequeue operations
  2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
                   ` (3 preceding siblings ...)
  2023-04-18  9:23 ` [RFC PATCH 4/5] crypto/mlx5: add queue pair setup Suanming Mou
@ 2023-04-18  9:23 ` Suanming Mou
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-04-18  9:23 UTC (permalink / raw)
  To: matan; +Cc: rasland, mkashani, dev

The crypto operations are performed with a crypto WQE. If the input
buffers (AAD, mbuf, digest) are not contiguous, the FW requires a UMR
WQE to build a contiguous address space for the crypto WQE. The UMR
WQE and the crypto WQE are handled in two different QPs.

The QP for the UMR operation contains two types of WQE, UMR and
SEND_EN. The WQEs are built dynamically according to the crypto
operation buffer addresses. A crypto operation with non-contiguous
buffers gets its own UMR WQE, while an operation with contiguous
buffers does not need one. Once all the operations' WQEs in the
enqueue burst have been built, and if any UMR WQEs were built, an
additional SEND_EN WQE is appended as the final WQE of the burst in
the UMR QP. The purpose of that SEND_EN WQE is to trigger the crypto
QP processing once the UMR-prepared input memory address space
buffers are ready.

The QP for crypto operations contains only crypto WQEs, and the QP
WQEs are built as fixed at QP setup time. The QP processing is
triggered either by a doorbell ring or by the SEND_EN WQE from the
UMR QP.
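
From the application's point of view nothing changes compared to other
crypto PMDs; a minimal (illustrative) enqueue/dequeue loop, with
dev_id, qp_id, ops, nb_ops assumed set up and BURST_SZ and
handle_error() as hypothetical placeholders, looks like:

	uint16_t enq = rte_cryptodev_enqueue_burst(dev_id, qp_id, ops, nb_ops);
	uint16_t done = 0;

	while (done < enq) {
		struct rte_crypto_op *deq[BURST_SZ];
		uint16_t i, n;

		n = rte_cryptodev_dequeue_burst(dev_id, qp_id, deq, BURST_SZ);
		for (i = 0; i < n; i++) {
			/* The PMD sets the per-operation completion status. */
			if (deq[i]->status != RTE_CRYPTO_OP_STATUS_SUCCESS)
				handle_error(deq[i]);
		}
		done += n;
	}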

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h        |   1 +
 drivers/crypto/mlx5/mlx5_crypto.h     |   2 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 401 ++++++++++++++++++++++++++
 3 files changed, 404 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index c8d73a8456..71000ebf02 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -613,6 +613,7 @@ struct mlx5_wqe_send_en_wqe {
 /* MMO metadata segment */
 
 #define	MLX5_OPCODE_MMO	0x2fu
+#define	MLX5_OPC_MOD_MMO_CRYPTO 0x6u
 #define	MLX5_OPC_MOD_MMO_REGEX 0x4u
 #define	MLX5_OPC_MOD_MMO_COMP 0x2u
 #define	MLX5_OPC_MOD_MMO_DECOMP 0x3u
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 9945891ea8..0b0ef1a84d 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -66,8 +66,10 @@ struct mlx5_crypto_qp {
 	uint8_t *umr_wqe;
 	uint16_t umr_wqbbs;
 	uint16_t umr_pi;
+	uint16_t umr_last_pi;
 	uint16_t umr_ci;
 	uint32_t umr_errors;
+	bool has_umr;
 };
 
 struct mlx5_crypto_dek {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index b67f22c591..40cf4c804e 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -9,6 +9,7 @@
 #include <rte_log.h>
 #include <bus_pci_driver.h>
 #include <rte_memory.h>
+#include <rte_io.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
@@ -18,6 +19,17 @@
 #include "mlx5_crypto_utils.h"
 #include "mlx5_crypto.h"
 
+#define MLX5_MMO_CRYPTO_OPC (MLX5_OPCODE_MMO | \
+	(MLX5_OPC_MOD_MMO_CRYPTO << WQE_CSEG_OPC_MOD_OFFSET))
+
+struct mlx5_crypto_gcm_data {
+	void *src_addr;
+	uint32_t src_bytes;
+	void *dst_addr;
+	uint32_t dst_bytes;
+	uint32_t mkey;
+};
+
 static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	{
 		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
@@ -246,6 +258,10 @@ mlx5_crypto_gcm_umr_qp_setup(struct rte_cryptodev *dev, struct mlx5_crypto_qp *q
 		DRV_LOG(ERR, "Failed to create UMR CQ.");
 		return -1;
 	}
+	/* Init CQ to ones to be in HW owner in the start. */
+	qp->umr_cq_obj.cqes[0].op_own = MLX5_CQE_OWNER_MASK;
+	qp->umr_cq_obj.cqes[0].wqe_counter = rte_cpu_to_be_16(UINT16_MAX);
+	qp->umr_last_pi = UINT16_MAX;
 	/* Set UMR + SEND_EN WQE as maximum same with crypto. */
 	log_wqbb_n = rte_log2_u32(qp->entries_n *
 			(priv->wqe_set_size / MLX5_SEND_WQE_BB));
@@ -374,6 +390,389 @@ mlx5_crypto_gcm_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	return -1;
 }
 
+static __rte_always_inline bool
+mlx5_crypto_is_gcm_input_continuous(struct rte_crypto_op *op)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct rte_mbuf *m_src = op->sym->m_src;
+	void *aad_addr = op->sym->aead.aad.data;
+	void *tag_addr = op->sym->aead.digest.data;
+	void *pkt_addr = rte_pktmbuf_mtod_offset(m_src, void *, op->sym->aead.data.offset);
+
+	/* Out of place mode, AAD will never satisfy the expectation. */
+	if ((op->sym->m_dst && op->sym->m_dst != m_src) ||
+	    (m_src->nb_segs > 1) ||
+	    (RTE_PTR_ADD(aad_addr, sess->aad_len) != pkt_addr) ||
+	    (RTE_PTR_ADD(pkt_addr, op->sym->aead.data.length) != tag_addr))
+		return false;
+	return true;
+}
+
+static __rte_always_inline uint32_t
+mlx5_crypto_gcm_umr_klm_set(struct mlx5_crypto_qp *qp, struct rte_mbuf *mbuf,
+		    struct mlx5_klm *klm, uint32_t offset,
+		    uint32_t *remain)
+{
+	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
+	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
+
+	if (data_len > *remain)
+		data_len = *remain;
+	*remain -= data_len;
+	klm->byte_count = rte_cpu_to_be_32(data_len);
+	klm->address = rte_cpu_to_be_64(addr);
+	klm->mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
+	return klm->mkey;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_klm(struct mlx5_crypto_qp *qp,
+		struct rte_crypto_op *op,
+		struct rte_mbuf *mbuf,
+		struct mlx5_klm *klm)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	uint32_t remain_len = op->sym->aead.data.length;
+	uint32_t nb_segs = mbuf->nb_segs;
+	uint32_t klm_n = 0;
+
+	/* Set AAD. */
+	klm->byte_count = rte_cpu_to_be_32(sess->aad_len);
+	klm->address = rte_cpu_to_be_64((uintptr_t)op->sym->aead.aad.data);
+	klm->mkey = mlx5_mr_addr2mr_bh(&qp->mr_ctrl, (uintptr_t)op->sym->aead.aad.data);
+	klm_n++;
+	/* First mbuf needs to take the data offset. */
+	if (unlikely(mlx5_crypto_gcm_umr_klm_set(qp, mbuf, ++klm,
+		     op->sym->aead.data.offset, &remain_len) == UINT32_MAX)) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		return 0;
+	}
+	klm_n++;
+	while (remain_len) {
+		nb_segs--;
+		mbuf = mbuf->next;
+		if (unlikely(mbuf == NULL || nb_segs == 0)) {
+			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
+			return 0;
+		}
+		if (unlikely(mlx5_crypto_gcm_umr_klm_set(qp, mbuf, ++klm, 0,
+						 &remain_len) == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+			return 0;
+		}
+		klm_n++;
+	}
+	/* Set TAG. */
+	klm++;
+	klm->byte_count = rte_cpu_to_be_32((uint32_t)sess->tag_len);
+	klm->address = rte_cpu_to_be_64((uintptr_t)op->sym->aead.digest.data);
+	klm->mkey = mlx5_mr_addr2mr_bh(&qp->mr_ctrl, (uintptr_t)op->sym->aead.digest.data);
+	klm_n++;
+	return klm_n;
+}
+
+static __rte_always_inline void*
+mlx5_crypto_gcm_get_umr_wqe(struct mlx5_crypto_qp *qp)
+{
+	struct mlx5_crypto_priv *priv = qp->priv;
+	uint32_t wqe_offset = qp->umr_pi & (qp->umr_wqbbs - 1);
+	uint32_t left_wqbbs = qp->umr_wqbbs - wqe_offset;
+	struct mlx5_wqe_cseg *wqe;
+
+	/* If UMR WQE is near the boundary. */
+	if (left_wqbbs < priv->umr_wqe_stride) {
+		/* Append NOP WQE as the left WQEBBS is not enough for UMR. */
+		wqe = (struct mlx5_wqe_cseg *)RTE_PTR_ADD(qp->umr_qp_obj.umem_buf,
+			wqe_offset * MLX5_SEND_WQE_BB);
+		wqe->opcode = RTE_BE32(MLX5_OPCODE_NOP | ((uint32_t)qp->umr_pi << 8));
+		wqe->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | (left_wqbbs << 2));
+		wqe->flags = RTE_BE32(0);
+		wqe->misc = RTE_BE32(0);
+		qp->umr_pi += left_wqbbs;
+		wqe_offset = qp->umr_pi & (qp->umr_wqbbs - 1);
+	}
+	wqe_offset *= MLX5_SEND_WQE_BB;
+	return RTE_PTR_ADD(qp->umr_qp_obj.umem_buf, wqe_offset);
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_umr(struct mlx5_crypto_qp *qp,
+			  struct rte_crypto_op *op,
+			  uint32_t idx,
+			  struct mlx5_crypto_gcm_data *data)
+{
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_wqe_cseg *wqe;
+	struct mlx5_wqe_umr_ctrl_seg *ucseg;
+	struct mlx5_wqe_mkey_context_seg *mkc;
+	struct mlx5_klm *iklm;
+	struct mlx5_klm *klm = &qp->klm_array[idx * priv->max_segs_num];
+	uint16_t klm_size, klm_align;
+	uint16_t klm_src = 0, klm_dst = 0;
+	uint32_t total_len = op->sym->aead.data.length + sess->aad_len + sess->tag_len;
+	uint32_t i;
+
+	/* Build the KLM list based on the op. */
+	klm_src = mlx5_crypto_gcm_build_klm(qp, op, op->sym->m_src, klm);
+	if (!klm_src)
+		return -EINVAL;
+	if (op->sym->m_dst && op->sym->m_dst != op->sym->m_src) {
+		klm_dst = mlx5_crypto_gcm_build_klm(qp, op, op->sym->m_dst, klm + klm_src);
+		if (!klm_dst)
+			return -EINVAL;
+		total_len *= 2;
+	}
+	klm_size = klm_src + klm_dst;
+	klm_align = RTE_ALIGN(klm_size, 4);
+	/* Get UMR WQE memory. */
+	wqe = (struct mlx5_wqe_cseg *)mlx5_crypto_gcm_get_umr_wqe(qp);
+	memset(wqe, 0, priv->umr_wqe_size);
+	/* Set WQE control seg. Non-inline KLM UMR WQE size must be 9 WQE_DS. */
+	wqe->opcode = RTE_BE32(MLX5_OPCODE_UMR | ((uint32_t)qp->umr_pi << 8));
+	wqe->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | 9);
+	wqe->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET);
+	wqe->misc = rte_cpu_to_be_32(qp->mkey[idx]->id);
+	/* Set UMR WQE control seg. */
+	ucseg = (struct mlx5_wqe_umr_ctrl_seg *)(wqe + 1);
+	ucseg->mkey_mask |= rte_cpu_to_be_64(MLX5_WQE_UMR_CTRL_MKEY_MASK_LEN);
+	ucseg->klm_octowords = rte_cpu_to_be_16(klm_align);
+	/* Set mkey context seg. */
+	mkc = (struct mlx5_wqe_mkey_context_seg *)(ucseg + 1);
+	mkc->len = rte_cpu_to_be_64(total_len);
+	mkc->qpn_mkey = rte_cpu_to_be_32(0xffffff00 | (qp->mkey[idx]->id & 0xff));
+	/* Set UMR pointer to data seg. */
+	iklm = (struct mlx5_klm *)(mkc + 1);
+	iklm->address = rte_cpu_to_be_64((uintptr_t)((char *)klm));
+	iklm->mkey = rte_cpu_to_be_32(qp->klm_mr.lkey);
+	iklm->byte_count = rte_cpu_to_be_32(klm_align);
+	data->mkey = rte_cpu_to_be_32(qp->mkey[idx]->id);
+	data->src_addr = 0;
+	data->src_bytes = sess->aad_len + op->sym->aead.data.length;
+	data->dst_bytes = data->src_bytes;
+	if (klm_dst)
+		data->dst_addr = (void *)(uintptr_t)(data->src_bytes + sess->tag_len);
+	else
+		data->dst_addr = 0;
+	if (sess->op_type == MLX5_CRYPTO_OP_TYPE_ENCRYPTION)
+		data->dst_bytes += sess->tag_len;
+	else
+		data->src_bytes += sess->tag_len;
+	/* Clear the padding memory. */
+	for (i = klm_size; i < klm_align; i++) {
+		klm[i].mkey = UINT32_MAX;
+		klm[i].address = 0;
+		klm[i].byte_count = 0;
+	}
+	/* Update PI and WQE */
+	qp->umr_pi += priv->umr_wqe_stride;
+	qp->umr_wqe = (uint8_t *)wqe;
+	return 0;
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_build_send_en(struct mlx5_crypto_qp *qp)
+{
+	uint32_t wqe_offset = (qp->umr_pi & (qp->umr_wqbbs - 1)) * MLX5_SEND_WQE_BB;
+	struct mlx5_wqe_cseg *cs = RTE_PTR_ADD(qp->umr_qp_obj.wqes, wqe_offset);
+	struct mlx5_wqe_qseg *qs = RTE_PTR_ADD(cs, sizeof(struct mlx5_wqe_cseg));
+
+	cs->opcode = RTE_BE32(MLX5_OPCODE_SEND_EN | ((uint32_t)qp->umr_pi << 8));
+	cs->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | 2);
+	cs->flags = RTE_BE32((MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET) |
+			MLX5_WQE_CTRL_FENCE);
+	cs->misc = RTE_BE32(0);
+	qs->max_index = rte_cpu_to_be_32(qp->pi);
+	qs->qpn_cqn = rte_cpu_to_be_32(qp->qp_obj.qp->id);
+	qp->umr_wqe = (uint8_t *)cs;
+	qp->umr_pi += 1;
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_wqe_set(struct mlx5_crypto_qp *qp,
+			struct rte_crypto_op *op,
+			uint32_t idx,
+			struct mlx5_crypto_gcm_data *data)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_gga_wqe *wqe = &((struct mlx5_gga_wqe *)qp->qp_obj.wqes)[idx];
+	union mlx5_gga_crypto_opaque *opaq = qp->opaque_mr.addr;
+
+	memcpy(opaq[idx].cp.iv,
+		rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), sess->iv_len);
+	opaq[idx].cp.tag_size = rte_cpu_to_be_32((uint32_t)sess->tag_len);
+	opaq[idx].cp.aad_size = rte_cpu_to_be_32((uint32_t)sess->aad_len);
+	/* Update control seg. */
+	wqe->opcode = rte_cpu_to_be_32(MLX5_MMO_CRYPTO_OPC + (qp->pi << 8));
+	wqe->gga_ctrl1 = sess->mmo_ctrl;
+	wqe->gga_ctrl2 = sess->dek_id;
+	/* Update input seg. */
+	wqe->gather.bcount = rte_cpu_to_be_32(data->src_bytes);
+	wqe->gather.lkey = data->mkey;
+	wqe->gather.pbuf = rte_cpu_to_be_64((uintptr_t)data->src_addr);
+	/* Update output seg. */
+	wqe->scatter.bcount = rte_cpu_to_be_32(data->dst_bytes);
+	wqe->scatter.lkey = data->mkey;
+	wqe->scatter.pbuf = rte_cpu_to_be_64((uintptr_t)data->dst_addr);
+	qp->wqe = (uint8_t *)wqe;
+}
+
+static uint16_t
+mlx5_crypto_gcm_enqueue_burst(void *queue_pair,
+			      struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	struct mlx5_crypto_session *sess;
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_crypto_gcm_data gcm_data;
+	struct rte_crypto_op *op;
+	uint16_t mask = qp->entries_n - 1;
+	uint16_t remain = qp->entries_n - (qp->pi - qp->ci);
+	uint32_t idx;
+	uint16_t umr_cnt = 0;
+
+	if (remain < nb_ops)
+		nb_ops = remain;
+	else
+		remain = nb_ops;
+	if (unlikely(remain == 0))
+		return 0;
+	do {
+		op = *ops++;
+		sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+		idx = qp->pi & mask;
+		if (mlx5_crypto_is_gcm_input_continuous(op)) {
+			gcm_data.src_addr = op->sym->aead.aad.data;
+			gcm_data.src_bytes = op->sym->aead.data.length + sess->aad_len;
+			gcm_data.dst_addr = gcm_data.src_addr;
+			gcm_data.dst_bytes = gcm_data.src_bytes;
+			if (sess->op_type == MLX5_CRYPTO_OP_TYPE_ENCRYPTION)
+				gcm_data.dst_bytes += sess->tag_len;
+			else
+				gcm_data.src_bytes += sess->tag_len;
+			gcm_data.mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, op->sym->m_src);
+		} else {
+			if (unlikely(mlx5_crypto_gcm_build_umr(qp, op, idx, &gcm_data))) {
+				qp->stats.enqueue_err_count++;
+				if (remain != nb_ops) {
+					qp->stats.enqueued_count -= remain;
+					break;
+				}
+				return 0;
+			}
+			umr_cnt++;
+		}
+		mlx5_crypto_gcm_wqe_set(qp, op, idx, &gcm_data);
+		qp->ops[idx] = op;
+		qp->pi++;
+	} while (--remain);
+	qp->stats.enqueued_count += nb_ops;
+	if (!umr_cnt) {
+		mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
+				   qp->pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+				   !priv->uar.dbnc);
+	} else {
+		mlx5_crypto_gcm_build_send_en(qp);
+		mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->umr_wqe,
+				   qp->umr_pi, &qp->umr_qp_obj.db_rec[MLX5_SND_DBR],
+				   !priv->uar.dbnc);
+	}
+	qp->has_umr = !!umr_cnt;
+	return nb_ops;
+}
+
+static __rte_noinline void
+mlx5_crypto_gcm_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
+{
+	const uint32_t idx = qp->ci & (qp->entries_n - 1);
+	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
+							&qp->cq_obj.cqes[idx];
+
+	if (op)
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+	qp->stats.dequeue_err_count++;
+	DRV_LOG(ERR, "CQE ERR:%x.\n", rte_be_to_cpu_32(cqe->syndrome));
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_umr_cq_poll(struct mlx5_crypto_qp *qp)
+{
+	union {
+		struct {
+			uint16_t wqe_counter;
+			uint8_t rsvd5;
+			uint8_t op_own;
+		};
+		uint32_t word;
+	} last_word;
+	uint16_t cur_wqe_counter;
+
+	if (!qp->has_umr)
+		return;
+	last_word.word = rte_read32(&qp->umr_cq_obj.cqes[0].wqe_counter);
+	cur_wqe_counter = rte_be_to_cpu_16(last_word.wqe_counter);
+	if (cur_wqe_counter == qp->umr_last_pi)
+		return;
+	MLX5_ASSERT(MLX5_CQE_OPCODE(last_word.op_own) !=
+			MLX5_CQE_INVALID);
+	if (unlikely((MLX5_CQE_OPCODE(last_word.op_own) ==
+			   MLX5_CQE_RESP_ERR ||
+			   MLX5_CQE_OPCODE(last_word.op_own) ==
+			   MLX5_CQE_REQ_ERR)))
+		qp->umr_errors++;
+	qp->umr_last_pi = cur_wqe_counter;
+	qp->umr_ci++;
+	rte_io_wmb();
+	/* Ring CQ doorbell record. */
+	qp->umr_cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->umr_ci);
+	qp->has_umr = false;
+}
+
+static uint16_t
+mlx5_crypto_gcm_dequeue_burst(void *queue_pair,
+			      struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	volatile struct mlx5_cqe *restrict cqe;
+	struct rte_crypto_op *restrict op;
+	const unsigned int cq_size = qp->entries_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = qp->ci & mask;
+	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - qp->ci), nb_ops);
+	uint16_t i = 0;
+	int ret;
+
+	if (unlikely(max == 0))
+		return 0;
+	/* Handle the UMR CQE first. */
+	mlx5_crypto_gcm_umr_cq_poll(qp);
+	do {
+		idx = next_idx;
+		next_idx = (qp->ci + 1) & mask;
+		op = qp->ops[idx];
+		cqe = &qp->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, qp->ci);
+		rte_io_rmb();
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
+				mlx5_crypto_gcm_cqe_err_handle(qp, op);
+			break;
+		}
+		op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		ops[i++] = op;
+		qp->ci++;
+	} while (i < max);
+	if (likely(i != 0)) {
+		rte_io_wmb();
+		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->ci);
+		qp->stats.dequeued_count += i;
+	}
+	return i;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
@@ -386,6 +785,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
 	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
 	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
+	crypto_dev->dequeue_burst = mlx5_crypto_gcm_dequeue_burst;
+	crypto_dev->enqueue_burst = mlx5_crypto_gcm_enqueue_burst;
 	/* Generate GCM capability. */
 	ret = mlx5_crypto_generate_gcm_cap(&cdev->config.hca_attr.crypto_mmo,
 					   mlx5_crypto_gcm_caps);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
  2023-04-18  9:23 ` [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability Suanming Mou
@ 2023-05-17  7:37   ` Akhil Goyal
  2023-05-17  7:42     ` Suanming Mou
  0 siblings, 1 reply; 54+ messages in thread
From: Akhil Goyal @ 2023-05-17  7:37 UTC (permalink / raw)
  To: Suanming Mou, matan; +Cc: rasland, mkashani, dev, thomas

> Subject: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> 
> AES-GCM provides both authenticated encryption and the ability to check
> the integrity and authentication of additional authenticated data (AAD)
> that is sent in the clear.
> 
> This commit adds the AES-GCM capability query and check. An new devarg
> "algo" is added to identify if the crypto PMD will be initialized as
> AES-GCM(algo=1) or AES-XTS(algo=0, default).

Why do you need a devarg for identifying the algorithm?
Is it not sufficient to use enums rte_crypto_aead_algorithm and
rte_crypto_cipher_algorithm?

Devargs are normally added for things which are specific to a particular PMD
and which are not exposed via public APIs.
For identification of the algo, devargs are not needed.

> 
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
> ---

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
  2023-05-17  7:37   ` [EXT] " Akhil Goyal
@ 2023-05-17  7:42     ` Suanming Mou
  2023-05-17  7:47       ` Akhil Goyal
  0 siblings, 1 reply; 54+ messages in thread
From: Suanming Mou @ 2023-05-17  7:42 UTC (permalink / raw)
  To: Akhil Goyal, Matan Azrad
  Cc: Raslan Darawsheh, Maayan Kashani, dev,
	NBU-Contact-Thomas Monjalon (EXTERNAL)



> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Wednesday, May 17, 2023 3:37 PM
> To: Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Maayan Kashani
> <mkashani@nvidia.com>; dev@dpdk.org; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>
> Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> 
> > Subject: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> >
> > AES-GCM provides both authenticated encryption and the ability to
> > check the integrity and authentication of additional authenticated
> > data (AAD) that is sent in the clear.
> >
> > This commit adds the AES-GCM capability query and check. An new devarg
> > "algo" is added to identify if the crypto PMD will be initialized as
> > AES-GCM(algo=1) or AES-XTS(algo=0, default).
> 
> Why do you need a devarg for identifying the algorithm?
> Is it not sufficient to use enums rte_crypto_aead_algorithm and
> rte_crypto_cipher_algorithm?
> 
> Devargs are normally added for things which are specific to a particular PMD And
> which is not exposed via public APIs.
> For identification of algo, it is not needed to use devargs.
Due to a current HW limitation, the NIC can only be initialized in either GCM or XTS working mode during probe. It is not able to provide both at runtime. That's the main reason for the devarg.
Session configuration with the algo is too late.
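For illustration only (the PCI address is hypothetical), the working mode is chosen in the probe devargs, e.g. "-a 0000:03:00.0,class=crypto,algo=1" selects AES-GCM, while omitting "algo" (or algo=0) keeps the default AES-XTS.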
> 
> >
> > Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
> > ---

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
  2023-05-17  7:42     ` Suanming Mou
@ 2023-05-17  7:47       ` Akhil Goyal
  2023-05-17  7:51         ` Suanming Mou
  0 siblings, 1 reply; 54+ messages in thread
From: Akhil Goyal @ 2023-05-17  7:47 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad
  Cc: Raslan Darawsheh, Maayan Kashani, dev,
	NBU-Contact-Thomas Monjalon (EXTERNAL)

> > Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> >
> > > Subject: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> > >
> > > AES-GCM provides both authenticated encryption and the ability to
> > > check the integrity and authentication of additional authenticated
> > > data (AAD) that is sent in the clear.
> > >
> > > This commit adds the AES-GCM capability query and check. An new devarg
> > > "algo" is added to identify if the crypto PMD will be initialized as
> > > AES-GCM(algo=1) or AES-XTS(algo=0, default).
> >
> > Why do you need a devarg for identifying the algorithm?
> > Is it not sufficient to use enums rte_crypto_aead_algorithm and
> > rte_crypto_cipher_algorithm?
> >
> > Devargs are normally added for things which are specific to a particular PMD
> And
> > which is not exposed via public APIs.
> > For identification of algo, it is not needed to use devargs.
> Due to current HW limitation, the NIC can only be initialized as GCM or XTS
> working mode during probe. It's not able to provide both in running time. That's
> the main reason for the devarg.
> Session configure with algo is too late.

Is it not possible to reconfigure the NIC when GCM is detected in session create?



^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
  2023-05-17  7:47       ` Akhil Goyal
@ 2023-05-17  7:51         ` Suanming Mou
  2023-05-17  8:02           ` Akhil Goyal
  0 siblings, 1 reply; 54+ messages in thread
From: Suanming Mou @ 2023-05-17  7:51 UTC (permalink / raw)
  To: Akhil Goyal, Matan Azrad
  Cc: Raslan Darawsheh, Maayan Kashani, dev,
	NBU-Contact-Thomas Monjalon (EXTERNAL)



> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Wednesday, May 17, 2023 3:47 PM
> To: Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Maayan Kashani
> <mkashani@nvidia.com>; dev@dpdk.org; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>
> Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> 
> > > Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM
> > > capability
> > >
> > > > Subject: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> > > >
> > > > AES-GCM provides both authenticated encryption and the ability to
> > > > check the integrity and authentication of additional authenticated
> > > > data (AAD) that is sent in the clear.
> > > >
> > > > This commit adds the AES-GCM capability query and check. An new
> > > > devarg "algo" is added to identify if the crypto PMD will be
> > > > initialized as
> > > > AES-GCM(algo=1) or AES-XTS(algo=0, default).
> > >
> > > Why do you need a devarg for identifying the algorithm?
> > > Is it not sufficient to use enums rte_crypto_aead_algorithm and
> > > rte_crypto_cipher_algorithm?
> > >
> > > Devargs are normally added for things which are specific to a
> > > particular PMD
> > And
> > > which is not exposed via public APIs.
> > > For identification of algo, it is not needed to use devargs.
> > Due to current HW limitation, the NIC can only be initialized as GCM
> > or XTS working mode during probe. It's not able to provide both in
> > running time. That's the main reason for the devarg.
> > Session configure with algo is too late.
> 
> Is it not possible to reconfigure the NIC when GCM is detected in session create?
That means that in dev info we would need to report both XTS and GCM capabilities. But the fact is, if we reconfigure the NIC to GCM, XTS will no longer be supported. If the user wants to create both XTS and GCM sessions, one of them will fail.
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
  2023-05-17  7:51         ` Suanming Mou
@ 2023-05-17  8:02           ` Akhil Goyal
  2023-05-17  8:06             ` Suanming Mou
  0 siblings, 1 reply; 54+ messages in thread
From: Akhil Goyal @ 2023-05-17  8:02 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad
  Cc: Raslan Darawsheh, Maayan Kashani, dev,
	NBU-Contact-Thomas Monjalon (EXTERNAL)

> > Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> >
> > > > Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM
> > > > capability
> > > >
> > > > > Subject: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> > > > >
> > > > > AES-GCM provides both authenticated encryption and the ability to
> > > > > check the integrity and authentication of additional authenticated
> > > > > data (AAD) that is sent in the clear.
> > > > >
> > > > > This commit adds the AES-GCM capability query and check. An new
> > > > > devarg "algo" is added to identify if the crypto PMD will be
> > > > > initialized as
> > > > > AES-GCM(algo=1) or AES-XTS(algo=0, default).
> > > >
> > > > Why do you need a devarg for identifying the algorithm?
> > > > Is it not sufficient to use enums rte_crypto_aead_algorithm and
> > > > rte_crypto_cipher_algorithm?
> > > >
> > > > Devargs are normally added for things which are specific to a
> > > > particular PMD
> > > And
> > > > which is not exposed via public APIs.
> > > > For identification of algo, it is not needed to use devargs.
> > > Due to current HW limitation, the NIC can only be initialized as GCM
> > > or XTS working mode during probe. It's not able to provide both in
> > > running time. That's the main reason for the devarg.
> > > Session configure with algo is too late.
> >
> > Is it not possible to reconfigure the NIC when GCM is detected in session
> create?
> That means in dev info, we need to put both XTS and GCM in the capability.  But
> the fact is if we reconfigure the NIC to GCM, XTS will not be supported. If user
> wants to create both XTS and GCM session, one of them will fail.

That would fail even with the current patchset.
On another thought, would it not be better to create 2 separate driver instances in the same
folder, the way the ipsec_mb and cnxk drivers are organized?
You can change the function pointers based on the driver instance (mlx5_gcm, mlx5_xts).


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
  2023-05-17  8:02           ` Akhil Goyal
@ 2023-05-17  8:06             ` Suanming Mou
  0 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-17  8:06 UTC (permalink / raw)
  To: Akhil Goyal, Matan Azrad
  Cc: Raslan Darawsheh, Maayan Kashani, dev,
	NBU-Contact-Thomas Monjalon (EXTERNAL)



> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Wednesday, May 17, 2023 4:03 PM
> To: Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Maayan Kashani
> <mkashani@nvidia.com>; dev@dpdk.org; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>
> Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability
> 
> > > Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM
> > > capability
> > >
> > > > > Subject: RE: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM
> > > > > capability
> > > > >
> > > > > > Subject: [EXT] [RFC PATCH 1/5] crypto/mlx5: add AES-GCM
> > > > > > capability
> > > > > >
> > > > > > AES-GCM provides both authenticated encryption and the ability
> > > > > > to check the integrity and authentication of additional
> > > > > > authenticated data (AAD) that is sent in the clear.
> > > > > >
> > > > > > This commit adds the AES-GCM capability query and check. An
> > > > > > new devarg "algo" is added to identify if the crypto PMD will
> > > > > > be initialized as
> > > > > > AES-GCM(algo=1) or AES-XTS(algo=0, default).
> > > > >
> > > > > Why do you need a devarg for identifying the algorithm?
> > > > > Is it not sufficient to use enums rte_crypto_aead_algorithm and
> > > > > rte_crypto_cipher_algorithm?
> > > > >
> > > > > Devargs are normally added for things which are specific to a
> > > > > particular PMD
> > > > And
> > > > > which is not exposed via public APIs.
> > > > > For identification of algo, it is not needed to use devargs.
> > > > Due to current HW limitation, the NIC can only be initialized as
> > > > GCM or XTS working mode during probe. It's not able to provide
> > > > both in running time. That's the main reason for the devarg.
> > > > Session configure with algo is too late.
> > >
> > > Is it not possible to reconfigure the NIC when GCM is detected in
> > > session
> > create?
> > That means in dev info, we need to put both XTS and GCM in the
> > capability.  But the fact is if we reconfigure the NIC to GCM, XTS
> > will not be supported. If user wants to create both XTS and GCM session, one of
> them will fail.
> 
> That would fail even in current patchset.
> On another thought, is it not good to create 2 separate instances of drivers in
> same folder, like ipsec_mb and cnxk drivers are organized.
> You can change the function pointers based on the driver instance(mlx5_gcm,
> mlx5_xts)
Currently, we initialize the capability based on the algo, so it will not fail.
Regarding separating the instances, yes, we will do that in the next version. We will keep most of the common code in mlx5_crypto.c, with mlx5_crypto_gcm.c for GCM and mlx5_crypto_xts.c for XTS.
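For illustration (not the exact v2 code), the probe path would then call the per-algorithm init which hooks its own ops and capabilities, e.g.:

    ret = gcm_mode ? mlx5_crypto_gcm_init(priv) : mlx5_crypto_xts_init(priv);

where "gcm_mode" is a placeholder for the devarg-selected working mode.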


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 0/9] crypto/mlx5: support AES-GCM
  2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
                   ` (4 preceding siblings ...)
  2023-04-18  9:23 ` [RFC PATCH 5/5] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
@ 2023-05-26  3:14 ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 1/9] common/mlx5: export memory region lookup by address Suanming Mou
                     ` (9 more replies)
  2023-06-20  1:23 ` Suanming Mou
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
  7 siblings, 10 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  Cc: dev, rasland

AES-GCM provides both authenticated encryption and the ability to check
the integrity and authentication of additional authenticated data (AAD)
that is sent in the clear.

The crypto operations are performed with crypto WQEs. If the input
buffers (AAD, mbuf, digest) are not contiguous and there is not enough
headroom or tailroom for the AAD or digest, as required by the FW, a
UMR WQE is needed to generate a contiguous address space for the crypto WQE.
The UMR WQE and crypto WQE are handled in two different QPs.

The QP for UMR operation contains two types of WQE, UMR and SEND_EN
WQE. The WQEs are built dynamically according to the crypto operation 
buffer address. Crypto operation with non-contiguous buffers will
have its own UMR WQE, while the operation with contiguous buffers   
doesn't need the UMR WQE. Once all the operations' WQEs in the
enqueue burst are built, if any UMR WQEs were built, an additional
SEND_EN WQE is added as the final WQE of the burst in the UMR QP.
The purpose of that SEND_EN WQE is to trigger the crypto QP processing
with the UMR ready input memory address space buffers.

The QP for crypto operations contains only the crypto WQE and the QP
WQEs are built as fixed in QP setup. The QP processing is triggered
by doorbell ring or the SEND_EN WQE from UMR QP.
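
For readers of this cover letter, below is a minimal sketch of the per-burst
enqueue decision described above; the helper names are illustrative
placeholders, not the actual driver symbols:

    /* Sketch only: illustrative names, not the real mlx5 crypto PMD code. */
    static uint16_t
    enqueue_burst_sketch(struct sketch_qp *qp, struct rte_crypto_op **ops,
                         uint16_t nb_ops)
    {
            uint16_t i, umr_cnt = 0;

            for (i = 0; i < nb_ops; i++) {
                    if (input_is_contiguous(ops[i])) {
                            /* Crypto WQE addresses the mbuf memory directly. */
                            build_crypto_wqe(qp, ops[i], mbuf_lkey(qp, ops[i]));
                    } else {
                            /*
                             * UMR WQE builds a contiguous view of AAD + data +
                             * digest; the crypto WQE then uses that mkey.
                             */
                            build_umr_wqe(qp, ops[i]);
                            build_crypto_wqe(qp, ops[i], umr_mkey(qp, i));
                            umr_cnt++;
                    }
            }
            if (umr_cnt == 0) {
                    /* No UMR needed: start the crypto QP directly. */
                    ring_doorbell(qp->crypto_qp);
            } else {
                    /*
                     * SEND_EN in the UMR QP triggers the crypto QP once all
                     * UMR WQEs of the burst have been processed.
                     */
                    build_send_en_wqe(qp->umr_qp, qp->crypto_qp);
                    ring_doorbell(qp->umr_qp);
            }
            return nb_ops;
    }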

Suanming Mou (9):
  common/mlx5: export memory region lookup by address
  crypto/mlx5: split AES-XTS
  crypto/mlx5: add AES-GCM query and initialization
  crypto/mlx5: add AES-GCM encryption key
  crypto/mlx5: add AES-GCM session configure
  common/mlx5: add WQE-based QP synchronous basics
  crypto/mlx5: add queue pair setup for GCM
  crypto/mlx5: add enqueue and dequeue operations
  crypto/mlx5: enable AES-GCM capability

 doc/guides/cryptodevs/mlx5.rst         |  48 +-
 doc/guides/rel_notes/release_23_07.rst |   1 +
 drivers/common/mlx5/mlx5_common_mr.c   |   2 +-
 drivers/common/mlx5/mlx5_common_mr.h   |   5 +
 drivers/common/mlx5/mlx5_devx_cmds.c   |  21 +
 drivers/common/mlx5/mlx5_devx_cmds.h   |  16 +
 drivers/common/mlx5/mlx5_prm.h         |  65 +-
 drivers/common/mlx5/version.map        |   3 +
 drivers/crypto/mlx5/meson.build        |   2 +
 drivers/crypto/mlx5/mlx5_crypto.c      | 673 ++---------------
 drivers/crypto/mlx5/mlx5_crypto.h      | 101 ++-
 drivers/crypto/mlx5/mlx5_crypto_dek.c  | 102 ++-
 drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 995 +++++++++++++++++++++++++
 drivers/crypto/mlx5/mlx5_crypto_xts.c  | 645 ++++++++++++++++
 14 files changed, 2014 insertions(+), 665 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 1/9] common/mlx5: export memory region lookup by address
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 2/9] crypto/mlx5: split AES-XTS Suanming Mou
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland

In case the user provides an address without a mempool, the function
that looks up an address without a mempool needs to be exported.
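
As an illustration (a sketch, not the exact patch code), the AES-GCM data
path can then resolve the lkey of a raw AAD or digest pointer that has no
mbuf attached:

    lkey = mlx5_mr_addr2mr_bh(&qp->mr_ctrl, (uintptr_t)aad_addr);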

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_common_mr.c | 2 +-
 drivers/common/mlx5/mlx5_common_mr.h | 4 ++++
 drivers/common/mlx5/version.map      | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 7b14b0c7bf..40ff9153bd 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -1059,7 +1059,7 @@ mr_lookup_caches(struct mlx5_mr_ctrl *mr_ctrl,
  * @return
  *   Searched LKey on success, UINT32_MAX on no match.
  */
-static uint32_t
+uint32_t
 mlx5_mr_addr2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, uintptr_t addr)
 {
 	uint32_t lkey;
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index 12def1585f..66623868a2 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -240,6 +240,10 @@ mlx5_mr_create(struct mlx5_common_device *cdev,
 	       struct mlx5_mr_share_cache *share_cache,
 	       struct mr_cache_entry *entry, uintptr_t addr);
 
+__rte_internal
+uint32_t
+mlx5_mr_addr2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, uintptr_t addr);
+
 /* mlx5_common_verbs.c */
 
 __rte_internal
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index e05e1aa8c5..f860b069de 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -122,6 +122,7 @@ INTERNAL {
 	mlx5_mr_ctrl_init;
 	mlx5_mr_flush_local_cache;
 	mlx5_mr_mb2mr_bh;
+	mlx5_mr_addr2mr_bh;
 
 	mlx5_nl_allmulti; # WINDOWS_NO_EXPORT
 	mlx5_nl_ifindex; # WINDOWS_NO_EXPORT
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 2/9] crypto/mlx5: split AES-XTS
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 1/9] common/mlx5: export memory region lookup by address Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, rasland

As other crypto algorithms will be supported, this commit splits the
AES-XTS code into a separate *_xts.c file. The mlx5_crypto.c file will
just contain the common code.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/crypto/mlx5/meson.build       |   1 +
 drivers/crypto/mlx5/mlx5_crypto.c     | 642 ++------------------------
 drivers/crypto/mlx5/mlx5_crypto.h     |  33 ++
 drivers/crypto/mlx5/mlx5_crypto_xts.c | 594 ++++++++++++++++++++++++
 4 files changed, 667 insertions(+), 603 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c

diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index a2691ec0f0..045e8ce81d 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -15,6 +15,7 @@ endif
 
 sources = files(
         'mlx5_crypto.c',
+	'mlx5_crypto_xts.c',
         'mlx5_crypto_dek.c',
 )
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 5267f48c1e..2e6bcc6ddc 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -40,33 +40,6 @@ int mlx5_crypto_logtype;
 
 uint8_t mlx5_crypto_driver_id;
 
-const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
-	{		/* AES XTS */
-		.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
-		{.sym = {
-			.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
-			{.cipher = {
-				.algo = RTE_CRYPTO_CIPHER_AES_XTS,
-				.block_size = 16,
-				.key_size = {
-					.min = 32,
-					.max = 64,
-					.increment = 32
-				},
-				.iv_size = {
-					.min = 16,
-					.max = 16,
-					.increment = 0
-				},
-				.dataunit_set =
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES |
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_4096_BYTES |
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_1_MEGABYTES,
-			}, }
-		}, }
-	},
-};
-
 static const char mlx5_crypto_drv_name[] = RTE_STR(MLX5_CRYPTO_DRIVER_NAME);
 
 static const struct rte_driver mlx5_drv = {
@@ -76,21 +49,6 @@ static const struct rte_driver mlx5_drv = {
 
 static struct cryptodev_driver mlx5_cryptodev_driver;
 
-struct mlx5_crypto_session {
-	uint32_t bs_bpt_eo_es;
-	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
-	 * saved in big endian format.
-	 */
-	uint32_t bsp_res;
-	/**< crypto_block_size_pointer and reserved 24 bits saved in big
-	 * endian format.
-	 */
-	uint32_t iv_offset:16;
-	/**< Starting point for Initialisation Vector. */
-	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
-	uint32_t dek_id; /**< DEK ID */
-} __rte_packed;
-
 static void
 mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 			  struct rte_cryptodev_info *dev_info)
@@ -102,7 +60,7 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 		dev_info->driver_id = mlx5_crypto_driver_id;
 		dev_info->feature_flags =
 			MLX5_CRYPTO_FEATURE_FLAGS(priv->is_wrapped_mode);
-		dev_info->capabilities = mlx5_crypto_caps;
+		dev_info->capabilities = priv->caps;
 		dev_info->max_nb_queue_pairs = MLX5_CRYPTO_MAX_QPS;
 		dev_info->min_mbuf_headroom_req = 0;
 		dev_info->min_mbuf_tailroom_req = 0;
@@ -114,6 +72,38 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 	}
 }
 
+void
+mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp,
+				   uint16_t n)
+{
+	uint32_t i;
+
+	for (i = 0; i < n; i++)
+		if (qp->mkey[i])
+			claim_zero(mlx5_devx_cmd_destroy(qp->mkey[i]));
+}
+
+int
+mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
+				   struct mlx5_crypto_qp *qp,
+				   struct mlx5_devx_mkey_attr *attr,
+				   mlx5_crypto_mkey_update_t update_cb)
+{
+	uint32_t i;
+
+	for (i = 0; i < qp->entries_n; i++) {
+		attr->klm_array = update_cb(priv, qp, i);
+		qp->mkey[i] = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, attr);
+		if (!qp->mkey[i])
+			goto error;
+	}
+	return 0;
+error:
+	DRV_LOG(ERR, "Failed to allocate indirect mkey.");
+	mlx5_crypto_indirect_mkeys_release(qp, i);
+	return -1;
+}
+
 static int
 mlx5_crypto_dev_configure(struct rte_cryptodev *dev,
 			  struct rte_cryptodev_config *config)
@@ -168,72 +158,6 @@ mlx5_crypto_sym_session_get_size(struct rte_cryptodev *dev __rte_unused)
 	return sizeof(struct mlx5_crypto_session);
 }
 
-static int
-mlx5_crypto_sym_session_configure(struct rte_cryptodev *dev,
-				  struct rte_crypto_sym_xform *xform,
-				  struct rte_cryptodev_sym_session *session)
-{
-	struct mlx5_crypto_priv *priv = dev->data->dev_private;
-	struct mlx5_crypto_session *sess_private_data =
-		CRYPTODEV_GET_SYM_SESS_PRIV(session);
-	struct rte_crypto_cipher_xform *cipher;
-	uint8_t encryption_order;
-
-	if (unlikely(xform->next != NULL)) {
-		DRV_LOG(ERR, "Xform next is not supported.");
-		return -ENOTSUP;
-	}
-	if (unlikely((xform->type != RTE_CRYPTO_SYM_XFORM_CIPHER) ||
-		     (xform->cipher.algo != RTE_CRYPTO_CIPHER_AES_XTS))) {
-		DRV_LOG(ERR, "Only AES-XTS algorithm is supported.");
-		return -ENOTSUP;
-	}
-	cipher = &xform->cipher;
-	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
-	if (sess_private_data->dek == NULL) {
-		DRV_LOG(ERR, "Failed to prepare dek.");
-		return -ENOMEM;
-	}
-	if (cipher->op == RTE_CRYPTO_CIPHER_OP_ENCRYPT)
-		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_MEMORY;
-	else
-		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_WIRE;
-	sess_private_data->bs_bpt_eo_es = rte_cpu_to_be_32
-			(MLX5_BSF_SIZE_64B << MLX5_BSF_SIZE_OFFSET |
-			 MLX5_BSF_P_TYPE_CRYPTO << MLX5_BSF_P_TYPE_OFFSET |
-			 encryption_order << MLX5_ENCRYPTION_ORDER_OFFSET |
-			 MLX5_ENCRYPTION_STANDARD_AES_XTS);
-	switch (xform->cipher.dataunit_len) {
-	case 0:
-		sess_private_data->bsp_res = 0;
-		break;
-	case 512:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_512B <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	case 4096:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_4096B <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	case 1048576:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_1MB <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	default:
-		DRV_LOG(ERR, "Cipher data unit length is not supported.");
-		return -ENOTSUP;
-	}
-	sess_private_data->iv_offset = cipher->iv.offset;
-	sess_private_data->dek_id =
-			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
-					 0xffffff);
-	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
-	return 0;
-}
-
 static void
 mlx5_crypto_sym_session_clear(struct rte_cryptodev *dev,
 			      struct rte_cryptodev_sym_session *sess)
@@ -249,412 +173,6 @@ mlx5_crypto_sym_session_clear(struct rte_cryptodev *dev,
 	DRV_LOG(DEBUG, "Session %p was cleared.", spriv);
 }
 
-static void
-mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp, uint16_t n)
-{
-	uint16_t i;
-
-	for (i = 0; i < n; i++)
-		if (qp->mkey[i])
-			claim_zero(mlx5_devx_cmd_destroy(qp->mkey[i]));
-}
-
-static void
-mlx5_crypto_qp_release(struct mlx5_crypto_qp *qp)
-{
-	if (qp == NULL)
-		return;
-	mlx5_devx_qp_destroy(&qp->qp_obj);
-	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
-	mlx5_devx_cq_destroy(&qp->cq_obj);
-	rte_free(qp);
-}
-
-static int
-mlx5_crypto_queue_pair_release(struct rte_cryptodev *dev, uint16_t qp_id)
-{
-	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
-
-	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
-	mlx5_crypto_qp_release(qp);
-	dev->data->queue_pairs[qp_id] = NULL;
-	return 0;
-}
-
-static __rte_noinline uint32_t
-mlx5_crypto_get_block_size(struct rte_crypto_op *op)
-{
-	uint32_t bl = op->sym->cipher.data.length;
-
-	switch (bl) {
-	case (1 << 20):
-		return RTE_BE32(MLX5_BLOCK_SIZE_1MB << MLX5_BLOCK_SIZE_OFFSET);
-	case (1 << 12):
-		return RTE_BE32(MLX5_BLOCK_SIZE_4096B <<
-				MLX5_BLOCK_SIZE_OFFSET);
-	case (1 << 9):
-		return RTE_BE32(MLX5_BLOCK_SIZE_512B << MLX5_BLOCK_SIZE_OFFSET);
-	default:
-		DRV_LOG(ERR, "Unknown block size: %u.", bl);
-		return UINT32_MAX;
-	}
-}
-
-static __rte_always_inline uint32_t
-mlx5_crypto_klm_set(struct mlx5_crypto_qp *qp, struct rte_mbuf *mbuf,
-		    struct mlx5_wqe_dseg *klm, uint32_t offset,
-		    uint32_t *remain)
-{
-	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
-	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
-
-	if (data_len > *remain)
-		data_len = *remain;
-	*remain -= data_len;
-	klm->bcount = rte_cpu_to_be_32(data_len);
-	klm->pbuf = rte_cpu_to_be_64(addr);
-	klm->lkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
-	return klm->lkey;
-
-}
-
-static __rte_always_inline uint32_t
-mlx5_crypto_klms_set(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op,
-		     struct rte_mbuf *mbuf, struct mlx5_wqe_dseg *klm)
-{
-	uint32_t remain_len = op->sym->cipher.data.length;
-	uint32_t nb_segs = mbuf->nb_segs;
-	uint32_t klm_n = 1u;
-
-	/* First mbuf needs to take the cipher offset. */
-	if (unlikely(mlx5_crypto_klm_set(qp, mbuf, klm,
-		     op->sym->cipher.data.offset, &remain_len) == UINT32_MAX)) {
-		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-		return 0;
-	}
-	while (remain_len) {
-		nb_segs--;
-		mbuf = mbuf->next;
-		if (unlikely(mbuf == NULL || nb_segs == 0)) {
-			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
-			return 0;
-		}
-		if (unlikely(mlx5_crypto_klm_set(qp, mbuf, ++klm, 0,
-						 &remain_len) == UINT32_MAX)) {
-			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-			return 0;
-		}
-		klm_n++;
-	}
-	return klm_n;
-}
-
-static __rte_always_inline int
-mlx5_crypto_wqe_set(struct mlx5_crypto_priv *priv,
-			 struct mlx5_crypto_qp *qp,
-			 struct rte_crypto_op *op,
-			 struct mlx5_umr_wqe *umr)
-{
-	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
-	struct mlx5_wqe_cseg *cseg = &umr->ctr;
-	struct mlx5_wqe_mkey_cseg *mkc = &umr->mkc;
-	struct mlx5_wqe_dseg *klms = &umr->kseg[0];
-	struct mlx5_wqe_umr_bsf_seg *bsf = ((struct mlx5_wqe_umr_bsf_seg *)
-				      RTE_PTR_ADD(umr, priv->umr_wqe_size)) - 1;
-	uint32_t ds;
-	bool ipl = op->sym->m_dst == NULL || op->sym->m_dst == op->sym->m_src;
-	/* Set UMR WQE. */
-	uint32_t klm_n = mlx5_crypto_klms_set(qp, op,
-				   ipl ? op->sym->m_src : op->sym->m_dst, klms);
-
-	if (unlikely(klm_n == 0))
-		return 0;
-	bsf->bs_bpt_eo_es = sess->bs_bpt_eo_es;
-	if (unlikely(!sess->bsp_res)) {
-		bsf->bsp_res = mlx5_crypto_get_block_size(op);
-		if (unlikely(bsf->bsp_res == UINT32_MAX)) {
-			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
-			return 0;
-		}
-	} else {
-		bsf->bsp_res = sess->bsp_res;
-	}
-	bsf->raw_data_size = rte_cpu_to_be_32(op->sym->cipher.data.length);
-	memcpy(bsf->xts_initial_tweak,
-	       rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), 16);
-	bsf->res_dp = sess->dek_id;
-	mkc->len = rte_cpu_to_be_64(op->sym->cipher.data.length);
-	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) | MLX5_OPCODE_UMR);
-	qp->db_pi += priv->umr_wqe_stride;
-	/* Set RDMA_WRITE WQE. */
-	cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
-	klms = RTE_PTR_ADD(cseg, sizeof(struct mlx5_rdma_write_wqe));
-	if (!ipl) {
-		klm_n = mlx5_crypto_klms_set(qp, op, op->sym->m_src, klms);
-		if (unlikely(klm_n == 0))
-			return 0;
-	} else {
-		memcpy(klms, &umr->kseg[0], sizeof(*klms) * klm_n);
-	}
-	ds = 2 + klm_n;
-	cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
-	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
-							MLX5_OPCODE_RDMA_WRITE);
-	ds = RTE_ALIGN(ds, 4);
-	qp->db_pi += ds >> 2;
-	/* Set NOP WQE if needed. */
-	if (priv->max_rdmar_ds > ds) {
-		cseg += ds;
-		ds = priv->max_rdmar_ds - ds;
-		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
-		cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
-							       MLX5_OPCODE_NOP);
-		qp->db_pi += ds >> 2; /* Here, DS is 4 aligned for sure. */
-	}
-	qp->wqe = (uint8_t *)cseg;
-	return 1;
-}
-
-static uint16_t
-mlx5_crypto_enqueue_burst(void *queue_pair, struct rte_crypto_op **ops,
-			  uint16_t nb_ops)
-{
-	struct mlx5_crypto_qp *qp = queue_pair;
-	struct mlx5_crypto_priv *priv = qp->priv;
-	struct mlx5_umr_wqe *umr;
-	struct rte_crypto_op *op;
-	uint16_t mask = qp->entries_n - 1;
-	uint16_t remain = qp->entries_n - (qp->pi - qp->ci);
-	uint32_t idx;
-
-	if (remain < nb_ops)
-		nb_ops = remain;
-	else
-		remain = nb_ops;
-	if (unlikely(remain == 0))
-		return 0;
-	do {
-		idx = qp->pi & mask;
-		op = *ops++;
-		umr = RTE_PTR_ADD(qp->qp_obj.umem_buf,
-			priv->wqe_set_size * idx);
-		if (unlikely(mlx5_crypto_wqe_set(priv, qp, op, umr) == 0)) {
-			qp->stats.enqueue_err_count++;
-			if (remain != nb_ops) {
-				qp->stats.enqueued_count -= remain;
-				break;
-			}
-			return 0;
-		}
-		qp->ops[idx] = op;
-		qp->pi++;
-	} while (--remain);
-	qp->stats.enqueued_count += nb_ops;
-	mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
-			   qp->db_pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
-			   !priv->uar.dbnc);
-	return nb_ops;
-}
-
-static __rte_noinline void
-mlx5_crypto_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
-{
-	const uint32_t idx = qp->ci & (qp->entries_n - 1);
-	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
-							&qp->cq_obj.cqes[idx];
-
-	op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-	qp->stats.dequeue_err_count++;
-	DRV_LOG(ERR, "CQE ERR:%x.\n", rte_be_to_cpu_32(cqe->syndrome));
-}
-
-static uint16_t
-mlx5_crypto_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
-			  uint16_t nb_ops)
-{
-	struct mlx5_crypto_qp *qp = queue_pair;
-	volatile struct mlx5_cqe *restrict cqe;
-	struct rte_crypto_op *restrict op;
-	const unsigned int cq_size = qp->entries_n;
-	const unsigned int mask = cq_size - 1;
-	uint32_t idx;
-	uint32_t next_idx = qp->ci & mask;
-	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - qp->ci), nb_ops);
-	uint16_t i = 0;
-	int ret;
-
-	if (unlikely(max == 0))
-		return 0;
-	do {
-		idx = next_idx;
-		next_idx = (qp->ci + 1) & mask;
-		op = qp->ops[idx];
-		cqe = &qp->cq_obj.cqes[idx];
-		ret = check_cqe(cqe, cq_size, qp->ci);
-		rte_io_rmb();
-		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
-			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
-				mlx5_crypto_cqe_err_handle(qp, op);
-			break;
-		}
-		op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
-		ops[i++] = op;
-		qp->ci++;
-	} while (i < max);
-	if (likely(i != 0)) {
-		rte_io_wmb();
-		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->ci);
-		qp->stats.dequeued_count += i;
-	}
-	return i;
-}
-
-static void
-mlx5_crypto_qp_init(struct mlx5_crypto_priv *priv, struct mlx5_crypto_qp *qp)
-{
-	uint32_t i;
-
-	for (i = 0 ; i < qp->entries_n; i++) {
-		struct mlx5_wqe_cseg *cseg = RTE_PTR_ADD(qp->qp_obj.umem_buf,
-			i * priv->wqe_set_size);
-		struct mlx5_wqe_umr_cseg *ucseg = (struct mlx5_wqe_umr_cseg *)
-								     (cseg + 1);
-		struct mlx5_wqe_umr_bsf_seg *bsf =
-			(struct mlx5_wqe_umr_bsf_seg *)(RTE_PTR_ADD(cseg,
-						       priv->umr_wqe_size)) - 1;
-		struct mlx5_wqe_rseg *rseg;
-
-		/* Init UMR WQE. */
-		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) |
-					 (priv->umr_wqe_size / MLX5_WSEG_SIZE));
-		cseg->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-				       MLX5_COMP_MODE_OFFSET);
-		cseg->misc = rte_cpu_to_be_32(qp->mkey[i]->id);
-		ucseg->if_cf_toe_cq_res = RTE_BE32(1u << MLX5_UMRC_IF_OFFSET);
-		ucseg->mkey_mask = RTE_BE64(1u << 0); /* Mkey length bit. */
-		ucseg->ko_to_bs = rte_cpu_to_be_32
-			((MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size) <<
-			 MLX5_UMRC_KO_OFFSET) | (4 << MLX5_UMRC_TO_BS_OFFSET));
-		bsf->keytag = priv->keytag;
-		/* Init RDMA WRITE WQE. */
-		cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
-		cseg->flags = RTE_BE32((MLX5_COMP_ALWAYS <<
-				      MLX5_COMP_MODE_OFFSET) |
-				      MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
-		rseg = (struct mlx5_wqe_rseg *)(cseg + 1);
-		rseg->rkey = rte_cpu_to_be_32(qp->mkey[i]->id);
-	}
-}
-
-static int
-mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
-				  struct mlx5_crypto_qp *qp)
-{
-	struct mlx5_umr_wqe *umr;
-	uint32_t i;
-	struct mlx5_devx_mkey_attr attr = {
-		.pd = priv->cdev->pdn,
-		.umr_en = 1,
-		.crypto_en = 1,
-		.set_remote_rw = 1,
-		.klm_num = MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size),
-	};
-
-	for (umr = (struct mlx5_umr_wqe *)qp->qp_obj.umem_buf, i = 0;
-	   i < qp->entries_n; i++, umr = RTE_PTR_ADD(umr, priv->wqe_set_size)) {
-		attr.klm_array = (struct mlx5_klm *)&umr->kseg[0];
-		qp->mkey[i] = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &attr);
-		if (!qp->mkey[i])
-			goto error;
-	}
-	return 0;
-error:
-	DRV_LOG(ERR, "Failed to allocate indirect mkey.");
-	mlx5_crypto_indirect_mkeys_release(qp, i);
-	return -1;
-}
-
-static int
-mlx5_crypto_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
-			     const struct rte_cryptodev_qp_conf *qp_conf,
-			     int socket_id)
-{
-	struct mlx5_crypto_priv *priv = dev->data->dev_private;
-	struct mlx5_devx_qp_attr attr = {0};
-	struct mlx5_crypto_qp *qp;
-	uint16_t log_nb_desc = rte_log2_u32(qp_conf->nb_descriptors);
-	uint32_t ret;
-	uint32_t alloc_size = sizeof(*qp);
-	uint32_t log_wqbb_n;
-	struct mlx5_devx_cq_attr cq_attr = {
-		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
-	};
-
-	if (dev->data->queue_pairs[qp_id] != NULL)
-		mlx5_crypto_queue_pair_release(dev, qp_id);
-	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
-	alloc_size += (sizeof(struct rte_crypto_op *) +
-		       sizeof(struct mlx5_devx_obj *)) *
-		       RTE_BIT32(log_nb_desc);
-	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
-				socket_id);
-	if (qp == NULL) {
-		DRV_LOG(ERR, "Failed to allocate QP memory.");
-		rte_errno = ENOMEM;
-		return -rte_errno;
-	}
-	if (mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj, log_nb_desc,
-				&cq_attr, socket_id) != 0) {
-		DRV_LOG(ERR, "Failed to create CQ.");
-		goto error;
-	}
-	log_wqbb_n = rte_log2_u32(RTE_BIT32(log_nb_desc) *
-				(priv->wqe_set_size / MLX5_SEND_WQE_BB));
-	attr.pd = priv->cdev->pdn;
-	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
-	attr.cqn = qp->cq_obj.cq->id;
-	attr.num_of_receive_wqes = 0;
-	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
-	attr.ts_format =
-		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
-	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
-					attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
-					&attr, socket_id);
-	if (ret) {
-		DRV_LOG(ERR, "Failed to create QP.");
-		goto error;
-	}
-	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
-			      priv->dev_config.socket_id) != 0) {
-		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
-			(uint32_t)qp_id);
-		rte_errno = ENOMEM;
-		goto error;
-	}
-	/*
-	 * In Order to configure self loopback, when calling devx qp2rts the
-	 * remote QP id that is used is the id of the same QP.
-	 */
-	if (mlx5_devx_qp2rts(&qp->qp_obj, qp->qp_obj.qp->id))
-		goto error;
-	qp->mkey = (struct mlx5_devx_obj **)RTE_ALIGN((uintptr_t)(qp + 1),
-							   RTE_CACHE_LINE_SIZE);
-	qp->ops = (struct rte_crypto_op **)(qp->mkey + RTE_BIT32(log_nb_desc));
-	qp->entries_n = 1 << log_nb_desc;
-	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp)) {
-		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
-		rte_errno = ENOMEM;
-		goto error;
-	}
-	mlx5_crypto_qp_init(priv, qp);
-	qp->priv = priv;
-	dev->data->queue_pairs[qp_id] = qp;
-	return 0;
-error:
-	mlx5_crypto_qp_release(qp);
-	return -1;
-}
-
 static void
 mlx5_crypto_stats_get(struct rte_cryptodev *dev,
 		      struct rte_cryptodev_stats *stats)
@@ -691,10 +209,7 @@ static struct rte_cryptodev_ops mlx5_crypto_ops = {
 	.dev_infos_get			= mlx5_crypto_dev_infos_get,
 	.stats_get			= mlx5_crypto_stats_get,
 	.stats_reset			= mlx5_crypto_stats_reset,
-	.queue_pair_setup		= mlx5_crypto_queue_pair_setup,
-	.queue_pair_release		= mlx5_crypto_queue_pair_release,
 	.sym_session_get_size		= mlx5_crypto_sym_session_get_size,
-	.sym_session_configure		= mlx5_crypto_sym_session_configure,
 	.sym_session_clear		= mlx5_crypto_sym_session_clear,
 	.sym_get_raw_dp_ctx_size	= NULL,
 	.sym_configure_raw_dp_ctx	= NULL,
@@ -796,81 +311,6 @@ mlx5_crypto_parse_devargs(struct mlx5_kvargs_ctrl *mkvlist,
 	return 0;
 }
 
-/*
- * Calculate UMR WQE size and RDMA Write WQE size with the
- * following limitations:
- *	- Each WQE size is multiple of 64.
- *	- The summarize of both UMR WQE and RDMA_W WQE is a power of 2.
- *	- The number of entries in the UMR WQE's KLM list is multiple of 4.
- */
-static void
-mlx5_crypto_get_wqe_sizes(uint32_t segs_num, uint32_t *umr_size,
-			uint32_t *rdmaw_size)
-{
-	uint32_t diff, wqe_set_size;
-
-	*umr_size = MLX5_CRYPTO_UMR_WQE_STATIC_SIZE +
-			RTE_ALIGN(segs_num, 4) *
-			sizeof(struct mlx5_wqe_dseg);
-	/* Make sure UMR WQE size is multiple of WQBB. */
-	*umr_size = RTE_ALIGN(*umr_size, MLX5_SEND_WQE_BB);
-	*rdmaw_size = sizeof(struct mlx5_rdma_write_wqe) +
-			sizeof(struct mlx5_wqe_dseg) *
-			(segs_num <= 2 ? 2 : 2 +
-			RTE_ALIGN(segs_num - 2, 4));
-	/* Make sure RDMA_WRITE WQE size is multiple of WQBB. */
-	*rdmaw_size = RTE_ALIGN(*rdmaw_size, MLX5_SEND_WQE_BB);
-	wqe_set_size = *rdmaw_size + *umr_size;
-	diff = rte_align32pow2(wqe_set_size) - wqe_set_size;
-	/* Make sure wqe_set size is power of 2. */
-	if (diff)
-		*umr_size += diff;
-}
-
-static uint8_t
-mlx5_crypto_max_segs_num(uint16_t max_wqe_size)
-{
-	int klms_sizes = max_wqe_size - MLX5_CRYPTO_UMR_WQE_STATIC_SIZE;
-	uint32_t max_segs_cap = RTE_ALIGN_FLOOR(klms_sizes, MLX5_SEND_WQE_BB) /
-			sizeof(struct mlx5_wqe_dseg);
-
-	MLX5_ASSERT(klms_sizes >= MLX5_SEND_WQE_BB);
-	while (max_segs_cap) {
-		uint32_t umr_wqe_size, rdmw_wqe_size;
-
-		mlx5_crypto_get_wqe_sizes(max_segs_cap, &umr_wqe_size,
-						&rdmw_wqe_size);
-		if (umr_wqe_size <= max_wqe_size &&
-				rdmw_wqe_size <= max_wqe_size)
-			break;
-		max_segs_cap -= 4;
-	}
-	return max_segs_cap;
-}
-
-static int
-mlx5_crypto_configure_wqe_size(struct mlx5_crypto_priv *priv,
-				uint16_t max_wqe_size, uint32_t max_segs_num)
-{
-	uint32_t rdmw_wqe_size, umr_wqe_size;
-
-	mlx5_crypto_get_wqe_sizes(max_segs_num, &umr_wqe_size,
-					&rdmw_wqe_size);
-	priv->wqe_set_size = rdmw_wqe_size + umr_wqe_size;
-	if (umr_wqe_size > max_wqe_size ||
-				rdmw_wqe_size > max_wqe_size) {
-		DRV_LOG(ERR, "Invalid max_segs_num: %u. should be %u or lower.",
-			max_segs_num,
-			mlx5_crypto_max_segs_num(max_wqe_size));
-		rte_errno = EINVAL;
-		return -EINVAL;
-	}
-	priv->umr_wqe_size = (uint16_t)umr_wqe_size;
-	priv->umr_wqe_stride = priv->umr_wqe_size / MLX5_SEND_WQE_BB;
-	priv->max_rdmar_ds = rdmw_wqe_size / sizeof(struct mlx5_wqe_dseg);
-	return 0;
-}
-
 static int
 mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		      struct mlx5_kvargs_ctrl *mkvlist)
@@ -916,14 +356,18 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 	DRV_LOG(INFO,
 		"Crypto device %s was created successfully.", ibdev_name);
 	crypto_dev->dev_ops = &mlx5_crypto_ops;
-	crypto_dev->dequeue_burst = mlx5_crypto_dequeue_burst;
-	crypto_dev->enqueue_burst = mlx5_crypto_enqueue_burst;
 	crypto_dev->feature_flags = MLX5_CRYPTO_FEATURE_FLAGS(wrapped_mode);
 	crypto_dev->driver_id = mlx5_crypto_driver_id;
 	priv = crypto_dev->data->dev_private;
 	priv->cdev = cdev;
 	priv->crypto_dev = crypto_dev;
 	priv->is_wrapped_mode = wrapped_mode;
+	priv->max_segs_num = devarg_prms.max_segs_num;
+	ret = mlx5_crypto_xts_init(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
+		return -ENOTSUP;
+	}
 	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
 		return -1;
@@ -939,14 +383,6 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		}
 		priv->login_obj = login;
 	}
-	ret = mlx5_crypto_configure_wqe_size(priv,
-		cdev->config.hca_attr.max_wqe_sz_sq, devarg_prms.max_segs_num);
-	if (ret) {
-		claim_zero(mlx5_devx_cmd_destroy(priv->login_obj));
-		mlx5_devx_uar_release(&priv->uar);
-		rte_cryptodev_pmd_destroy(priv->crypto_dev);
-		return -1;
-	}
 	priv->keytag = rte_cpu_to_be_64(devarg_prms.keytag);
 	DRV_LOG(INFO, "Max number of segments: %u.",
 		(unsigned int)RTE_MIN(
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index a2771b3dab..05d8fe97fe 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -31,6 +31,7 @@ struct mlx5_crypto_priv {
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
 	struct mlx5_hlist *dek_hlist; /* Dek hash list. */
+	const struct rte_cryptodev_capabilities *caps;
 	struct rte_cryptodev_config dev_config;
 	struct mlx5_devx_obj *login_obj;
 	uint64_t keytag;
@@ -70,6 +71,35 @@ struct mlx5_crypto_devarg_params {
 	uint32_t max_segs_num;
 };
 
+struct mlx5_crypto_session {
+	uint32_t bs_bpt_eo_es;
+	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
+	 * saved in big endian format.
+	 */
+	uint32_t bsp_res;
+	/**< crypto_block_size_pointer and reserved 24 bits saved in big
+	 * endian format.
+	 */
+	uint32_t iv_offset:16;
+	/**< Starting point for Initialisation Vector. */
+	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
+	uint32_t dek_id; /**< DEK ID */
+} __rte_packed;
+
+typedef void *(*mlx5_crypto_mkey_update_t)(struct mlx5_crypto_priv *priv,
+					   struct mlx5_crypto_qp *qp,
+					   uint32_t idx);
+
+void
+mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp,
+				   uint16_t n);
+
+int
+mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
+				   struct mlx5_crypto_qp *qp,
+				   struct mlx5_devx_mkey_attr *attr,
+				   mlx5_crypto_mkey_update_t update_cb);
+
 int
 mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 			struct mlx5_crypto_dek *dek);
@@ -84,4 +114,7 @@ mlx5_crypto_dek_setup(struct mlx5_crypto_priv *priv);
 void
 mlx5_crypto_dek_unset(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_xts.c b/drivers/crypto/mlx5/mlx5_crypto_xts.c
new file mode 100644
index 0000000000..964d02e6ed
--- /dev/null
+++ b/drivers/crypto/mlx5/mlx5_crypto_xts.c
@@ -0,0 +1,594 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_malloc.h>
+#include <rte_mempool.h>
+#include <rte_eal_paging.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+#include <bus_pci_driver.h>
+#include <rte_memory.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_common_os.h>
+
+#include "mlx5_crypto_utils.h"
+#include "mlx5_crypto.h"
+
+const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
+	{		/* AES XTS */
+		.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+		{.sym = {
+			.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+			{.cipher = {
+				.algo = RTE_CRYPTO_CIPHER_AES_XTS,
+				.block_size = 16,
+				.key_size = {
+					.min = 32,
+					.max = 64,
+					.increment = 32
+				},
+				.iv_size = {
+					.min = 16,
+					.max = 16,
+					.increment = 0
+				},
+				.dataunit_set =
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES |
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_4096_BYTES |
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_1_MEGABYTES,
+			}, }
+		}, }
+	},
+};
+
+static int
+mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
+				      struct rte_crypto_sym_xform *xform,
+				      struct rte_cryptodev_sym_session *session)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_session *sess_private_data =
+		CRYPTODEV_GET_SYM_SESS_PRIV(session);
+	struct rte_crypto_cipher_xform *cipher;
+	uint8_t encryption_order;
+
+	if (unlikely(xform->next != NULL)) {
+		DRV_LOG(ERR, "Xform next is not supported.");
+		return -ENOTSUP;
+	}
+	if (unlikely((xform->type != RTE_CRYPTO_SYM_XFORM_CIPHER) ||
+		     (xform->cipher.algo != RTE_CRYPTO_CIPHER_AES_XTS))) {
+		DRV_LOG(ERR, "Only AES-XTS algorithm is supported.");
+		return -ENOTSUP;
+	}
+	cipher = &xform->cipher;
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
+	if (sess_private_data->dek == NULL) {
+		DRV_LOG(ERR, "Failed to prepare dek.");
+		return -ENOMEM;
+	}
+	if (cipher->op == RTE_CRYPTO_CIPHER_OP_ENCRYPT)
+		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_MEMORY;
+	else
+		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_WIRE;
+	sess_private_data->bs_bpt_eo_es = rte_cpu_to_be_32
+			(MLX5_BSF_SIZE_64B << MLX5_BSF_SIZE_OFFSET |
+			 MLX5_BSF_P_TYPE_CRYPTO << MLX5_BSF_P_TYPE_OFFSET |
+			 encryption_order << MLX5_ENCRYPTION_ORDER_OFFSET |
+			 MLX5_ENCRYPTION_STANDARD_AES_XTS);
+	switch (xform->cipher.dataunit_len) {
+	case 0:
+		sess_private_data->bsp_res = 0;
+		break;
+	case 512:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_512B <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	case 4096:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_4096B <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	case 1048576:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_1MB <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	default:
+		DRV_LOG(ERR, "Cipher data unit length is not supported.");
+		return -ENOTSUP;
+	}
+	sess_private_data->iv_offset = cipher->iv.offset;
+	sess_private_data->dek_id =
+			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
+					 0xffffff);
+	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
+	return 0;
+}
+
+static void
+mlx5_crypto_xts_qp_release(struct mlx5_crypto_qp *qp)
+{
+	if (qp == NULL)
+		return;
+	mlx5_devx_qp_destroy(&qp->qp_obj);
+	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
+	mlx5_devx_cq_destroy(&qp->cq_obj);
+	rte_free(qp);
+}
+
+static int
+mlx5_crypto_xts_queue_pair_release(struct rte_cryptodev *dev, uint16_t qp_id)
+{
+	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
+
+	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
+	mlx5_crypto_xts_qp_release(qp);
+	dev->data->queue_pairs[qp_id] = NULL;
+	return 0;
+}
+
+static __rte_noinline uint32_t
+mlx5_crypto_xts_get_block_size(struct rte_crypto_op *op)
+{
+	uint32_t bl = op->sym->cipher.data.length;
+
+	switch (bl) {
+	case (1 << 20):
+		return RTE_BE32(MLX5_BLOCK_SIZE_1MB << MLX5_BLOCK_SIZE_OFFSET);
+	case (1 << 12):
+		return RTE_BE32(MLX5_BLOCK_SIZE_4096B <<
+				MLX5_BLOCK_SIZE_OFFSET);
+	case (1 << 9):
+		return RTE_BE32(MLX5_BLOCK_SIZE_512B << MLX5_BLOCK_SIZE_OFFSET);
+	default:
+		DRV_LOG(ERR, "Unknown block size: %u.", bl);
+		return UINT32_MAX;
+	}
+}
+
+static __rte_always_inline uint32_t
+mlx5_crypto_xts_klm_set(struct mlx5_crypto_qp *qp, struct rte_mbuf *mbuf,
+			struct mlx5_wqe_dseg *klm, uint32_t offset,
+			uint32_t *remain)
+{
+	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
+	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
+
+	if (data_len > *remain)
+		data_len = *remain;
+	*remain -= data_len;
+	klm->bcount = rte_cpu_to_be_32(data_len);
+	klm->pbuf = rte_cpu_to_be_64(addr);
+	klm->lkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
+	return klm->lkey;
+
+}
+
+static __rte_always_inline uint32_t
+mlx5_crypto_xts_klms_set(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op,
+			 struct rte_mbuf *mbuf, struct mlx5_wqe_dseg *klm)
+{
+	uint32_t remain_len = op->sym->cipher.data.length;
+	uint32_t nb_segs = mbuf->nb_segs;
+	uint32_t klm_n = 1u;
+
+	/* First mbuf needs to take the cipher offset. */
+	if (unlikely(mlx5_crypto_xts_klm_set(qp, mbuf, klm,
+		     op->sym->cipher.data.offset, &remain_len) == UINT32_MAX)) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		return 0;
+	}
+	while (remain_len) {
+		nb_segs--;
+		mbuf = mbuf->next;
+		if (unlikely(mbuf == NULL || nb_segs == 0)) {
+			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
+			return 0;
+		}
+		if (unlikely(mlx5_crypto_xts_klm_set(qp, mbuf, ++klm, 0,
+						&remain_len) == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+			return 0;
+		}
+		klm_n++;
+	}
+	return klm_n;
+}
+
+static __rte_always_inline int
+mlx5_crypto_xts_wqe_set(struct mlx5_crypto_priv *priv,
+			 struct mlx5_crypto_qp *qp,
+			 struct rte_crypto_op *op,
+			 struct mlx5_umr_wqe *umr)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_wqe_cseg *cseg = &umr->ctr;
+	struct mlx5_wqe_mkey_cseg *mkc = &umr->mkc;
+	struct mlx5_wqe_dseg *klms = &umr->kseg[0];
+	struct mlx5_wqe_umr_bsf_seg *bsf = ((struct mlx5_wqe_umr_bsf_seg *)
+				      RTE_PTR_ADD(umr, priv->umr_wqe_size)) - 1;
+	uint32_t ds;
+	bool ipl = op->sym->m_dst == NULL || op->sym->m_dst == op->sym->m_src;
+	/* Set UMR WQE. */
+	uint32_t klm_n = mlx5_crypto_xts_klms_set(qp, op,
+				   ipl ? op->sym->m_src : op->sym->m_dst, klms);
+
+	if (unlikely(klm_n == 0))
+		return 0;
+	bsf->bs_bpt_eo_es = sess->bs_bpt_eo_es;
+	if (unlikely(!sess->bsp_res)) {
+		bsf->bsp_res = mlx5_crypto_xts_get_block_size(op);
+		if (unlikely(bsf->bsp_res == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
+			return 0;
+		}
+	} else {
+		bsf->bsp_res = sess->bsp_res;
+	}
+	bsf->raw_data_size = rte_cpu_to_be_32(op->sym->cipher.data.length);
+	memcpy(bsf->xts_initial_tweak,
+	       rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), 16);
+	bsf->res_dp = sess->dek_id;
+	mkc->len = rte_cpu_to_be_64(op->sym->cipher.data.length);
+	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) | MLX5_OPCODE_UMR);
+	qp->db_pi += priv->umr_wqe_stride;
+	/* Set RDMA_WRITE WQE. */
+	cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
+	klms = RTE_PTR_ADD(cseg, sizeof(struct mlx5_rdma_write_wqe));
+	if (!ipl) {
+		klm_n = mlx5_crypto_xts_klms_set(qp, op, op->sym->m_src, klms);
+		if (unlikely(klm_n == 0))
+			return 0;
+	} else {
+		memcpy(klms, &umr->kseg[0], sizeof(*klms) * klm_n);
+	}
+	ds = 2 + klm_n;
+	cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
+	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
+							MLX5_OPCODE_RDMA_WRITE);
+	ds = RTE_ALIGN(ds, 4);
+	qp->db_pi += ds >> 2;
+	/* Set NOP WQE if needed. */
+	if (priv->max_rdmar_ds > ds) {
+		cseg += ds;
+		ds = priv->max_rdmar_ds - ds;
+		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
+		cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
+							       MLX5_OPCODE_NOP);
+		qp->db_pi += ds >> 2; /* Here, DS is 4 aligned for sure. */
+	}
+	qp->wqe = (uint8_t *)cseg;
+	return 1;
+}
+
+static uint16_t
+mlx5_crypto_xts_enqueue_burst(void *queue_pair, struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_umr_wqe *umr;
+	struct rte_crypto_op *op;
+	uint16_t mask = qp->entries_n - 1;
+	uint16_t remain = qp->entries_n - (qp->pi - qp->ci);
+	uint32_t idx;
+
+	if (remain < nb_ops)
+		nb_ops = remain;
+	else
+		remain = nb_ops;
+	if (unlikely(remain == 0))
+		return 0;
+	do {
+		idx = qp->pi & mask;
+		op = *ops++;
+		umr = RTE_PTR_ADD(qp->qp_obj.umem_buf,
+			priv->wqe_set_size * idx);
+		if (unlikely(mlx5_crypto_xts_wqe_set(priv, qp, op, umr) == 0)) {
+			qp->stats.enqueue_err_count++;
+			if (remain != nb_ops) {
+				qp->stats.enqueued_count -= remain;
+				break;
+			}
+			return 0;
+		}
+		qp->ops[idx] = op;
+		qp->pi++;
+	} while (--remain);
+	qp->stats.enqueued_count += nb_ops;
+	mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
+			   qp->db_pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+			   !priv->uar.dbnc);
+	return nb_ops;
+}
+
+static __rte_noinline void
+mlx5_crypto_xts_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
+{
+	const uint32_t idx = qp->ci & (qp->entries_n - 1);
+	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
+							&qp->cq_obj.cqes[idx];
+
+	op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+	qp->stats.dequeue_err_count++;
+	DRV_LOG(ERR, "CQE ERR:%x.", rte_be_to_cpu_32(cqe->syndrome));
+}
+
+static uint16_t
+mlx5_crypto_xts_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
+			  uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	volatile struct mlx5_cqe *restrict cqe;
+	struct rte_crypto_op *restrict op;
+	const unsigned int cq_size = qp->entries_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = qp->ci & mask;
+	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - qp->ci), nb_ops);
+	uint16_t i = 0;
+	int ret;
+
+	if (unlikely(max == 0))
+		return 0;
+	do {
+		idx = next_idx;
+		next_idx = (qp->ci + 1) & mask;
+		op = qp->ops[idx];
+		cqe = &qp->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, qp->ci);
+		rte_io_rmb();
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
+				mlx5_crypto_xts_cqe_err_handle(qp, op);
+			break;
+		}
+		op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		ops[i++] = op;
+		qp->ci++;
+	} while (i < max);
+	if (likely(i != 0)) {
+		rte_io_wmb();
+		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->ci);
+		qp->stats.dequeued_count += i;
+	}
+	return i;
+}
+
+static void
+mlx5_crypto_xts_qp_init(struct mlx5_crypto_priv *priv, struct mlx5_crypto_qp *qp)
+{
+	uint32_t i;
+
+	for (i = 0 ; i < qp->entries_n; i++) {
+		struct mlx5_wqe_cseg *cseg = RTE_PTR_ADD(qp->qp_obj.umem_buf,
+			i * priv->wqe_set_size);
+		struct mlx5_wqe_umr_cseg *ucseg = (struct mlx5_wqe_umr_cseg *)
+								     (cseg + 1);
+		struct mlx5_wqe_umr_bsf_seg *bsf =
+			(struct mlx5_wqe_umr_bsf_seg *)(RTE_PTR_ADD(cseg,
+						       priv->umr_wqe_size)) - 1;
+		struct mlx5_wqe_rseg *rseg;
+
+		/* Init UMR WQE. */
+		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) |
+					 (priv->umr_wqe_size / MLX5_WSEG_SIZE));
+		cseg->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				       MLX5_COMP_MODE_OFFSET);
+		cseg->misc = rte_cpu_to_be_32(qp->mkey[i]->id);
+		ucseg->if_cf_toe_cq_res = RTE_BE32(1u << MLX5_UMRC_IF_OFFSET);
+		ucseg->mkey_mask = RTE_BE64(1u << 0); /* Mkey length bit. */
+		ucseg->ko_to_bs = rte_cpu_to_be_32
+			((MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size) <<
+			 MLX5_UMRC_KO_OFFSET) | (4 << MLX5_UMRC_TO_BS_OFFSET));
+		bsf->keytag = priv->keytag;
+		/* Init RDMA WRITE WQE. */
+		cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
+		cseg->flags = RTE_BE32((MLX5_COMP_ALWAYS <<
+				      MLX5_COMP_MODE_OFFSET) |
+				      MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
+		rseg = (struct mlx5_wqe_rseg *)(cseg + 1);
+		rseg->rkey = rte_cpu_to_be_32(qp->mkey[i]->id);
+	}
+}
+
+static void *
+mlx5_crypto_gcm_mkey_klm_update(struct mlx5_crypto_priv *priv,
+				struct mlx5_crypto_qp *qp,
+				uint32_t idx)
+{
+	return RTE_PTR_ADD(qp->qp_obj.umem_buf, priv->wqe_set_size * idx);
+}
+
+static int
+mlx5_crypto_xts_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
+				 const struct rte_cryptodev_qp_conf *qp_conf,
+				 int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_qp_attr attr = {0};
+	struct mlx5_crypto_qp *qp;
+	uint16_t log_nb_desc = rte_log2_u32(qp_conf->nb_descriptors);
+	uint32_t ret;
+	uint32_t alloc_size = sizeof(*qp);
+	uint32_t log_wqbb_n;
+	struct mlx5_devx_cq_attr cq_attr = {
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+	};
+	struct mlx5_devx_mkey_attr mkey_attr = {
+		.pd = priv->cdev->pdn,
+		.umr_en = 1,
+		.crypto_en = 1,
+		.set_remote_rw = 1,
+		.klm_num = MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size),
+	};
+
+	if (dev->data->queue_pairs[qp_id] != NULL)
+		mlx5_crypto_xts_queue_pair_release(dev, qp_id);
+	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
+	alloc_size += (sizeof(struct rte_crypto_op *) +
+		       sizeof(struct mlx5_devx_obj *)) *
+		       RTE_BIT32(log_nb_desc);
+	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
+				socket_id);
+	if (qp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate QP memory.");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	if (mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj, log_nb_desc,
+				&cq_attr, socket_id) != 0) {
+		DRV_LOG(ERR, "Failed to create CQ.");
+		goto error;
+	}
+	log_wqbb_n = rte_log2_u32(RTE_BIT32(log_nb_desc) *
+				(priv->wqe_set_size / MLX5_SEND_WQE_BB));
+	attr.pd = priv->cdev->pdn;
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
+	attr.cqn = qp->cq_obj.cq->id;
+	attr.num_of_receive_wqes = 0;
+	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
+	attr.ts_format =
+		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
+					attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
+					&attr, socket_id);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create QP.");
+		goto error;
+	}
+	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
+			      priv->dev_config.socket_id) != 0) {
+		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
+			(uint32_t)qp_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/*
+	 * In order to configure self-loopback, the remote QP ID used when
+	 * calling devx qp2rts is the ID of the same QP.
+	 */
+	if (mlx5_devx_qp2rts(&qp->qp_obj, qp->qp_obj.qp->id))
+		goto error;
+	qp->mkey = (struct mlx5_devx_obj **)RTE_ALIGN((uintptr_t)(qp + 1),
+							   RTE_CACHE_LINE_SIZE);
+	qp->ops = (struct rte_crypto_op **)(qp->mkey + RTE_BIT32(log_nb_desc));
+	qp->entries_n = 1 << log_nb_desc;
+	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp, &mkey_attr,
+					       mlx5_crypto_gcm_mkey_klm_update)) {
+		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	mlx5_crypto_xts_qp_init(priv, qp);
+	qp->priv = priv;
+	dev->data->queue_pairs[qp_id] = qp;
+	return 0;
+error:
+	mlx5_crypto_xts_qp_release(qp);
+	return -1;
+}
+
+/*
+ * Calculate UMR WQE size and RDMA Write WQE size with the
+ * following limitations:
+ *	- Each WQE size is a multiple of 64 bytes.
+ *	- The sum of the UMR WQE and RDMA_W WQE sizes is a power of 2.
+ *	- The number of entries in the UMR WQE's KLM list is a multiple of 4.
+ */
+static void
+mlx5_crypto_xts_get_wqe_sizes(uint32_t segs_num, uint32_t *umr_size,
+			      uint32_t *rdmaw_size)
+{
+	uint32_t diff, wqe_set_size;
+
+	*umr_size = MLX5_CRYPTO_UMR_WQE_STATIC_SIZE +
+			RTE_ALIGN(segs_num, 4) *
+			sizeof(struct mlx5_wqe_dseg);
+	/* Make sure the UMR WQE size is a multiple of the WQBB size. */
+	*umr_size = RTE_ALIGN(*umr_size, MLX5_SEND_WQE_BB);
+	*rdmaw_size = sizeof(struct mlx5_rdma_write_wqe) +
+			sizeof(struct mlx5_wqe_dseg) *
+			(segs_num <= 2 ? 2 : 2 +
+			RTE_ALIGN(segs_num - 2, 4));
+	/* Make sure the RDMA_WRITE WQE size is a multiple of the WQBB size. */
+	*rdmaw_size = RTE_ALIGN(*rdmaw_size, MLX5_SEND_WQE_BB);
+	wqe_set_size = *rdmaw_size + *umr_size;
+	diff = rte_align32pow2(wqe_set_size) - wqe_set_size;
+	/* Make sure the wqe_set size is a power of 2. */
+	if (diff)
+		*umr_size += diff;
+}
+
+static uint8_t
+mlx5_crypto_xts_max_segs_num(uint16_t max_wqe_size)
+{
+	int klms_sizes = max_wqe_size - MLX5_CRYPTO_UMR_WQE_STATIC_SIZE;
+	uint32_t max_segs_cap = RTE_ALIGN_FLOOR(klms_sizes, MLX5_SEND_WQE_BB) /
+			sizeof(struct mlx5_wqe_dseg);
+
+	MLX5_ASSERT(klms_sizes >= MLX5_SEND_WQE_BB);
+	while (max_segs_cap) {
+		uint32_t umr_wqe_size, rdmw_wqe_size;
+
+		mlx5_crypto_xts_get_wqe_sizes(max_segs_cap, &umr_wqe_size,
+						&rdmw_wqe_size);
+		if (umr_wqe_size <= max_wqe_size &&
+				rdmw_wqe_size <= max_wqe_size)
+			break;
+		max_segs_cap -= 4;
+	}
+	return max_segs_cap;
+}
+
+static int
+mlx5_crypto_xts_configure_wqe_size(struct mlx5_crypto_priv *priv,
+				   uint16_t max_wqe_size, uint32_t max_segs_num)
+{
+	uint32_t rdmw_wqe_size, umr_wqe_size;
+
+	mlx5_crypto_xts_get_wqe_sizes(max_segs_num, &umr_wqe_size,
+			&rdmw_wqe_size);
+	priv->wqe_set_size = rdmw_wqe_size + umr_wqe_size;
+	if (umr_wqe_size > max_wqe_size ||
+				rdmw_wqe_size > max_wqe_size) {
+		DRV_LOG(ERR, "Invalid max_segs_num: %u, should be %u or lower.",
+			max_segs_num,
+			mlx5_crypto_xts_max_segs_num(max_wqe_size));
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+	priv->umr_wqe_size = (uint16_t)umr_wqe_size;
+	priv->umr_wqe_stride = priv->umr_wqe_size / MLX5_SEND_WQE_BB;
+	priv->max_rdmar_ds = rdmw_wqe_size / sizeof(struct mlx5_wqe_dseg);
+	return 0;
+}
+
+int
+mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv)
+{
+	struct mlx5_common_device *cdev = priv->cdev;
+	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
+	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+	int ret;
+
+	ret = mlx5_crypto_xts_configure_wqe_size(priv,
+		cdev->config.hca_attr.max_wqe_sz_sq, priv->max_segs_num);
+	if (ret)
+		return -EINVAL;
+	/* Override AES-XTS specified ops. */
+	dev_ops->sym_session_configure = mlx5_crypto_xts_sym_session_configure;
+	dev_ops->queue_pair_setup = mlx5_crypto_xts_queue_pair_setup;
+	dev_ops->queue_pair_release = mlx5_crypto_xts_queue_pair_release;
+	crypto_dev->dequeue_burst = mlx5_crypto_xts_dequeue_burst;
+	crypto_dev->enqueue_burst = mlx5_crypto_xts_enqueue_burst;
+	priv->caps = mlx5_crypto_caps;
+	return 0;
+}
+
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 3/9] crypto/mlx5: add AES-GCM query and initialization
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 1/9] common/mlx5: export memory region lookup by address Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 2/9] crypto/mlx5: split AES-XTS Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland

AES-GCM provides both authenticated encryption and the ability to check
the integrity and authentication of additional authenticated data (AAD)
that is sent in the clear.

This commit adds the AES-GCM attributes query and initialization function.
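
For illustration only (not part of the patch), a minimal sketch of how a
probe path could use the queried attributes to pick an engine. The helper
name is hypothetical, the attribute fields are the ones added here, and
the real driver wiring may differ:

    /* Hypothetical helper: choose the crypto engine from HCA capabilities. */
    static int
    mlx5_crypto_engine_select(struct mlx5_crypto_priv *priv)
    {
        struct mlx5_hca_attr *attr = &priv->cdev->config.hca_attr;

        if (attr->crypto_mmo.crypto_mmo_qp &&
            (attr->crypto_mmo.gcm_128_encrypt || attr->crypto_mmo.gcm_256_encrypt))
            return mlx5_crypto_gcm_init(priv); /* AES-GCM over the MMO QP */
        if (attr->aes_xts)
            return mlx5_crypto_xts_init(priv); /* AES-XTS path */
        return -ENOTSUP;
    }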

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c  | 15 +++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h  | 13 ++++++++++
 drivers/common/mlx5/mlx5_prm.h        | 19 +++++++++++---
 drivers/crypto/mlx5/meson.build       |  1 +
 drivers/crypto/mlx5/mlx5_crypto.c     |  4 ++-
 drivers/crypto/mlx5/mlx5_crypto.h     |  3 +++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 36 +++++++++++++++++++++++++++
 7 files changed, 87 insertions(+), 4 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1e418a0353..4332081165 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1117,6 +1117,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		attr->crypto_wrapped_import_method = !!(MLX5_GET(crypto_caps,
 						hcattr, wrapped_import_method)
 						& 1 << 2);
+		attr->crypto_mmo.crypto_mmo_qp = MLX5_GET(crypto_caps, hcattr, crypto_mmo_qp);
+		attr->crypto_mmo.gcm_256_encrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_256_encrypt);
+		attr->crypto_mmo.gcm_128_encrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_128_encrypt);
+		attr->crypto_mmo.gcm_256_decrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_256_decrypt);
+		attr->crypto_mmo.gcm_128_decrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_128_decrypt);
+		attr->crypto_mmo.gcm_auth_tag_128 =
+			MLX5_GET(crypto_caps, hcattr, gcm_auth_tag_128);
+		attr->crypto_mmo.gcm_auth_tag_96 =
+			MLX5_GET(crypto_caps, hcattr, gcm_auth_tag_96);
+		attr->crypto_mmo.log_crypto_mmo_max_size =
+			MLX5_GET(crypto_caps, hcattr, log_crypto_mmo_max_size);
 	}
 	if (hca_cap_2_sup) {
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index dc3359268d..cb3f3a211b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -125,6 +125,18 @@ struct mlx5_hca_flex_attr {
 	uint8_t  header_length_mask_width;
 };
 
+__extension__
+struct mlx5_hca_crypto_mmo_attr {
+	uint32_t crypto_mmo_qp:1;
+	uint32_t gcm_256_encrypt:1;
+	uint32_t gcm_128_encrypt:1;
+	uint32_t gcm_256_decrypt:1;
+	uint32_t gcm_128_decrypt:1;
+	uint32_t gcm_auth_tag_128:1;
+	uint32_t gcm_auth_tag_96:1;
+	uint32_t log_crypto_mmo_max_size:6;
+};
+
 /* ISO C restricts enumerator values to range of 'int' */
 __extension__
 enum {
@@ -250,6 +262,7 @@ struct mlx5_hca_attr {
 	struct mlx5_hca_vdpa_attr vdpa;
 	struct mlx5_hca_flow_attr flow;
 	struct mlx5_hca_flex_attr flex;
+	struct mlx5_hca_crypto_mmo_attr crypto_mmo;
 	int log_max_qp_sz;
 	int log_max_cq_sz;
 	int log_max_qp;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 9f749a2dcc..b4446f56b9 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -4577,7 +4577,9 @@ struct mlx5_ifc_crypto_caps_bits {
 	u8 synchronize_dek[0x1];
 	u8 int_kek_manual[0x1];
 	u8 int_kek_auto[0x1];
-	u8 reserved_at_6[0x12];
+	u8 reserved_at_6[0xd];
+	u8 sw_wrapped_dek_key_purpose[0x1];
+	u8 reserved_at_14[0x4];
 	u8 wrapped_import_method[0x8];
 	u8 reserved_at_20[0x3];
 	u8 log_dek_max_alloc[0x5];
@@ -4594,8 +4596,19 @@ struct mlx5_ifc_crypto_caps_bits {
 	u8 log_dek_granularity[0x5];
 	u8 reserved_at_68[0x3];
 	u8 log_max_num_int_kek[0x5];
-	u8 reserved_at_70[0x10];
-	u8 reserved_at_80[0x780];
+	u8 sw_wrapped_dek_new[0x10];
+	u8 reserved_at_80[0x80];
+	u8 crypto_mmo_qp[0x1];
+	u8 crypto_aes_gcm_256_encrypt[0x1];
+	u8 crypto_aes_gcm_128_encrypt[0x1];
+	u8 crypto_aes_gcm_256_decrypt[0x1];
+	u8 crypto_aes_gcm_128_decrypt[0x1];
+	u8 gcm_auth_tag_128[0x1];
+	u8 gcm_auth_tag_96[0x1];
+	u8 reserved_at_107[0x3];
+	u8 log_crypto_mmo_max_size[0x6];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x6e0];
 };
 
 struct mlx5_ifc_crypto_commissioning_register_bits {
diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index 045e8ce81d..17ffce89f0 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -16,6 +16,7 @@ endif
 sources = files(
         'mlx5_crypto.c',
 	'mlx5_crypto_xts.c',
+	'mlx5_crypto_gcm.c',
         'mlx5_crypto_dek.c',
 )
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 2e6bcc6ddc..ff632cd69a 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -335,7 +335,9 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	if (!cdev->config.hca_attr.crypto || !cdev->config.hca_attr.aes_xts) {
+	if (!cdev->config.hca_attr.crypto ||
+	   (!cdev->config.hca_attr.aes_xts &&
+	    !cdev->config.hca_attr.crypto_mmo.crypto_mmo_qp)) {
 		DRV_LOG(ERR, "Not enough capabilities to support crypto "
 			"operations, maybe old FW/OFED version?");
 		rte_errno = ENOTSUP;
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 05d8fe97fe..76f368ee91 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -117,4 +117,7 @@ mlx5_crypto_dek_unset(struct mlx5_crypto_priv *priv);
 int
 mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
new file mode 100644
index 0000000000..bd78c6d66b
--- /dev/null
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_malloc.h>
+#include <rte_mempool.h>
+#include <rte_eal_paging.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+#include <bus_pci_driver.h>
+#include <rte_memory.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_common_os.h>
+
+#include "mlx5_crypto_utils.h"
+#include "mlx5_crypto.h"
+
+static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
+	{
+		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+	},
+	{
+		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+	}
+};
+
+int
+mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
+{
+	priv->caps = mlx5_crypto_gcm_caps;
+	return 0;
+}
+
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 4/9] crypto/mlx5: add AES-GCM encryption key
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
                     ` (2 preceding siblings ...)
  2023-05-26  3:14   ` [PATCH v2 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, rasland

The crypto device requires a DEK (data encryption key) object for
data encryption/decryption operations.

This commit adds the AES-GCM DEK object management support.
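
As a usage sketch (illustrative, not part of the patch), the generalized
prepare function now takes the whole symmetric xform, so the AES-GCM
session path can reuse the same DEK hash list as AES-XTS. The "priv" and
"xform" variables below stand for the caller's session-configure context:

    /* Sketch: one prepare call serves both algorithms. */
    struct mlx5_crypto_dek *dek;

    /* xform->type is RTE_CRYPTO_SYM_XFORM_CIPHER (XTS) or _AEAD (GCM). */
    dek = mlx5_crypto_dek_prepare(priv, xform);
    if (dek == NULL)
        return -ENOMEM; /* same error the session configure paths report */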

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/crypto/mlx5/mlx5_crypto.h     |  17 ++++-
 drivers/crypto/mlx5/mlx5_crypto_dek.c | 102 +++++++++++++-------------
 drivers/crypto/mlx5/mlx5_crypto_gcm.c |  31 ++++++++
 drivers/crypto/mlx5/mlx5_crypto_xts.c |  53 ++++++++++++-
 4 files changed, 148 insertions(+), 55 deletions(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 76f368ee91..bb5a557a38 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -86,6 +86,11 @@ struct mlx5_crypto_session {
 	uint32_t dek_id; /**< DEK ID */
 } __rte_packed;
 
+struct mlx5_crypto_dek_ctx {
+	struct rte_crypto_sym_xform *xform;
+	struct mlx5_crypto_priv *priv;
+};
+
 typedef void *(*mlx5_crypto_mkey_update_t)(struct mlx5_crypto_priv *priv,
 					   struct mlx5_crypto_qp *qp,
 					   uint32_t idx);
@@ -106,7 +111,7 @@ mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 
 struct mlx5_crypto_dek *
 mlx5_crypto_dek_prepare(struct mlx5_crypto_priv *priv,
-			struct rte_crypto_cipher_xform *cipher);
+			struct rte_crypto_sym_xform *xform);
 
 int
 mlx5_crypto_dek_setup(struct mlx5_crypto_priv *priv);
@@ -120,4 +125,14 @@ mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_dek_fill_xts_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx);
+
+int
+mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_dek.c b/drivers/crypto/mlx5/mlx5_crypto_dek.c
index 7339ef2bd9..716bcc0545 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_dek.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_dek.c
@@ -13,10 +13,24 @@
 #include "mlx5_crypto_utils.h"
 #include "mlx5_crypto.h"
 
-struct mlx5_crypto_dek_ctx {
-	struct rte_crypto_cipher_xform *cipher;
-	struct mlx5_crypto_priv *priv;
-};
+static int
+mlx5_crypto_dek_get_key(struct rte_crypto_sym_xform *xform,
+			const uint8_t **key,
+			uint16_t *key_len)
+{
+	if (xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+		*key = xform->cipher.key.data;
+		*key_len = xform->cipher.key.length;
+	} else if (xform->type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+		*key = xform->aead.key.data;
+		*key_len = xform->aead.key.length;
+	} else {
+		DRV_LOG(ERR, "Xform dek type not supported.");
+		rte_errno = EINVAL;
+		return -1;
+	}
+	return 0;
+}
 
 int
 mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
@@ -27,19 +41,22 @@ mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 
 struct mlx5_crypto_dek *
 mlx5_crypto_dek_prepare(struct mlx5_crypto_priv *priv,
-			struct rte_crypto_cipher_xform *cipher)
+			struct rte_crypto_sym_xform *xform)
 {
+	const uint8_t *key;
+	uint16_t key_len;
 	struct mlx5_hlist *dek_hlist = priv->dek_hlist;
 	struct mlx5_crypto_dek_ctx dek_ctx = {
-		.cipher = cipher,
+		.xform = xform,
 		.priv = priv,
 	};
-	struct rte_crypto_cipher_xform *cipher_ctx = cipher;
-	uint64_t key64 = __rte_raw_cksum(cipher_ctx->key.data,
-					 cipher_ctx->key.length, 0);
-	struct mlx5_list_entry *entry = mlx5_hlist_register(dek_hlist,
-							     key64, &dek_ctx);
+	uint64_t key64;
+	struct mlx5_list_entry *entry;
 
+	if (mlx5_crypto_dek_get_key(xform, &key, &key_len))
+		return NULL;
+	key64 = __rte_raw_cksum(key, key_len, 0);
+	entry = mlx5_hlist_register(dek_hlist, key64, &dek_ctx);
 	return entry == NULL ? NULL :
 			     container_of(entry, struct mlx5_crypto_dek, entry);
 }
@@ -76,76 +93,55 @@ mlx5_crypto_dek_match_cb(void *tool_ctx __rte_unused,
 			 struct mlx5_list_entry *entry, void *cb_ctx)
 {
 	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
-	struct rte_crypto_cipher_xform *cipher_ctx = ctx->cipher;
+	struct rte_crypto_sym_xform *xform = ctx->xform;
 	struct mlx5_crypto_dek *dek =
 			container_of(entry, typeof(*dek), entry);
 	uint32_t key_len = dek->size;
+	uint16_t xkey_len;
+	const uint8_t *key;
 
-	if (key_len != cipher_ctx->key.length)
+	if (mlx5_crypto_dek_get_key(xform, &key, &xkey_len))
+		return -1;
+	if (key_len != xkey_len)
 		return -1;
-	return memcmp(cipher_ctx->key.data, dek->data, cipher_ctx->key.length);
+	return memcmp(key, dek->data, xkey_len);
 }
 
 static struct mlx5_list_entry *
 mlx5_crypto_dek_create_cb(void *tool_ctx __rte_unused, void *cb_ctx)
 {
 	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
-	struct rte_crypto_cipher_xform *cipher_ctx = ctx->cipher;
+	struct rte_crypto_sym_xform *xform = ctx->xform;
 	struct mlx5_crypto_dek *dek = rte_zmalloc(__func__, sizeof(*dek),
 						  RTE_CACHE_LINE_SIZE);
 	struct mlx5_devx_dek_attr dek_attr = {
 		.pd = ctx->priv->cdev->pdn,
-		.key_purpose = MLX5_CRYPTO_KEY_PURPOSE_AES_XTS,
-		.has_keytag = 1,
 	};
-	bool is_wrapped = ctx->priv->is_wrapped_mode;
+	int ret = -1;
 
 	if (dek == NULL) {
 		DRV_LOG(ERR, "Failed to allocate dek memory.");
 		return NULL;
 	}
-	if (is_wrapped) {
-		switch (cipher_ctx->key.length) {
-		case 48:
-			dek->size = 48;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_128b;
-			break;
-		case 80:
-			dek->size = 80;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_256b;
-			break;
-		default:
-			DRV_LOG(ERR, "Wrapped key size not supported.");
-			return NULL;
-		}
-	} else {
-		switch (cipher_ctx->key.length) {
-		case 32:
-			dek->size = 40;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_128b;
-			break;
-		case 64:
-			dek->size = 72;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_256b;
-			break;
-		default:
-			DRV_LOG(ERR, "Key size not supported.");
-			return NULL;
-		}
-		memcpy(&dek_attr.key[cipher_ctx->key.length],
-						&ctx->priv->keytag, 8);
-	}
-	memcpy(&dek_attr.key, cipher_ctx->key.data, cipher_ctx->key.length);
+	if (xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER)
+		ret = mlx5_crypto_dek_fill_xts_attr(dek, &dek_attr, cb_ctx);
+	else if (xform->type == RTE_CRYPTO_SYM_XFORM_AEAD)
+		ret = mlx5_crypto_dek_fill_gcm_attr(dek, &dek_attr, cb_ctx);
+	if (ret)
+		goto fail;
 	dek->obj = mlx5_devx_cmd_create_dek_obj(ctx->priv->cdev->ctx,
 						&dek_attr);
 	if (dek->obj == NULL) {
-		rte_free(dek);
-		return NULL;
+		DRV_LOG(ERR, "Failed to create dek obj.");
+		goto fail;
 	}
-	memcpy(&dek->data, cipher_ctx->key.data, cipher_ctx->key.length);
 	return &dek->entry;
+fail:
+	rte_free(dek);
+	return NULL;
 }
 
+
 static void
 mlx5_crypto_dek_remove_cb(void *tool_ctx __rte_unused,
 			  struct mlx5_list_entry *entry)
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index bd78c6d66b..676bec6b18 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -27,6 +27,37 @@ static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	}
 };
 
+int
+mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx)
+{
+	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
+	struct rte_crypto_aead_xform *aead_ctx = &ctx->xform->aead;
+
+	if (aead_ctx->algo != RTE_CRYPTO_AEAD_AES_GCM) {
+		DRV_LOG(ERR, "Only AES-GCM algo supported.");
+		return -EINVAL;
+	}
+	dek_attr->key_purpose = MLX5_CRYPTO_KEY_PURPOSE_GCM;
+	switch (aead_ctx->key.length) {
+	case 16:
+		dek->size = 16;
+		dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+		break;
+	case 32:
+		dek->size = 32;
+		dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+		break;
+	default:
+		DRV_LOG(ERR, "Key size not supported.");
+		return -EINVAL;
+	}
+	memcpy(&dek_attr->key, aead_ctx->key.data, aead_ctx->key.length);
+	memcpy(&dek->data, aead_ctx->key.data, aead_ctx->key.length);
+	return 0;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_xts.c b/drivers/crypto/mlx5/mlx5_crypto_xts.c
index 964d02e6ed..661da5f589 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_xts.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_xts.c
@@ -45,6 +45,57 @@ const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
 	},
 };
 
+int
+mlx5_crypto_dek_fill_xts_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx)
+{
+	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
+	struct rte_crypto_cipher_xform *cipher_ctx = &ctx->xform->cipher;
+	bool is_wrapped = ctx->priv->is_wrapped_mode;
+
+	if (cipher_ctx->algo != RTE_CRYPTO_CIPHER_AES_XTS) {
+		DRV_LOG(ERR, "Only AES-XTS algo supported.");
+		return -EINVAL;
+	}
+	dek_attr->key_purpose = MLX5_CRYPTO_KEY_PURPOSE_AES_XTS;
+	dek_attr->has_keytag = 1;
+	if (is_wrapped) {
+		switch (cipher_ctx->key.length) {
+		case 48:
+			dek->size = 48;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+			break;
+		case 80:
+			dek->size = 80;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+			break;
+		default:
+			DRV_LOG(ERR, "Wrapped key size not supported.");
+			return -EINVAL;
+		}
+	} else {
+		switch (cipher_ctx->key.length) {
+		case 32:
+			dek->size = 40;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+			break;
+		case 64:
+			dek->size = 72;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+			break;
+		default:
+			DRV_LOG(ERR, "Key size not supported.");
+			return -EINVAL;
+		}
+		memcpy(&dek_attr->key[cipher_ctx->key.length],
+						&ctx->priv->keytag, 8);
+	}
+	memcpy(&dek_attr->key, cipher_ctx->key.data, cipher_ctx->key.length);
+	memcpy(&dek->data, cipher_ctx->key.data, cipher_ctx->key.length);
+	return 0;
+}
+
 static int
 mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
 				      struct rte_crypto_sym_xform *xform,
@@ -66,7 +117,7 @@ mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
 		return -ENOTSUP;
 	}
 	cipher = &xform->cipher;
-	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, xform);
 	if (sess_private_data->dek == NULL) {
 		DRV_LOG(ERR, "Failed to prepare dek.");
 		return -ENOMEM;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 5/9] crypto/mlx5: add AES-GCM session configure
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
                     ` (3 preceding siblings ...)
  2023-05-26  3:14   ` [PATCH v2 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland

Sessions are used in symmetric transformations in order to prepare
objects and data for the packet processing stage.

The AES-GCM session includes the IV, AAD, digest (tag), DEK and
operation mode information.
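
For context, a hedged sketch of the application-side AEAD transform this
session configure consumes; "key" and "IV_OFFSET" are placeholders and the
lengths are just example values:

    /* Example AES-256-GCM transform passed to session creation. */
    struct rte_crypto_sym_xform xform = {
        .type = RTE_CRYPTO_SYM_XFORM_AEAD,
        .next = NULL,                     /* chained xforms are rejected */
        .aead = {
            .op = RTE_CRYPTO_AEAD_OP_ENCRYPT,
            .algo = RTE_CRYPTO_AEAD_AES_GCM,
            .key = { .data = key, .length = 32 },
            .iv = { .offset = IV_OFFSET, .length = 12 },
            .digest_length = 16,          /* stored as tag_len */
            .aad_length = 16,             /* stored as aad_len */
        },
    };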

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h        | 12 +++++++
 drivers/crypto/mlx5/mlx5_crypto.h     | 40 ++++++++++++++++++-----
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 47 +++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index b4446f56b9..3b26499a47 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -523,11 +523,23 @@ enum {
 	MLX5_BLOCK_SIZE_4048B	= 0x6,
 };
 
+enum {
+	MLX5_ENCRYPTION_TYPE_AES_GCM = 0x3,
+};
+
+enum {
+	MLX5_CRYPTO_OP_TYPE_ENCRYPTION = 0x0,
+	MLX5_CRYPTO_OP_TYPE_DECRYPTION = 0x1,
+};
+
 #define MLX5_BSF_SIZE_OFFSET		30
 #define MLX5_BSF_P_TYPE_OFFSET		24
 #define MLX5_ENCRYPTION_ORDER_OFFSET	16
 #define MLX5_BLOCK_SIZE_OFFSET		24
 
+#define MLX5_CRYPTO_MMO_TYPE_OFFSET 24
+#define MLX5_CRYPTO_MMO_OP_OFFSET 20
+
 struct mlx5_wqe_umr_bsf_seg {
 	/*
 	 * bs_bpt_eo_es contains:
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index bb5a557a38..6cb4d4ddec 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -72,16 +72,40 @@ struct mlx5_crypto_devarg_params {
 };
 
 struct mlx5_crypto_session {
-	uint32_t bs_bpt_eo_es;
-	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
-	 * saved in big endian format.
-	 */
-	uint32_t bsp_res;
-	/**< crypto_block_size_pointer and reserved 24 bits saved in big
-	 * endian format.
-	 */
+	union {
+		/**< AES-XTS configuration. */
+		struct {
+			uint32_t bs_bpt_eo_es;
+			/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
+			 * saved in big endian format.
+			 */
+			uint32_t bsp_res;
+			/**< crypto_block_size_pointer and reserved 24 bits saved in big
+			 * endian format.
+			 */
+		};
+		/**< AES-GCM configuration. */
+		struct {
+			uint32_t mmo_ctrl;
+			/**< Crypto control fields with algo type and op type in big
+			 * endian format.
+			 */
+			uint32_t wqe_aad_len;
+			/**< Crypto AAD length field in big endian format. */
+			uint32_t wqe_tag_len;
+			/**< Crypto tag length field in big endian format. */
+			uint16_t tag_len;
+			/**< AES-GCM crypto digest size in bytes. */
+			uint16_t aad_len;
+			/**< The length of the additional authenticated data (AAD) in bytes. */
+			uint32_t op_type;
+			/**< Operation type. */
+		};
+	};
 	uint32_t iv_offset:16;
 	/**< Starting point for Initialisation Vector. */
+	uint32_t iv_len;
+	/**< Initialisation Vector length. */
 	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
 	uint32_t dek_id; /**< DEK ID */
 } __rte_packed;
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 676bec6b18..6b6a3df57c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -58,9 +58,56 @@ mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
 	return 0;
 }
 
+static int
+mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
+				  struct rte_crypto_sym_xform *xform,
+				  struct rte_cryptodev_sym_session *session)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_session *sess_private_data = CRYPTODEV_GET_SYM_SESS_PRIV(session);
+	struct rte_crypto_aead_xform *aead = &xform->aead;
+	uint32_t op_type;
+
+	if (unlikely(xform->next != NULL)) {
+		DRV_LOG(ERR, "Xform next is not supported.");
+		return -ENOTSUP;
+	}
+	if (aead->algo != RTE_CRYPTO_AEAD_AES_GCM) {
+		DRV_LOG(ERR, "Only AES-GCM algorithm is supported.");
+		return -ENOTSUP;
+	}
+	if (aead->op == RTE_CRYPTO_AEAD_OP_ENCRYPT)
+		op_type = MLX5_CRYPTO_OP_TYPE_ENCRYPTION;
+	else
+		op_type = MLX5_CRYPTO_OP_TYPE_DECRYPTION;
+	sess_private_data->op_type = op_type;
+	sess_private_data->mmo_ctrl = rte_cpu_to_be_32
+			(op_type << MLX5_CRYPTO_MMO_OP_OFFSET |
+			 MLX5_ENCRYPTION_TYPE_AES_GCM << MLX5_CRYPTO_MMO_TYPE_OFFSET);
+	sess_private_data->aad_len = aead->aad_length;
+	sess_private_data->tag_len = aead->digest_length;
+	sess_private_data->iv_offset = aead->iv.offset;
+	sess_private_data->iv_len = aead->iv.length;
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, xform);
+	if (sess_private_data->dek == NULL) {
+		DRV_LOG(ERR, "Failed to prepare dek.");
+		return -ENOMEM;
+	}
+	sess_private_data->dek_id =
+			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
+					 0xffffff);
+	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
+	return 0;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
+	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
+	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+
+	/* Override AES-GCM specified ops. */
+	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 6/9] common/mlx5: add WQE-based QP synchronous basics
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
                     ` (4 preceding siblings ...)
  2023-05-26  3:14   ` [PATCH v2 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland

NVIDIA HW provides a synchronization mechanism between QPs. When
creating the QPs, the user can set one as primary and another as
follower. The follower QP's WQE execution can then be controlled
by the primary QP via SEND_EN WQEs.

This commit introduces the SEND_EN WQE used to synchronize WQE
execution between the primary and follower QPs.
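
A rough sketch of how a primary QP could post a SEND_EN WQE for its
follower using the structure above. The MLX5_OPCODE_SEND_EN macro, the
sqnpc/qpn encodings and the helper itself are assumptions for
illustration, not something this patch adds:

    /* Sketch: enable the follower QP to execute WQEs up to follower_pi. */
    static void
    send_en_wqe_build(struct mlx5_wqe_send_en_wqe *wqe, uint32_t primary_sqn,
                      uint16_t primary_pi, uint32_t follower_qpn,
                      uint16_t follower_pi)
    {
        wqe->ctr.opcode = rte_cpu_to_be_32((primary_pi << 8) |
                                           MLX5_OPCODE_SEND_EN); /* assumed macro */
        wqe->ctr.sq_ds = rte_cpu_to_be_32((primary_sqn << 8) | 2u); /* 2 segments */
        wqe->ctr.flags = RTE_BE32(MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET);
        wqe->sseg.sqnpc = rte_cpu_to_be_32(follower_pi); /* follower SQ next PC */
        wqe->sseg.qpn = rte_cpu_to_be_32(follower_qpn);  /* follower QP number */
    }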

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  6 ++++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  3 +++
 drivers/common/mlx5/mlx5_prm.h       | 11 +++++++++++
 3 files changed, 20 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 4332081165..ef87862a6d 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2475,6 +2475,12 @@ mlx5_devx_cmd_create_qp(void *ctx,
 				 attr->dbr_umem_valid);
 			MLX5_SET(qpc, qpc, dbr_umem_id, attr->dbr_umem_id);
 		}
+		if (attr->cd_master)
+			MLX5_SET(qpc, qpc, cd_master, attr->cd_master);
+		if (attr->cd_slave_send)
+			MLX5_SET(qpc, qpc, cd_slave_send, attr->cd_slave_send);
+		if (attr->cd_slave_recv)
+			MLX5_SET(qpc, qpc, cd_slave_receive, attr->cd_slave_recv);
 		MLX5_SET64(qpc, qpc, dbr_addr, attr->dbr_address);
 		MLX5_SET64(create_qp_in, in, wq_umem_offset,
 			   attr->wq_umem_offset);
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index cb3f3a211b..e071cd841f 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -559,6 +559,9 @@ struct mlx5_devx_qp_attr {
 	uint64_t wq_umem_offset;
 	uint32_t user_index:24;
 	uint32_t mmo:1;
+	uint32_t cd_master:1;
+	uint32_t cd_slave_send:1;
+	uint32_t cd_slave_recv:1;
 };
 
 struct mlx5_devx_virtio_q_couners_attr {
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 3b26499a47..96d5eb8de3 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -589,6 +589,17 @@ struct mlx5_rdma_write_wqe {
 	struct mlx5_wqe_dseg dseg[];
 } __rte_packed;
 
+struct mlx5_wqe_send_en_seg {
+	uint32_t reserve[2];
+	uint32_t sqnpc;
+	uint32_t qpn;
+} __rte_packed;
+
+struct mlx5_wqe_send_en_wqe {
+	struct mlx5_wqe_cseg ctr;
+	struct mlx5_wqe_send_en_seg sseg;
+} __rte_packed;
+
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 7/9] crypto/mlx5: add queue pair setup for GCM
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
                     ` (5 preceding siblings ...)
  2023-05-26  3:14   ` [PATCH v2 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland

The crypto queue pair handles the encryption/decryption operations.

As the AES-GCM AEAD API provides the AAD, mbuf and digest separately,
while the low-level FW only accepts the data in a single contiguous
memory region, two internal QPs are created per AES-GCM queue pair.
One organizes the memory to be contiguous when it is not; the other
performs the crypto operations.

If the buffers are found to be contiguous, they are sent directly to
the crypto QP for encryption/decryption. If not, they are first handled
by the UMR QP, which converts them into a single contiguous buffer.
The resulting well-organized buffer can then be handled by the crypto QP.

The crypto QP is initialized as the follower and the UMR QP as the
leader. When a crypto operation's input buffers require address space
conversion by the UMR QP, the crypto QP processing is triggered by the
UMR QP. Otherwise, the crypto QP doorbell is rung directly.
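
To make that control flow concrete, a simplified per-operation dispatch
sketch is shown below; the real datapath arrives in the enqueue/dequeue
patch, and every helper named here is purely illustrative:

    /* Pseudocode sketch of the UMR/crypto QP dispatch, not driver code. */
    static void
    gcm_enqueue_one(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
    {
        /* Hypothetical check: are AAD, payload and digest already contiguous? */
        bool contiguous = op_buffers_are_contiguous(op);

        build_crypto_wqe(qp, op);           /* crypto WQE on the follower QP */
        if (contiguous) {
            ring_doorbell(&qp->qp_obj);     /* ring the crypto QP directly */
        } else {
            build_umr_wqe(qp, op);          /* make the buffers contiguous */
            build_send_en_wqe(qp);          /* leader triggers the follower */
            ring_doorbell(&qp->umr_qp_obj); /* ring only the UMR (leader) QP */
        }
    }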

The existing max_segs_num devarg is used to define how many segments
a chained mbuf may contain, the same as for AES-XTS.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_common_mr.h  |   1 +
 drivers/common/mlx5/mlx5_prm.h        |  22 +++
 drivers/common/mlx5/version.map       |   2 +
 drivers/crypto/mlx5/mlx5_crypto.h     |  15 ++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 230 ++++++++++++++++++++++++++
 5 files changed, 270 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index 66623868a2..8789d403b1 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -254,6 +254,7 @@ __rte_internal
 void
 mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
 
+__rte_internal
 void
 mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 96d5eb8de3..a502e29bd8 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -470,6 +470,15 @@ struct mlx5_wqe_rseg {
 #define MLX5_UMRC_KO_OFFSET 16u
 #define MLX5_UMRC_TO_BS_OFFSET 0u
 
+/*
+ * As PRM describes, the address of the UMR pointer must be
+ * aligned to 2KB.
+ */
+#define MLX5_UMR_KLM_PTR_ALIGN (1 << 11)
+
+#define MLX5_UMR_KLM_NUM_ALIGN \
+	(MLX5_UMR_KLM_PTR_ALIGN / sizeof(struct mlx5_klm))
+
 struct mlx5_wqe_umr_cseg {
 	uint32_t if_cf_toe_cq_res;
 	uint32_t ko_to_bs;
@@ -674,6 +683,19 @@ union mlx5_gga_compress_opaque {
 	uint32_t data[64];
 };
 
+union mlx5_gga_crypto_opaque {
+	struct {
+		uint32_t syndrome;
+		uint32_t reserved0[2];
+		struct {
+			uint32_t iv[3];
+			uint32_t tag_size;
+			uint32_t aad_size;
+		} cp __rte_packed;
+	} __rte_packed;
+	uint8_t data[64];
+};
+
 struct mlx5_ifc_regexp_mmo_control_bits {
 	uint8_t reserved_at_31[0x2];
 	uint8_t le[0x1];
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index f860b069de..0758ba76de 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -159,5 +159,7 @@ INTERNAL {
 
 	mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
 	mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
+
+	mlx5_os_set_reg_mr_cb;
 	local: *;
 };
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 6cb4d4ddec..88a09a6b1c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -28,8 +28,11 @@ struct mlx5_crypto_priv {
 	TAILQ_ENTRY(mlx5_crypto_priv) next;
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	struct rte_cryptodev *crypto_dev;
+	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
+	uint32_t max_klm_num; /* Maximum supported klm. */
 	struct mlx5_hlist *dek_hlist; /* Dek hash list. */
 	const struct rte_cryptodev_capabilities *caps;
 	struct rte_cryptodev_config dev_config;
@@ -46,15 +49,27 @@ struct mlx5_crypto_qp {
 	struct mlx5_crypto_priv *priv;
 	struct mlx5_devx_cq cq_obj;
 	struct mlx5_devx_qp qp_obj;
+	struct mlx5_devx_qp umr_qp_obj;
 	struct rte_cryptodev_stats stats;
 	struct rte_crypto_op **ops;
 	struct mlx5_devx_obj **mkey; /* WQE's indirect mekys. */
+	struct mlx5_klm *klm_array;
+	union mlx5_gga_crypto_opaque *opaque_addr;
 	struct mlx5_mr_ctrl mr_ctrl;
+	struct mlx5_pmd_mr mr;
+	/* Crypto QP. */
 	uint8_t *wqe;
 	uint16_t entries_n;
+	uint16_t cq_entries_n;
 	uint16_t pi;
 	uint16_t ci;
 	uint16_t db_pi;
+	/* UMR QP. */
+	uint8_t *umr_wqe;
+	uint16_t umr_wqbbs;
+	uint16_t umr_pi;
+	uint16_t umr_ci;
+	uint32_t umr_errors;
 };
 
 struct mlx5_crypto_dek {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 6b6a3df57c..dfef5455b4 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -18,6 +18,20 @@
 #include "mlx5_crypto_utils.h"
 #include "mlx5_crypto.h"
 
+/*
+ * AES-GCM uses indirect KLM mode. The UMR WQE comprises WQE control +
+ * UMR control + mkey context + indirect KLM. The WQE size is aligned to
+ * be 3 WQEBBS.
+ */
+#define MLX5_UMR_GCM_WQE_SIZE \
+	(RTE_ALIGN(sizeof(struct mlx5_umr_wqe) + sizeof(struct mlx5_wqe_dseg), \
+			MLX5_SEND_WQE_BB))
+
+#define MLX5_UMR_GCM_WQE_SET_SIZE \
+	(MLX5_UMR_GCM_WQE_SIZE + \
+	 RTE_ALIGN(sizeof(struct mlx5_wqe_send_en_wqe), \
+	 MLX5_SEND_WQE_BB))
+
 static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	{
 		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
@@ -84,6 +98,8 @@ mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 	sess_private_data->mmo_ctrl = rte_cpu_to_be_32
 			(op_type << MLX5_CRYPTO_MMO_OP_OFFSET |
 			 MLX5_ENCRYPTION_TYPE_AES_GCM << MLX5_CRYPTO_MMO_TYPE_OFFSET);
+	sess_private_data->wqe_aad_len = rte_cpu_to_be_32((uint32_t)aead->aad_length);
+	sess_private_data->wqe_tag_len = rte_cpu_to_be_32((uint32_t)aead->digest_length);
 	sess_private_data->aad_len = aead->aad_length;
 	sess_private_data->tag_len = aead->digest_length;
 	sess_private_data->iv_offset = aead->iv.offset;
@@ -100,6 +116,216 @@ mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 	return 0;
 }
 
+static void *
+mlx5_crypto_gcm_mkey_klm_update(struct mlx5_crypto_priv *priv,
+				struct mlx5_crypto_qp *qp,
+				uint32_t idx)
+{
+	return &qp->klm_array[idx * priv->max_klm_num];
+}
+
+static int
+mlx5_crypto_gcm_qp_release(struct rte_cryptodev *dev, uint16_t qp_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
+
+	if (qp->umr_qp_obj.qp != NULL)
+		mlx5_devx_qp_destroy(&qp->umr_qp_obj);
+	if (qp->qp_obj.qp != NULL)
+		mlx5_devx_qp_destroy(&qp->qp_obj);
+	if (qp->cq_obj.cq != NULL)
+		mlx5_devx_cq_destroy(&qp->cq_obj);
+	if (qp->mr.obj != NULL) {
+		void *opaq = qp->mr.addr;
+
+		priv->dereg_mr_cb(&qp->mr);
+		rte_free(opaq);
+	}
+	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
+	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
+	rte_free(qp);
+	dev->data->queue_pairs[qp_id] = NULL;
+	return 0;
+}
+
+static void
+mlx5_crypto_gcm_init_qp(struct mlx5_crypto_qp *qp)
+{
+	volatile struct mlx5_gga_wqe *restrict wqe =
+				    (volatile struct mlx5_gga_wqe *)qp->qp_obj.wqes;
+	volatile union mlx5_gga_crypto_opaque *opaq = qp->opaque_addr;
+	const uint32_t sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | 4u);
+	const uint32_t flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+					MLX5_COMP_MODE_OFFSET);
+	const uint32_t opaq_lkey = rte_cpu_to_be_32(qp->mr.lkey);
+	int i;
+
+	/* The following WQE fields stay constant, so initialize them only once. */
+	for (i = 0; i < qp->entries_n; ++i, ++wqe) {
+		wqe->sq_ds = sq_ds;
+		wqe->flags = flags;
+		wqe->opaque_lkey = opaq_lkey;
+		wqe->opaque_vaddr = rte_cpu_to_be_64((uint64_t)(uintptr_t)&opaq[i]);
+	}
+}
+
+static inline int
+mlx5_crypto_gcm_umr_qp_setup(struct rte_cryptodev *dev, struct mlx5_crypto_qp *qp,
+			     int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_qp_attr attr = {0};
+	uint32_t ret;
+	uint32_t log_wqbb_n;
+
+	/* Size for one UMR + SEND_EN WQE set per crypto QP entry at most. */
+	log_wqbb_n = rte_log2_u32(qp->entries_n *
+			(MLX5_UMR_GCM_WQE_SET_SIZE / MLX5_SEND_WQE_BB));
+	attr.pd = priv->cdev->pdn;
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
+	attr.cqn = qp->cq_obj.cq->id;
+	attr.num_of_receive_wqes = 0;
+	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
+	attr.ts_format =
+		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
+	attr.cd_master = 1;
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->umr_qp_obj,
+				  attr.num_of_send_wqbbs * MLX5_SEND_WQE_BB,
+				  &attr, socket_id);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create UMR QP.");
+		return -1;
+	}
+	if (mlx5_devx_qp2rts(&qp->umr_qp_obj, qp->umr_qp_obj.qp->id)) {
+		DRV_LOG(ERR, "Failed to change UMR QP state to RTS.");
+		return -1;
+	}
+	/* Save the UMR WQEBBS for checking the WQE boundary. */
+	qp->umr_wqbbs = attr.num_of_send_wqbbs;
+	return 0;
+}
+
+static int
+mlx5_crypto_gcm_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
+			 const struct rte_cryptodev_qp_conf *qp_conf,
+			 int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_hca_attr *attr = &priv->cdev->config.hca_attr;
+	struct mlx5_crypto_qp *qp;
+	struct mlx5_devx_cq_attr cq_attr = {
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+	};
+	struct mlx5_devx_qp_attr qp_attr = {
+		.pd = priv->cdev->pdn,
+		.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+		.user_index = qp_id,
+	};
+	struct mlx5_devx_mkey_attr mkey_attr = {
+		.pd = priv->cdev->pdn,
+		.umr_en = 1,
+		.klm_num = priv->max_klm_num,
+	};
+	uint32_t log_ops_n = rte_log2_u32(qp_conf->nb_descriptors);
+	uint32_t entries = RTE_BIT32(log_ops_n);
+	uint32_t alloc_size = sizeof(*qp);
+	size_t mr_size, opaq_size;
+	void *mr_buf;
+	int ret;
+
+	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
+	alloc_size += (sizeof(struct rte_crypto_op *) +
+		       sizeof(struct mlx5_devx_obj *)) * entries;
+	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
+				socket_id);
+	if (qp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate qp memory.");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	qp->priv = priv;
+	qp->entries_n = entries;
+	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
+				  priv->dev_config.socket_id)) {
+		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
+			(uint32_t)qp_id);
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	/*
+	 * The following KLM pointer must be aligned with
+	 * MLX5_UMR_KLM_PTR_ALIGN. opaq_size is aligned here so that
+	 * the KLM pointer placed right after it is also aligned.
+	 */
+	opaq_size = RTE_ALIGN(sizeof(union mlx5_gga_crypto_opaque) * entries,
+			      MLX5_UMR_KLM_PTR_ALIGN);
+	mr_size = (priv->max_klm_num * sizeof(struct mlx5_klm) * entries) + opaq_size;
+	mr_buf = rte_calloc(__func__, (size_t)1, mr_size, MLX5_UMR_KLM_PTR_ALIGN);
+	if (mr_buf == NULL) {
+		DRV_LOG(ERR, "Failed to allocate mr memory.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	if (priv->reg_mr_cb(priv->cdev->pd, mr_buf, mr_size, &qp->mr) != 0) {
+		rte_free(mr_buf);
+		DRV_LOG(ERR, "Failed to register opaque MR.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	qp->opaque_addr = qp->mr.addr;
+	qp->klm_array = RTE_PTR_ADD(qp->opaque_addr, opaq_size);
+	/*
+	 * Triple the CQ size as the UMR QP, which contains UMR and SEND_EN
+	 * WQEs, will share this CQ.
+	 */
+	qp->cq_entries_n = rte_align32pow2(entries * 3);
+	ret = mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj,
+				  rte_log2_u32(qp->cq_entries_n),
+				  &cq_attr, socket_id);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create CQ.");
+		goto err;
+	}
+	qp_attr.cqn = qp->cq_obj.cq->id;
+	qp_attr.ts_format = mlx5_ts_format_conv(attr->qp_ts_format);
+	qp_attr.num_of_receive_wqes = 0;
+	qp_attr.num_of_send_wqbbs = entries;
+	qp_attr.mmo = attr->crypto_mmo.crypto_mmo_qp;
+	/* Set MMO QP as follower as the input data may depend on UMR. */
+	qp_attr.cd_slave_send = 1;
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
+				  qp_attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
+				  &qp_attr, socket_id);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create QP.");
+		goto err;
+	}
+	mlx5_crypto_gcm_init_qp(qp);
+	ret = mlx5_devx_qp2rts(&qp->qp_obj, 0);
+	if (ret)
+		goto err;
+	qp->ops = (struct rte_crypto_op **)(qp + 1);
+	qp->mkey = (struct mlx5_devx_obj **)(qp->ops + entries);
+	if (mlx5_crypto_gcm_umr_qp_setup(dev, qp, socket_id)) {
+		DRV_LOG(ERR, "Failed to setup UMR QP.");
+		goto err;
+	}
+	DRV_LOG(INFO, "QP %u: SQN=0x%X CQN=0x%X entries num = %u",
+		(uint32_t)qp_id, qp->qp_obj.qp->id, qp->cq_obj.cq->id, entries);
+	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp, &mkey_attr,
+					       mlx5_crypto_gcm_mkey_klm_update)) {
+		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	dev->data->queue_pairs[qp_id] = qp;
+	return 0;
+err:
+	mlx5_crypto_gcm_qp_release(dev, qp_id);
+	return -1;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
@@ -108,6 +334,10 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
+	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
+	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
+	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
+	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 8/9] crypto/mlx5: add enqueue and dequeue operations
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
                     ` (6 preceding siblings ...)
  2023-05-26  3:14   ` [PATCH v2 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-05-26  3:14   ` [PATCH v2 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
  2023-06-14 18:11   ` [EXT] [PATCH v2 0/9] crypto/mlx5: support AES-GCM Akhil Goyal
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko; +Cc: dev, rasland

The crypto operations are performed with crypto WQEs. If the input
buffers (AAD, mbuf, digest) are not contiguous and there is not enough
headroom/tailroom for copying the AAD/digest, an UMR WQE is needed, as
required by the FW, to generate a contiguous address space for the
crypto WQE. The UMR WQEs and crypto WQEs are handled in two different QPs.

A crypto operation with non-contiguous buffers will have its own UMR
WQE, while an operation with contiguous buffers doesn't need one. Once
all the operation WQEs of the enqueue burst have been built, if any UMR
WQEs were built, an additional SEND_EN WQE will be appended as the final
WQE of the burst in the UMR QP. The purpose of that SEND_EN WQE is to
trigger the crypto QP processing once the UMR-prepared input memory
address space buffers are ready.

The QP for crypto operations contains only crypto WQEs, and those QP
WQEs are built as fixed at QP setup time. The QP processing is triggered
by a doorbell ring or by the SEND_EN WQE from the UMR QP.
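
As a minimal sketch, the per-operation decision of whether a UMR WQE is
needed reduces to a layout check on the AAD/payload/digest pointers. The
snippet below is illustrative only, not the driver code: the helper name
and the single-segment assumption are made up for clarity, and the real
driver additionally checks headroom/tailroom to decide whether the
AAD/digest can simply be copied in.

#include <stdbool.h>
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_crypto_sym.h>

/* Return true when AAD, payload and digest already form one contiguous
 * region, so the operation can skip the UMR WQE (illustrative only).
 */
static bool
aead_buffers_contiguous(struct rte_crypto_sym_op *sym, uint16_t aad_len)
{
	struct rte_mbuf *m = sym->m_src;
	uint8_t *payload = rte_pktmbuf_mtod_offset(m, uint8_t *,
						   sym->aead.data.offset);

	if (m->nb_segs > 1)
		return false; /* A chained mbuf always needs the UMR WQE. */
	/* The AAD must sit immediately before the payload ... */
	if (sym->aead.aad.data != payload - aad_len)
		return false;
	/* ... and the digest immediately after it. */
	return sym->aead.digest.data == payload + sym->aead.data.length;
}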

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h        |   1 +
 drivers/crypto/mlx5/mlx5_crypto.c     |   9 +-
 drivers/crypto/mlx5/mlx5_crypto.h     |   8 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 588 ++++++++++++++++++++++++++
 4 files changed, 604 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index a502e29bd8..98b71a4031 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -617,6 +617,7 @@ struct mlx5_wqe_send_en_wqe {
 /* MMO metadata segment */
 
 #define	MLX5_OPCODE_MMO	0x2fu
+#define	MLX5_OPC_MOD_MMO_CRYPTO 0x6u
 #define	MLX5_OPC_MOD_MMO_REGEX 0x4u
 #define	MLX5_OPC_MOD_MMO_COMP 0x2u
 #define	MLX5_OPC_MOD_MMO_DECOMP 0x3u
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index ff632cd69a..4d7d3ef2a3 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -62,8 +62,13 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 			MLX5_CRYPTO_FEATURE_FLAGS(priv->is_wrapped_mode);
 		dev_info->capabilities = priv->caps;
 		dev_info->max_nb_queue_pairs = MLX5_CRYPTO_MAX_QPS;
-		dev_info->min_mbuf_headroom_req = 0;
-		dev_info->min_mbuf_tailroom_req = 0;
+		if (priv->caps->sym.xform_type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+			dev_info->min_mbuf_headroom_req = MLX5_CRYPTO_GCM_MAX_AAD;
+			dev_info->min_mbuf_tailroom_req = MLX5_CRYPTO_GCM_MAX_DIGEST;
+		} else {
+			dev_info->min_mbuf_headroom_req = 0;
+			dev_info->min_mbuf_tailroom_req = 0;
+		}
 		dev_info->sym.max_nb_sessions = 0;
 		/*
 		 * If 0, the device does not have any limitation in number of
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 88a09a6b1c..6dcb41b27c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -23,6 +23,8 @@
 #define MLX5_CRYPTO_KLM_SEGS_NUM(umr_wqe_sz) ((umr_wqe_sz -\
 					MLX5_CRYPTO_UMR_WQE_STATIC_SIZE) /\
 					MLX5_WSEG_SIZE)
+#define MLX5_CRYPTO_GCM_MAX_AAD 64
+#define MLX5_CRYPTO_GCM_MAX_DIGEST 16
 
 struct mlx5_crypto_priv {
 	TAILQ_ENTRY(mlx5_crypto_priv) next;
@@ -61,6 +63,9 @@ struct mlx5_crypto_qp {
 	uint8_t *wqe;
 	uint16_t entries_n;
 	uint16_t cq_entries_n;
+	uint16_t reported_ci;
+	uint16_t qp_ci;
+	uint16_t cq_ci;
 	uint16_t pi;
 	uint16_t ci;
 	uint16_t db_pi;
@@ -70,6 +75,9 @@ struct mlx5_crypto_qp {
 	uint16_t umr_pi;
 	uint16_t umr_ci;
 	uint32_t umr_errors;
+	uint16_t last_gga_pi;
+	bool has_umr;
+	uint16_t cpy_tag_op;
 };
 
 struct mlx5_crypto_dek {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index dfef5455b4..2231bcbe6f 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -9,6 +9,7 @@
 #include <rte_log.h>
 #include <bus_pci_driver.h>
 #include <rte_memory.h>
+#include <rte_io.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
@@ -32,6 +33,40 @@
 	 RTE_ALIGN(sizeof(struct mlx5_wqe_send_en_wqe), \
 	 MLX5_SEND_WQE_BB))
 
+#define MLX5_UMR_GCM_WQE_STRIDE \
+	(MLX5_UMR_GCM_WQE_SIZE / MLX5_SEND_WQE_BB)
+
+#define MLX5_MMO_CRYPTO_OPC (MLX5_OPCODE_MMO | \
+	(MLX5_OPC_MOD_MMO_CRYPTO << WQE_CSEG_OPC_MOD_OFFSET))
+
+/*
+ * The status default value is RTE_CRYPTO_OP_STATUS_SUCCESS.
+ * Copy tag should fill different value to status.
+ */
+#define MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY (RTE_CRYPTO_OP_STATUS_SUCCESS + 1)
+
+struct mlx5_crypto_gcm_op_info {
+	bool need_umr;
+	bool is_oop;
+	bool is_enc;
+	void *digest;
+	void *src_addr;
+};
+
+struct mlx5_crypto_gcm_data {
+	void *src_addr;
+	uint32_t src_bytes;
+	void *dst_addr;
+	uint32_t dst_bytes;
+	uint32_t src_mkey;
+	uint32_t dst_mkey;
+};
+
+struct mlx5_crypto_gcm_tag_cpy_info {
+	void *digest;
+	uint8_t tag_len;
+} __rte_packed;
+
 static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	{
 		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
@@ -326,6 +361,557 @@ mlx5_crypto_gcm_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	return -1;
 }
 
+static __rte_always_inline void
+mlx5_crypto_gcm_get_op_info(struct mlx5_crypto_qp *qp,
+			    struct rte_crypto_op *op,
+			    struct mlx5_crypto_gcm_op_info *op_info)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct rte_mbuf *m_src = op->sym->m_src;
+	void *aad_addr = op->sym->aead.aad.data;
+	void *tag_addr = op->sym->aead.digest.data;
+	void *src_addr = rte_pktmbuf_mtod_offset(m_src, void *, op->sym->aead.data.offset);
+	struct rte_mbuf *m_dst = m_src;
+	void *dst_addr = src_addr;
+	void *expected_aad = NULL;
+	void *expected_tag = NULL;
+	bool is_enc = sess->op_type == MLX5_CRYPTO_OP_TYPE_ENCRYPTION;
+	bool cp_aad = false;
+	bool cp_tag = false;
+
+	op_info->is_oop = false;
+	op_info->need_umr = false;
+	op_info->is_enc = is_enc;
+	op_info->digest = NULL;
+	op_info->src_addr = aad_addr;
+	if (op->sym->m_dst && op->sym->m_dst != m_src) {
+		op_info->is_oop = true;
+		m_dst = op->sym->m_dst;
+		dst_addr = rte_pktmbuf_mtod_offset(m_dst, void *, op->sym->aead.data.offset);
+		if (m_dst->nb_segs > 1) {
+			op_info->need_umr = true;
+			return;
+		}
+		/*
+		 * If the op's mbuf has extra data offset, don't copy AAD to
+		 * this area.
+		 */
+		if (rte_pktmbuf_headroom(m_dst) < sess->aad_len ||
+		    op->sym->aead.data.offset) {
+			op_info->need_umr = true;
+			return;
+		}
+	}
+	if (m_src->nb_segs > 1) {
+		op_info->need_umr = true;
+		return;
+	}
+	expected_aad = RTE_PTR_SUB(src_addr, sess->aad_len);
+	if (expected_aad != aad_addr) {
+		/*
+		 * If the op's mbuf has extra data offset, don't copy AAD to
+		 * this area.
+		 */
+		if (sess->aad_len > MLX5_CRYPTO_GCM_MAX_AAD ||
+		    sess->aad_len > rte_pktmbuf_headroom(m_src) ||
+		    op->sym->aead.data.offset) {
+			op_info->need_umr = true;
+			return;
+		}
+		cp_aad = true;
+		op_info->src_addr = expected_aad;
+	}
+	expected_tag = RTE_PTR_ADD(is_enc ? dst_addr : src_addr, op->sym->aead.data.length);
+	if (expected_tag != tag_addr) {
+		struct rte_mbuf *mbuf = is_enc ? m_dst : m_src;
+
+		/*
+		 * If op's mbuf is not fully set as payload, don't copy digest to
+		 * the left area.
+		 */
+		if (rte_pktmbuf_tailroom(mbuf) < sess->tag_len ||
+		    rte_pktmbuf_data_len(mbuf) != op->sym->aead.data.length) {
+			op_info->need_umr = true;
+			return;
+		}
+		if (is_enc) {
+			op_info->digest = expected_tag;
+			qp->cpy_tag_op++;
+		} else {
+			cp_tag = true;
+		}
+	}
+	if (cp_aad)
+		memcpy(expected_aad, aad_addr, sess->aad_len);
+	if (cp_tag)
+		memcpy(expected_tag, tag_addr, sess->tag_len);
+}
+
+static __rte_always_inline uint32_t
+_mlx5_crypto_gcm_umr_build_mbuf_klm(struct mlx5_crypto_qp *qp,
+				    struct rte_mbuf *mbuf,
+				    struct mlx5_klm *klm,
+				    uint32_t offset,
+				    uint32_t *remain)
+{
+	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
+	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
+
+	if (data_len > *remain)
+		data_len = *remain;
+	*remain -= data_len;
+	klm->byte_count = rte_cpu_to_be_32(data_len);
+	klm->address = rte_cpu_to_be_64(addr);
+	klm->mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
+	return klm->mkey;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_mbuf_chain_klms(struct mlx5_crypto_qp *qp,
+				      struct rte_crypto_op *op,
+				      struct rte_mbuf *mbuf,
+				      struct mlx5_klm *klm)
+{
+	uint32_t remain_len = op->sym->aead.data.length;
+	__rte_unused uint32_t nb_segs = mbuf->nb_segs;
+	uint32_t klm_n = 0;
+
+	/* mbuf seg num should be less than max_segs_num. */
+	MLX5_ASSERT(nb_segs <= qp->priv->max_segs_num);
+	/* First mbuf needs to take the data offset. */
+	if (unlikely(_mlx5_crypto_gcm_umr_build_mbuf_klm(qp, mbuf, klm,
+		     op->sym->aead.data.offset, &remain_len) == UINT32_MAX)) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		return 0;
+	}
+	klm++;
+	klm_n++;
+	while (remain_len) {
+		nb_segs--;
+		mbuf = mbuf->next;
+		MLX5_ASSERT(mbuf && nb_segs);
+		if (unlikely(_mlx5_crypto_gcm_umr_build_mbuf_klm(qp, mbuf, klm,
+						0, &remain_len) == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+			return 0;
+		}
+		klm++;
+		klm_n++;
+	}
+	return klm_n;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_klm_by_addr(struct mlx5_crypto_qp *qp,
+				  struct mlx5_klm *klm,
+				  void *addr,
+				  uint32_t len)
+{
+	klm->byte_count = rte_cpu_to_be_32(len);
+	klm->address = rte_cpu_to_be_64((uintptr_t)addr);
+	klm->mkey = mlx5_mr_addr2mr_bh(&qp->mr_ctrl, (uintptr_t)addr);
+	if (klm->mkey == UINT32_MAX)
+		return 0;
+	return 1;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_op_klm(struct mlx5_crypto_qp *qp,
+			     struct rte_crypto_op *op,
+			     struct mlx5_crypto_gcm_op_info *op_info,
+			     struct mlx5_klm *klm,
+			     uint32_t *len)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_klm *digest = NULL, *aad = NULL;
+	uint32_t total_len = op->sym->aead.data.length + sess->aad_len + sess->tag_len;
+	uint32_t klm_n = 0, klm_src = 0, klm_dst = 0;
+
+	/* Build AAD KLM. */
+	aad = klm;
+	if (!mlx5_crypto_gcm_build_klm_by_addr(qp, aad, op->sym->aead.aad.data, sess->aad_len))
+		return 0;
+	klm_n++;
+	/* Build src mbuf KLM. */
+	klm_src = mlx5_crypto_gcm_build_mbuf_chain_klms(qp, op, op->sym->m_src, &klm[klm_n]);
+	if (!klm_src)
+		return 0;
+	klm_n += klm_src;
+	/* Reserve digest KLM if needed. */
+	if (!op_info->is_oop ||
+	    sess->op_type == MLX5_CRYPTO_OP_TYPE_DECRYPTION) {
+		digest = &klm[klm_n];
+		klm_n++;
+	}
+	/* Build dst mbuf KLM. */
+	if (op_info->is_oop) {
+		klm[klm_n] = *aad;
+		klm_n++;
+		klm_dst = mlx5_crypto_gcm_build_mbuf_chain_klms(qp, op, op->sym->m_dst,
+								&klm[klm_n]);
+		if (!klm_dst)
+			return 0;
+		klm_n += klm_dst;
+		total_len += (op->sym->aead.data.length + sess->aad_len);
+	}
+	/* Update digest at the end if it is not set. */
+	if (!digest) {
+		digest = &klm[klm_n];
+		klm_n++;
+	}
+	/* Build digest KLM. */
+	if (!mlx5_crypto_gcm_build_klm_by_addr(qp, digest, op->sym->aead.digest.data,
+					       sess->tag_len))
+		return 0;
+	*len = total_len;
+	return klm_n;
+}
+
+static __rte_always_inline struct mlx5_wqe_cseg *
+mlx5_crypto_gcm_get_umr_wqe(struct mlx5_crypto_qp *qp)
+{
+	uint32_t wqe_offset = qp->umr_pi & (qp->umr_wqbbs - 1);
+	uint32_t left_wqbbs = qp->umr_wqbbs - wqe_offset;
+	struct mlx5_wqe_cseg *wqe;
+
+	/* If UMR WQE is near the boundary. */
+	if (left_wqbbs < MLX5_UMR_GCM_WQE_STRIDE) {
+		/* Append NOP WQE as the left WQEBBS is not enough for UMR. */
+		wqe = RTE_PTR_ADD(qp->umr_qp_obj.umem_buf, wqe_offset * MLX5_SEND_WQE_BB);
+		wqe->opcode = rte_cpu_to_be_32(MLX5_OPCODE_NOP | ((uint32_t)qp->umr_pi << 8));
+		wqe->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | (left_wqbbs << 2));
+		wqe->flags = RTE_BE32(0);
+		wqe->misc = RTE_BE32(0);
+		qp->umr_pi += left_wqbbs;
+		wqe_offset = qp->umr_pi & (qp->umr_wqbbs - 1);
+	}
+	wqe_offset *= MLX5_SEND_WQE_BB;
+	return RTE_PTR_ADD(qp->umr_qp_obj.umem_buf, wqe_offset);
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_umr(struct mlx5_crypto_qp *qp,
+			  struct rte_crypto_op *op,
+			  uint32_t idx,
+			  struct mlx5_crypto_gcm_op_info *op_info,
+			  struct mlx5_crypto_gcm_data *data)
+{
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_wqe_cseg *wqe;
+	struct mlx5_wqe_umr_cseg *ucseg;
+	struct mlx5_wqe_mkey_cseg *mkc;
+	struct mlx5_klm *iklm;
+	struct mlx5_klm *klm = &qp->klm_array[idx * priv->max_klm_num];
+	uint16_t klm_size, klm_align;
+	uint32_t total_len;
+
+	/* Build KLM base on the op. */
+	klm_size = mlx5_crypto_gcm_build_op_klm(qp, op, op_info, klm, &total_len);
+	if (!klm_size)
+		return -EINVAL;
+	klm_align = RTE_ALIGN(klm_size, 4);
+	/* Get UMR WQE memory. */
+	wqe = mlx5_crypto_gcm_get_umr_wqe(qp);
+	memset(wqe, 0, MLX5_UMR_GCM_WQE_SIZE);
+	/* Set WQE control seg. Non-inline KLM UMR WQE size must be 9 WQE_DS. */
+	wqe->opcode = rte_cpu_to_be_32(MLX5_OPCODE_UMR | ((uint32_t)qp->umr_pi << 8));
+	wqe->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | 9);
+	wqe->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET);
+	wqe->misc = rte_cpu_to_be_32(qp->mkey[idx]->id);
+	/* Set UMR WQE control seg. */
+	ucseg = (struct mlx5_wqe_umr_cseg *)(wqe + 1);
+	ucseg->mkey_mask |= RTE_BE64(1u << 0);
+	ucseg->ko_to_bs = rte_cpu_to_be_32(klm_align << MLX5_UMRC_KO_OFFSET);
+	/* Set mkey context seg. */
+	mkc = (struct mlx5_wqe_mkey_cseg *)(ucseg + 1);
+	mkc->len = rte_cpu_to_be_64(total_len);
+	mkc->qpn_mkey = rte_cpu_to_be_32(0xffffff00 | (qp->mkey[idx]->id & 0xff));
+	/* Set UMR pointer to data seg. */
+	iklm = (struct mlx5_klm *)(mkc + 1);
+	iklm->address = rte_cpu_to_be_64((uintptr_t)((char *)klm));
+	iklm->mkey = rte_cpu_to_be_32(qp->mr.lkey);
+	data->src_mkey = rte_cpu_to_be_32(qp->mkey[idx]->id);
+	data->dst_mkey = data->src_mkey;
+	data->src_addr = 0;
+	data->src_bytes = sess->aad_len + op->sym->aead.data.length;
+	data->dst_bytes = data->src_bytes;
+	if (op_info->is_enc)
+		data->dst_bytes += sess->tag_len;
+	else
+		data->src_bytes += sess->tag_len;
+	if (op_info->is_oop)
+		data->dst_addr = (void *)(uintptr_t)(data->src_bytes);
+	else
+		data->dst_addr = 0;
+	/* Clear the padding memory. */
+	memset(&klm[klm_size], 0, sizeof(struct mlx5_klm) * (klm_align - klm_size));
+	/* Update PI and WQE */
+	qp->umr_pi += MLX5_UMR_GCM_WQE_STRIDE;
+	qp->umr_wqe = (uint8_t *)wqe;
+	return 0;
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_build_send_en(struct mlx5_crypto_qp *qp)
+{
+	uint32_t wqe_offset = (qp->umr_pi & (qp->umr_wqbbs - 1)) * MLX5_SEND_WQE_BB;
+	struct mlx5_wqe_cseg *cs = RTE_PTR_ADD(qp->umr_qp_obj.wqes, wqe_offset);
+	struct mlx5_wqe_qseg *qs = RTE_PTR_ADD(cs, sizeof(struct mlx5_wqe_cseg));
+
+	cs->opcode = rte_cpu_to_be_32(MLX5_OPCODE_SEND_EN | ((uint32_t)qp->umr_pi << 8));
+	cs->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | 2);
+	/*
+	 * No need to generate the SEND_EN CQE as we want only GGA CQEs
+	 * in the CQ normally. We can compare qp->last_gga_pi with the QP
+	 * CI to know whether all the WQEs enabled by SEND_EN are consumed.
+	 */
+	cs->flags = RTE_BE32((MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET) |
+			MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
+	cs->misc = RTE_BE32(0);
+	qs->max_index = rte_cpu_to_be_32(qp->pi);
+	qs->qpn_cqn = rte_cpu_to_be_32(qp->qp_obj.qp->id);
+	qp->umr_wqe = (uint8_t *)cs;
+	qp->umr_pi += 1;
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_wqe_set(struct mlx5_crypto_qp *qp,
+			struct rte_crypto_op *op,
+			uint32_t idx,
+			struct mlx5_crypto_gcm_data *data)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_gga_wqe *wqe = &((struct mlx5_gga_wqe *)qp->qp_obj.wqes)[idx];
+	union mlx5_gga_crypto_opaque *opaq = qp->opaque_addr;
+
+	memcpy(opaq[idx].cp.iv,
+		rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), sess->iv_len);
+	opaq[idx].cp.tag_size = sess->wqe_tag_len;
+	opaq[idx].cp.aad_size = sess->wqe_aad_len;
+	/* Update control seg. */
+	wqe->opcode = rte_cpu_to_be_32(MLX5_MMO_CRYPTO_OPC + (qp->pi << 8));
+	wqe->gga_ctrl1 = sess->mmo_ctrl;
+	wqe->gga_ctrl2 = sess->dek_id;
+	wqe->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET);
+	/* Update op_info seg. */
+	wqe->gather.bcount = rte_cpu_to_be_32(data->src_bytes);
+	wqe->gather.lkey = data->src_mkey;
+	wqe->gather.pbuf = rte_cpu_to_be_64((uintptr_t)data->src_addr);
+	/* Update output seg. */
+	wqe->scatter.bcount = rte_cpu_to_be_32(data->dst_bytes);
+	wqe->scatter.lkey = data->dst_mkey;
+	wqe->scatter.pbuf = rte_cpu_to_be_64((uintptr_t)data->dst_addr);
+	qp->wqe = (uint8_t *)wqe;
+}
+
+static uint16_t
+mlx5_crypto_gcm_enqueue_burst(void *queue_pair,
+			      struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	struct mlx5_crypto_session *sess;
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_crypto_gcm_tag_cpy_info *tag;
+	struct mlx5_crypto_gcm_data gcm_data;
+	struct rte_crypto_op *op;
+	struct mlx5_crypto_gcm_op_info op_info;
+	uint16_t mask = qp->entries_n - 1;
+	uint16_t remain = qp->entries_n - (qp->pi - qp->qp_ci);
+	uint32_t idx;
+	uint16_t umr_cnt = 0;
+
+	if (remain < nb_ops)
+		nb_ops = remain;
+	else
+		remain = nb_ops;
+	if (unlikely(remain == 0))
+		return 0;
+	do {
+		op = *ops++;
+		sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+		idx = qp->pi & mask;
+		mlx5_crypto_gcm_get_op_info(qp, op, &op_info);
+		if (!op_info.need_umr) {
+			gcm_data.src_addr = op_info.src_addr;
+			gcm_data.src_bytes = op->sym->aead.data.length + sess->aad_len;
+			gcm_data.src_mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, op->sym->m_src);
+			if (op_info.is_oop) {
+				gcm_data.dst_addr = RTE_PTR_SUB
+					(rte_pktmbuf_mtod_offset(op->sym->m_dst,
+					 void *, op->sym->aead.data.offset), sess->aad_len);
+				gcm_data.dst_mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, op->sym->m_dst);
+			} else {
+				gcm_data.dst_addr = gcm_data.src_addr;
+				gcm_data.dst_mkey = gcm_data.src_mkey;
+			}
+			gcm_data.dst_bytes = gcm_data.src_bytes;
+			if (op_info.is_enc)
+				gcm_data.dst_bytes += sess->tag_len;
+			else
+				gcm_data.src_bytes += sess->tag_len;
+		} else {
+			if (unlikely(mlx5_crypto_gcm_build_umr(qp, op, idx,
+							&op_info, &gcm_data))) {
+				qp->stats.enqueue_err_count++;
+				if (remain != nb_ops) {
+					qp->stats.enqueued_count -= remain;
+					break;
+				}
+				return 0;
+			}
+			umr_cnt++;
+		}
+		mlx5_crypto_gcm_wqe_set(qp, op, idx, &gcm_data);
+		if (op_info.digest) {
+			tag = (struct mlx5_crypto_gcm_tag_cpy_info *)op->sym->aead.digest.data;
+			tag->digest = op_info.digest;
+			tag->tag_len = sess->tag_len;
+			op->status = MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY;
+		} else {
+			op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		}
+		qp->ops[idx] = op;
+		qp->pi++;
+	} while (--remain);
+	qp->stats.enqueued_count += nb_ops;
+	/* Update the last GGA cseg with COMP. */
+	((struct mlx5_wqe_cseg *)qp->wqe)->flags =
+		RTE_BE32(MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET);
+	/* Only when there are no pending SEND_EN WQEs in background. */
+	if (!umr_cnt && !qp->has_umr) {
+		mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
+				   qp->pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+				   !priv->uar.dbnc);
+	} else {
+		mlx5_crypto_gcm_build_send_en(qp);
+		mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->umr_wqe,
+				   qp->umr_pi, &qp->umr_qp_obj.db_rec[MLX5_SND_DBR],
+				   !priv->uar.dbnc);
+		qp->last_gga_pi = qp->pi;
+		qp->has_umr = true;
+	}
+	return nb_ops;
+}
+
+static __rte_noinline void
+mlx5_crypto_gcm_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
+{
+	uint8_t op_code;
+	const uint32_t idx = qp->cq_ci & (qp->entries_n - 1);
+	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
+							&qp->cq_obj.cqes[idx];
+
+	op_code = rte_be_to_cpu_32(cqe->s_wqe_opcode_qpn) >> MLX5_CQ_INDEX_WIDTH;
+	DRV_LOG(ERR, "CQE ERR:0x%x, Vender_ERR:0x%x, OP:0x%x, QPN:0x%x, WQE_CNT:0x%x",
+		cqe->syndrome, cqe->vendor_err_synd, op_code,
+		(rte_be_to_cpu_32(cqe->s_wqe_opcode_qpn) & 0xffffff),
+		rte_be_to_cpu_16(cqe->wqe_counter));
+	if (op && op_code == MLX5_OPCODE_MMO) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		qp->stats.dequeue_err_count++;
+	}
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_fill_op(struct mlx5_crypto_qp *qp,
+			struct rte_crypto_op **ops,
+			uint16_t orci,
+			uint16_t rci,
+			uint16_t op_mask)
+{
+	uint16_t n;
+
+	orci &= op_mask;
+	rci &= op_mask;
+	if (unlikely(orci > rci)) {
+		n = op_mask - orci + 1;
+		memcpy(ops, &qp->ops[orci], n * sizeof(*ops));
+		orci = 0;
+	} else {
+		n = 0;
+	}
+	/* rci can be 0 here, memcpy will skip that. */
+	memcpy(&ops[n], &qp->ops[orci], (rci - orci) * sizeof(*ops));
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_cpy_tag(struct mlx5_crypto_qp *qp,
+			uint16_t orci,
+			uint16_t rci,
+			uint16_t op_mask)
+{
+	struct rte_crypto_op *op;
+	struct mlx5_crypto_gcm_tag_cpy_info *tag;
+
+	while (qp->cpy_tag_op && orci != rci) {
+		op = qp->ops[orci & op_mask];
+		if (op->status == MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY) {
+			tag = (struct mlx5_crypto_gcm_tag_cpy_info *)op->sym->aead.digest.data;
+			memcpy(op->sym->aead.digest.data, tag->digest, tag->tag_len);
+			op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+			qp->cpy_tag_op--;
+		}
+		orci++;
+	}
+}
+
+static uint16_t
+mlx5_crypto_gcm_dequeue_burst(void *queue_pair,
+			      struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = qp->cq_entries_n;
+	const unsigned int mask = cq_size - 1;
+	const unsigned int op_mask = qp->entries_n - 1;
+	uint32_t idx;
+	uint32_t next_idx = qp->cq_ci & mask;
+	uint16_t reported_ci = qp->reported_ci;
+	uint16_t qp_ci = qp->qp_ci;
+	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - reported_ci), nb_ops);
+	uint16_t op_num = 0;
+	int ret;
+
+	if (unlikely(max == 0))
+		return 0;
+	while (qp_ci - reported_ci < max) {
+		idx = next_idx;
+		next_idx = (qp->cq_ci + 1) & mask;
+		cqe = &qp->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, qp->cq_ci);
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
+				mlx5_crypto_gcm_cqe_err_handle(qp,
+						qp->ops[reported_ci & op_mask]);
+			break;
+		}
+		qp_ci = rte_be_to_cpu_16(cqe->wqe_counter) + 1;
+		if (qp->has_umr &&
+		    (qp->last_gga_pi + 1) == qp_ci)
+			qp->has_umr = false;
+		qp->cq_ci++;
+	}
+	/* If wqe_counter changed, means CQE handled. */
+	if (likely(qp->qp_ci != qp_ci)) {
+		qp->qp_ci = qp_ci;
+		rte_io_wmb();
+		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->cq_ci);
+	}
+	/* If reported_ci is not same with qp_ci, means op retrieved. */
+	if (qp_ci != reported_ci) {
+		op_num = RTE_MIN((uint16_t)(qp_ci - reported_ci), max);
+		reported_ci += op_num;
+		mlx5_crypto_gcm_cpy_tag(qp, qp->reported_ci, reported_ci, op_mask);
+		mlx5_crypto_gcm_fill_op(qp, ops, qp->reported_ci, reported_ci, op_mask);
+		qp->stats.dequeued_count += op_num;
+		qp->reported_ci = reported_ci;
+	}
+	return op_num;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
@@ -337,6 +923,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
 	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
 	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
+	crypto_dev->dequeue_burst = mlx5_crypto_gcm_dequeue_burst;
+	crypto_dev->enqueue_burst = mlx5_crypto_gcm_enqueue_burst;
 	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 9/9] crypto/mlx5: enable AES-GCM capability
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
                     ` (7 preceding siblings ...)
  2023-05-26  3:14   ` [PATCH v2 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
@ 2023-05-26  3:14   ` Suanming Mou
  2023-06-14 18:11   ` [EXT] [PATCH v2 0/9] crypto/mlx5: support AES-GCM Akhil Goyal
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-05-26  3:14 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, rasland

This commit generates the AES-GCM capability based on the NIC
attributes and enables the AES-GCM algorithm.

A new devarg "algo" is added to identify whether the crypto PMD will
be initialized as AES-GCM (algo=1) or AES-XTS (algo=0, default).
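
As a usage sketch (the devargs string and the key/digest/AAD/IV sizes
below are illustrative assumptions, not values mandated by this patch):
after probing the device with devargs selecting the GCM algorithm, e.g.
"-a <pci>,class=crypto,algo=1", an application can confirm the AEAD
AES-GCM capability through the standard cryptodev query:

#include <rte_cryptodev.h>

/* Illustrative only: check that AES-GCM with a 16B key, 16B digest and
 * 12B IV is reported by the given device.
 */
static int
check_aes_gcm_cap(uint8_t dev_id)
{
	const struct rte_cryptodev_symmetric_capability *cap;
	struct rte_cryptodev_sym_capability_idx idx = {
		.type = RTE_CRYPTO_SYM_XFORM_AEAD,
		.algo.aead = RTE_CRYPTO_AEAD_AES_GCM,
	};

	cap = rte_cryptodev_sym_capability_get(dev_id, &idx);
	if (cap == NULL)
		return -1; /* AES-GCM not exposed by this device. */
	return rte_cryptodev_sym_capability_check_aead(cap, 16, 16, 0, 12);
}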

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 doc/guides/cryptodevs/mlx5.rst         | 48 +++++++++++++++++++-
 doc/guides/rel_notes/release_23_07.rst |  1 +
 drivers/crypto/mlx5/mlx5_crypto.c      | 26 +++++++++--
 drivers/crypto/mlx5/mlx5_crypto.h      |  1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 63 ++++++++++++++++++++++++++
 5 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/doc/guides/cryptodevs/mlx5.rst b/doc/guides/cryptodevs/mlx5.rst
index b35ac5f5f2..9a0ae8b0d2 100644
--- a/doc/guides/cryptodevs/mlx5.rst
+++ b/doc/guides/cryptodevs/mlx5.rst
@@ -21,6 +21,11 @@ and **NVIDIA BlueField-3** family adapters.
 Overview
 --------
 
+The NVIDIA MLX5 crypto driver supports AES-XTS and AES-GCM encryption and decryption.
+
+AES-XTS
+-------
+
 The device can provide disk encryption services,
 allowing data encryption and decryption towards a disk.
 Having all encryption/decryption operations done in a single device
@@ -38,13 +43,19 @@ The encryption does not require text to be aligned to the AES block size (128b).
 
 See :doc:`../../platform/mlx5` guide for more design details.
 
+AES-GCM
+-------
+The encryption and decryption process the traffic as the standard RTE crypto
+API defines. The supported AAD/digest/key sizes can be read from dev_info.
+
+
 Configuration
 -------------
 
 See the :ref:`mlx5 common configuration <mlx5_common_env>`.
 
 A device comes out of NVIDIA factory with pre-defined import methods.
-There are two possible import methods: wrapped or plaintext.
+There are two possible import methods: wrapped or plaintext (valid for AES-XTS only).
 
 In case the device is in wrapped mode, it needs to be moved to crypto operational mode.
 In order to move the device to crypto operational mode, credential and KEK
@@ -120,24 +131,36 @@ Driver options
 Please refer to :ref:`mlx5 common options <mlx5_common_driver_options>`
 for an additional list of options shared with other mlx5 drivers.
 
+- ``algo`` parameter [int]
+
+  - 0. AES-XTS crypto.
+
+  - 1. AES-GCM crypto.
+
+  Set to zero (AES-XTS) by default.
+
 - ``wcs_file`` parameter [string] - mandatory in wrapped mode
 
   File path including only the wrapped credential in string format of hexadecimal
   numbers, represent 48 bytes (8 bytes IV added by the AES key wrap algorithm).
+  This option is valid only for AES-XTS.
 
 - ``import_kek_id`` parameter [int]
 
   The identifier of the KEK, default value is 0 represents the operational
   register import_kek..
+  This option is valid only for AES-XTS.
 
 - ``credential_id`` parameter [int]
 
   The identifier of the credential, default value is 0 represents the operational
   register credential.
+  This option is valid only for AES-XTS.
 
 - ``keytag`` parameter [int]
 
   The plaintext of the keytag appended to the AES-XTS keys, default value is 0.
+  This option is valid only for AES-XTS.
 
 - ``max_segs_num`` parameter [int]
 
@@ -161,6 +184,8 @@ Limitations
 - The supported data-unit lengths are 512B and 4KB and 1MB. In case the `dataunit_len`
   is not provided in the cipher xform, the OP length is limited to the above
   values.
+- AES-GCM is only supported on BlueField-3.
+- AES-GCM supports only the plaintext key import mode.
 
 
 Prerequisites
@@ -172,6 +197,7 @@ FW Prerequisites
 - xx.31.0328 for ConnectX-6.
 - xx.32.0108 for ConnectX-6 Dx and BlueField-2.
 - xx.36.xxxx for ConnectX-7 and BlueField-3.
+- xx.37.3010 for BlueField-3 and newer for AES-GCM.
 
 Linux Prerequisites
 ~~~~~~~~~~~~~~~~~~~
@@ -186,3 +212,23 @@ Windows Prerequisites
 
 - NVIDIA WINOF-2 version: **2.60** or higher.
   See :ref:`mlx5 common prerequisites <mlx5_windows_prerequisites>` for more details.
+
+
+Notes for rte_crypto AES-GCM
+----------------------------
+
+In AES-GCM mode, the HW requires contiguous input and output of Additional
+Authenticated Data (AAD), payload, and digest (if needed). However, the RTE
+API only provides a single AAD input, which means that in out-of-place mode
+the same AAD buffer is used for both input and output. This reuse of the AAD
+in out-of-place mode breaks the contiguous output, which degrades performance
+and introduces an extra UMR WQE. A digest that is not contiguous with the end
+of the payload also leads to that extra UMR WQE.
+
+To address this issue, the RTE API provides min_mbuf_headroom_req and
+min_mbuf_tailroom_req in rte_cryptodev_info as a hint from the PMD. It
+indicates that the PMD can use the buffer before and after the mbuf payload
+as AAD and digest space, by copying the AAD and digest there and using
+those areas directly. However, the application must ensure that enough
+headroom and tailroom are reserved in the mbuf; otherwise an extra UMR WQE
+is used for non-contiguous operations.
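
An application-side sketch of the layout this note asks for is given
below. It only uses generic rte_crypto/rte_mbuf accessors; the assumption
that the payload starts at data offset 0 and the per-call AAD/tag sizes
are illustrative, not requirements coming from this patch.

#include <rte_crypto.h>
#include <rte_mbuf.h>
#include <rte_debug.h>

/* Place the AAD in the headroom right before the payload and the digest
 * in the tailroom right after it, so the PMD needs neither copies nor an
 * extra UMR WQE (illustrative sketch, payload assumed at offset 0).
 */
static void
setup_contiguous_aead_op(struct rte_crypto_op *op, struct rte_mbuf *m,
			 uint16_t aad_len, uint16_t tag_len)
{
	struct rte_crypto_sym_op *sym = op->sym;
	uint32_t payload_len = rte_pktmbuf_data_len(m);
	uint8_t *payload = rte_pktmbuf_mtod(m, uint8_t *);

	RTE_VERIFY(rte_pktmbuf_headroom(m) >= aad_len);
	RTE_VERIFY(rte_pktmbuf_tailroom(m) >= tag_len);
	sym->m_src = m;
	sym->aead.data.offset = 0;
	sym->aead.data.length = payload_len;
	sym->aead.aad.data = payload - aad_len;
	sym->aead.aad.phys_addr = rte_pktmbuf_iova(m) - aad_len;
	sym->aead.digest.data = payload + payload_len;
	sym->aead.digest.phys_addr = rte_pktmbuf_iova(m) + payload_len;
}
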
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index 946f89e83b..fbbdceab0b 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -29,6 +29,7 @@ New Features
   * Added support for multi-packet RQ on Windows.
   * Added support for CQE compression on Windows.
   * Added support for enhanced multi-packet write on Windows.
+  * Added support for AES-GCM crypto.
 
 * **Added flow matching of tx queue.**
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 4d7d3ef2a3..081e96ad4d 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -269,6 +269,14 @@ mlx5_crypto_args_check_handler(const char *key, const char *val, void *opaque)
 		attr->credential_pointer = (uint32_t)tmp;
 	} else if (strcmp(key, "keytag") == 0) {
 		devarg_prms->keytag = tmp;
+	} else if (strcmp(key, "algo") == 0) {
+		if (tmp == 1) {
+			devarg_prms->is_aes_gcm = 1;
+		} else if (tmp > 1) {
+			DRV_LOG(ERR, "Invalid algo.");
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
 	}
 	return 0;
 }
@@ -285,6 +293,7 @@ mlx5_crypto_parse_devargs(struct mlx5_kvargs_ctrl *mkvlist,
 		"keytag",
 		"max_segs_num",
 		"wcs_file",
+		"algo",
 		NULL,
 	};
 
@@ -370,10 +379,19 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 	priv->crypto_dev = crypto_dev;
 	priv->is_wrapped_mode = wrapped_mode;
 	priv->max_segs_num = devarg_prms.max_segs_num;
-	ret = mlx5_crypto_xts_init(priv);
-	if (ret) {
-		DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
-		return -ENOTSUP;
+	/* Init and override AES-GCM configuration. */
+	if (devarg_prms.is_aes_gcm) {
+		ret = mlx5_crypto_gcm_init(priv);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to init AES-GCM crypto.");
+			return -ENOTSUP;
+		}
+	} else {
+		ret = mlx5_crypto_xts_init(priv);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
+			return -ENOTSUP;
+		}
 	}
 	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 6dcb41b27c..36dacdcda4 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -92,6 +92,7 @@ struct mlx5_crypto_devarg_params {
 	struct mlx5_devx_crypto_login_attr login_attr;
 	uint64_t keytag;
 	uint32_t max_segs_num;
+	uint32_t is_aes_gcm:1;
 };
 
 struct mlx5_crypto_session {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 2231bcbe6f..d481cd0716 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -107,6 +107,60 @@ mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
 	return 0;
 }
 
+static int
+mlx5_crypto_generate_gcm_cap(struct mlx5_hca_crypto_mmo_attr *mmo_attr,
+			     struct rte_cryptodev_capabilities *cap)
+{
+	/* Init key size. */
+	if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt &&
+		mmo_attr->gcm_256_encrypt && mmo_attr->gcm_256_decrypt) {
+		cap->sym.aead.key_size.min = 16;
+		cap->sym.aead.key_size.max = 32;
+		cap->sym.aead.key_size.increment = 16;
+	} else if (mmo_attr->gcm_256_encrypt && mmo_attr->gcm_256_decrypt) {
+		cap->sym.aead.key_size.min = 32;
+		cap->sym.aead.key_size.max = 32;
+		cap->sym.aead.key_size.increment = 0;
+	} else if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt) {
+		cap->sym.aead.key_size.min = 16;
+		cap->sym.aead.key_size.max = 16;
+		cap->sym.aead.key_size.increment = 0;
+	} else {
+		DRV_LOG(ERR, "No available AES-GCM encryption/decryption supported.");
+		return -1;
+	}
+	/* Init tag size. */
+	if (mmo_attr->gcm_auth_tag_128 && mmo_attr->gcm_auth_tag_96) {
+		cap->sym.aead.digest_size.min = 12;
+		cap->sym.aead.digest_size.max = 16;
+		cap->sym.aead.digest_size.increment = 4;
+	} else if (mmo_attr->gcm_auth_tag_96) {
+		cap->sym.aead.digest_size.min = 12;
+		cap->sym.aead.digest_size.max = 12;
+		cap->sym.aead.digest_size.increment = 0;
+	} else if (mmo_attr->gcm_auth_tag_128) {
+		cap->sym.aead.digest_size.min = 16;
+		cap->sym.aead.digest_size.max = 16;
+		cap->sym.aead.digest_size.increment = 0;
+	} else {
+		DRV_LOG(ERR, "No available AES-GCM tag size supported.");
+		return -1;
+	}
+	/* Init AAD size. */
+	cap->sym.aead.aad_size.min = 0;
+	cap->sym.aead.aad_size.max = UINT16_MAX;
+	cap->sym.aead.aad_size.increment = 1;
+	/* Init IV size. */
+	cap->sym.aead.iv_size.min = 12;
+	cap->sym.aead.iv_size.max = 12;
+	cap->sym.aead.iv_size.increment = 0;
+	/* Init left items. */
+	cap->op = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
+	cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_AEAD;
+	cap->sym.aead.algo = RTE_CRYPTO_AEAD_AES_GCM;
+	return 0;
+}
+
 static int
 mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 				  struct rte_crypto_sym_xform *xform,
@@ -915,8 +969,10 @@ mlx5_crypto_gcm_dequeue_burst(void *queue_pair,
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
+	struct mlx5_common_device *cdev = priv->cdev;
 	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
 	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+	int ret;
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
@@ -926,6 +982,13 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 	crypto_dev->dequeue_burst = mlx5_crypto_gcm_dequeue_burst;
 	crypto_dev->enqueue_burst = mlx5_crypto_gcm_enqueue_burst;
 	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
+	/* Generate GCM capability. */
+	ret = mlx5_crypto_generate_gcm_cap(&cdev->config.hca_attr.crypto_mmo,
+					   mlx5_crypto_gcm_caps);
+	if (ret) {
+		DRV_LOG(ERR, "No enough AES-GCM cap.");
+		return -1;
+	}
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [PATCH v2 0/9] crypto/mlx5: support AES-GCM
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
                     ` (8 preceding siblings ...)
  2023-05-26  3:14   ` [PATCH v2 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
@ 2023-06-14 18:11   ` Akhil Goyal
  2023-06-20  1:22     ` Suanming Mou
  9 siblings, 1 reply; 54+ messages in thread
From: Akhil Goyal @ 2023-06-14 18:11 UTC (permalink / raw)
  To: Suanming Mou; +Cc: dev, rasland

> AES-GCM provides both authenticated encryption and the ability to check
> the integrity and authentication of additional authenticated data (AAD)
> that is sent in the clear.
> 
> The crypto operations are performed with crypto WQE. If the input
> buffers(AAD, mbuf, digest) are not contiguous and there is no enough
> headroom or tailroom for AAD or digest, as the requirement from FW, an
> UMR WQE is needed to generate contiguous address space for crypto WQE.
> The UMR WQE and crypto WQE are handled in two different QPs.
> 
> The QP for UMR operation contains two types of WQE, UMR and SEND_EN
> WQE. The WQEs are built dynamically according to the crypto operation
> buffer address. Crypto operation with non-contiguous buffers will
> have its own UMR WQE, while the operation with contiguous buffers
> doesn't need the UMR WQE. Once the all the operations WQE in the
> enqueue burst built finishes, if any UMR WQEs are built, additional
> SEND_EN WQE will be as the final WQE of the burst in the UMR QP.
> The purpose of that SEND_EN WQE is to trigger the crypto QP processing
> with the UMR ready input memory address space buffers.
> 
> The QP for crypto operations contains only the crypto WQE and the QP
> WQEs are built as fixed in QP setup. The QP processing is triggered
> by doorbell ring or the SEND_EN WQE from UMR QP.
> \
Change log missing.

Please get it reviewed from PMD maintainer.
Also rebase the patchset, and fix checkpatch issues.


> Suanming Mou (9):
>   common/mlx5: export memory region lookup by address
>   crypto/mlx5: split AES-XTS
>   crypto/mlx5: add AES-GCM query and initialization
>   crypto/mlx5: add AES-GCM encryption key
>   crypto/mlx5: add AES-GCM session configure
>   common/mlx5: add WQE-based QP synchronous basics
>   crypto/mlx5: add queue pair setup for GCM
>   crypto/mlx5: add enqueue and dequeue operations
>   crypto/mlx5: enable AES-GCM capability
> 
>  doc/guides/cryptodevs/mlx5.rst         |  48 +-
>  doc/guides/rel_notes/release_23_07.rst |   1 +
>  drivers/common/mlx5/mlx5_common_mr.c   |   2 +-
>  drivers/common/mlx5/mlx5_common_mr.h   |   5 +
>  drivers/common/mlx5/mlx5_devx_cmds.c   |  21 +
>  drivers/common/mlx5/mlx5_devx_cmds.h   |  16 +
>  drivers/common/mlx5/mlx5_prm.h         |  65 +-
>  drivers/common/mlx5/version.map        |   3 +
>  drivers/crypto/mlx5/meson.build        |   2 +
>  drivers/crypto/mlx5/mlx5_crypto.c      | 673 ++---------------
>  drivers/crypto/mlx5/mlx5_crypto.h      | 101 ++-
>  drivers/crypto/mlx5/mlx5_crypto_dek.c  | 102 ++-
>  drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 995 +++++++++++++++++++++++++
>  drivers/crypto/mlx5/mlx5_crypto_xts.c  | 645 ++++++++++++++++
>  14 files changed, 2014 insertions(+), 665 deletions(-)
>  create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c
>  create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [PATCH v2 0/9] crypto/mlx5: support AES-GCM
  2023-06-14 18:11   ` [EXT] [PATCH v2 0/9] crypto/mlx5: support AES-GCM Akhil Goyal
@ 2023-06-20  1:22     ` Suanming Mou
  0 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:22 UTC (permalink / raw)
  To: Akhil Goyal; +Cc: dev, Raslan Darawsheh

Hi,

Sorry for the late response.

> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Thursday, June 15, 2023 2:12 AM
> To: Suanming Mou <suanmingm@nvidia.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: RE: [EXT] [PATCH v2 0/9] crypto/mlx5: support AES-GCM
> 
> > AES-GCM provides both authenticated encryption and the ability to
> > check the integrity and authentication of additional authenticated
> > data (AAD) that is sent in the clear.
> >
> > The crypto operations are performed with crypto WQE. If the input
> > buffers(AAD, mbuf, digest) are not contiguous and there is no enough
> > headroom or tailroom for AAD or digest, as the requirement from FW, an
> > UMR WQE is needed to generate contiguous address space for crypto WQE.
> > The UMR WQE and crypto WQE are handled in two different QPs.
> >
> > The QP for UMR operation contains two types of WQE, UMR and SEND_EN
> > WQE. The WQEs are built dynamically according to the crypto operation
> > buffer address. Crypto operation with non-contiguous buffers will have
> > its own UMR WQE, while the operation with contiguous buffers doesn't
> > need the UMR WQE. Once the all the operations WQE in the enqueue burst
> > built finishes, if any UMR WQEs are built, additional SEND_EN WQE will
> > be as the final WQE of the burst in the UMR QP.
> > The purpose of that SEND_EN WQE is to trigger the crypto QP processing
> > with the UMR ready input memory address space buffers.
> >
> > The QP for crypto operations contains only the crypto WQE and the QP
> > WQEs are built as fixed in QP setup. The QP processing is triggered by
> > doorbell ring or the SEND_EN WQE from UMR QP.
> > \
> Change log missing.

Will be updated in v3.

> 
> Please get it reviewed from PMD maintainer.
> Also rebase the patchset, and fix checkpatch issues.

The checkpatch issues are mainly due to false positives on the struct names.

> 
> 
> > Suanming Mou (9):
> >   common/mlx5: export memory region lookup by address
> >   crypto/mlx5: split AES-XTS
> >   crypto/mlx5: add AES-GCM query and initialization
> >   crypto/mlx5: add AES-GCM encryption key
> >   crypto/mlx5: add AES-GCM session configure
> >   common/mlx5: add WQE-based QP synchronous basics
> >   crypto/mlx5: add queue pair setup for GCM
> >   crypto/mlx5: add enqueue and dequeue operations
> >   crypto/mlx5: enable AES-GCM capability
> >
> >  doc/guides/cryptodevs/mlx5.rst         |  48 +-
> >  doc/guides/rel_notes/release_23_07.rst |   1 +
> >  drivers/common/mlx5/mlx5_common_mr.c   |   2 +-
> >  drivers/common/mlx5/mlx5_common_mr.h   |   5 +
> >  drivers/common/mlx5/mlx5_devx_cmds.c   |  21 +
> >  drivers/common/mlx5/mlx5_devx_cmds.h   |  16 +
> >  drivers/common/mlx5/mlx5_prm.h         |  65 +-
> >  drivers/common/mlx5/version.map        |   3 +
> >  drivers/crypto/mlx5/meson.build        |   2 +
> >  drivers/crypto/mlx5/mlx5_crypto.c      | 673 ++---------------
> >  drivers/crypto/mlx5/mlx5_crypto.h      | 101 ++-
> >  drivers/crypto/mlx5/mlx5_crypto_dek.c  | 102 ++-
> > drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 995 +++++++++++++++++++++++++
> > drivers/crypto/mlx5/mlx5_crypto_xts.c  | 645 ++++++++++++++++
> >  14 files changed, 2014 insertions(+), 665 deletions(-)  create mode
> > 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c
> >  create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c
> >
> > --
> > 2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 0/9] crypto/mlx5: support AES-GCM
  2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
                   ` (5 preceding siblings ...)
  2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
@ 2023-06-20  1:23 ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 1/9] common/mlx5: export memory region lookup by address Suanming Mou
                     ` (9 more replies)
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
  7 siblings, 10 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  Cc: rasland, dev, gakhil

AES-GCM provides both authenticated encryption and the ability to check
the integrity and authentication of additional authenticated data (AAD)
that is sent in the clear.

The crypto operations are performed with crypto WQEs. If the input
buffers (AAD, mbuf, digest) are not contiguous and there is not enough
headroom or tailroom for the AAD or digest, an UMR WQE is needed, as
required by the FW, to generate a contiguous address space for the
crypto WQE. The UMR WQEs and crypto WQEs are handled in two different QPs.

The QP for the UMR operation contains two types of WQEs: UMR and SEND_EN
WQEs. The WQEs are built dynamically according to the crypto operation
buffer addresses. A crypto operation with non-contiguous buffers will
have its own UMR WQE, while an operation with contiguous buffers doesn't
need one. Once all the operation WQEs of the enqueue burst have been
built, if any UMR WQEs were built, an additional SEND_EN WQE will be
appended as the final WQE of the burst in the UMR QP. The purpose of
that SEND_EN WQE is to trigger the crypto QP processing once the
UMR-prepared input memory address space buffers are ready.

The QP for crypto operations contains only crypto WQEs, and those QP
WQEs are built as fixed at QP setup time. The QP processing is triggered
by a doorbell ring or by the SEND_EN WQE from the UMR QP, as sketched
below.
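
A simplified model of that end-of-burst trigger follows. All names below
are illustrative stand-ins rather than the driver's symbols; it only
models the decision between ringing the crypto QP directly and fencing
the burst with a SEND_EN WQE on the UMR QP.

#include <stdbool.h>
#include <stdint.h>

struct qp_model {
	uint16_t pi;       /* crypto QP producer index */
	uint16_t umr_pi;   /* UMR QP producer index */
	bool has_umr;      /* UMR WQEs still pending from an earlier burst */
};

/* Stand-ins for building the SEND_EN WQE and ringing a QP doorbell. */
static void build_send_en(struct qp_model *qp) { qp->umr_pi++; }
static void ring_doorbell(struct qp_model *qp, bool umr_qp)
{ (void)qp; (void)umr_qp; }

static void
finish_burst(struct qp_model *qp, unsigned int umr_cnt)
{
	if (umr_cnt == 0 && !qp->has_umr) {
		/* Every op was contiguous: ring the crypto QP directly. */
		ring_doorbell(qp, false);
	} else {
		/* Close the burst with SEND_EN; it kicks the crypto QP
		 * once the UMR WQEs ahead of it have completed.
		 */
		build_send_en(qp);
		ring_doorbell(qp, true);
		qp->has_umr = true;
	}
}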

v2:
  - split the XTS and GCM code into different files.
  - add headroom and tailroom optimization.

v3:
 - fix AES-GCM 128b key creation.

Suanming Mou (9):
  common/mlx5: export memory region lookup by address
  crypto/mlx5: split AES-XTS
  crypto/mlx5: add AES-GCM query and initialization
  crypto/mlx5: add AES-GCM encryption key
  crypto/mlx5: add AES-GCM session configure
  common/mlx5: add WQE-based QP synchronous basics
  crypto/mlx5: add queue pair setup for GCM
  crypto/mlx5: add enqueue and dequeue operations
  crypto/mlx5: enable AES-GCM capability

 doc/guides/cryptodevs/mlx5.rst         |  48 +-
 doc/guides/rel_notes/release_23_07.rst |   1 +
 drivers/common/mlx5/mlx5_common_mr.c   |   2 +-
 drivers/common/mlx5/mlx5_common_mr.h   |   5 +
 drivers/common/mlx5/mlx5_devx_cmds.c   |  21 +
 drivers/common/mlx5/mlx5_devx_cmds.h   |  16 +
 drivers/common/mlx5/mlx5_prm.h         |  65 +-
 drivers/common/mlx5/version.map        |   3 +
 drivers/crypto/mlx5/meson.build        |   2 +
 drivers/crypto/mlx5/mlx5_crypto.c      | 673 ++---------------
 drivers/crypto/mlx5/mlx5_crypto.h      | 101 ++-
 drivers/crypto/mlx5/mlx5_crypto_dek.c  | 102 ++-
 drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 997 +++++++++++++++++++++++++
 drivers/crypto/mlx5/mlx5_crypto_xts.c  | 645 ++++++++++++++++
 14 files changed, 2016 insertions(+), 665 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 1/9] common/mlx5: export memory region lookup by address
  2023-06-20  1:23 ` Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 2/9] crypto/mlx5: split AES-XTS Suanming Mou
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev, gakhil

In case the user provides an address without a mempool, exporting the
function that looks up the address without a mempool is required.
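
A minimal sketch of the intended caller is shown below. The wrapper name
is illustrative, and the KLM filling mirrors what the GCM data path does
later in this series when it resolves the lkey for a raw AAD or digest
pointer that has no owning mempool.

#include <rte_byteorder.h>
#include <mlx5_prm.h>
#include <mlx5_common_mr.h>

/* Illustrative fragment only: resolve the lkey for a raw virtual address
 * and fill a KLM entry with it.
 */
static inline int
fill_klm_from_addr(struct mlx5_mr_ctrl *mr_ctrl, struct mlx5_klm *klm,
		   void *addr, uint32_t len)
{
	klm->byte_count = rte_cpu_to_be_32(len);
	klm->address = rte_cpu_to_be_64((uintptr_t)addr);
	klm->mkey = mlx5_mr_addr2mr_bh(mr_ctrl, (uintptr_t)addr);
	/* UINT32_MAX means no MR covers this address. */
	return klm->mkey == UINT32_MAX ? -1 : 0;
}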

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_common_mr.c | 2 +-
 drivers/common/mlx5/mlx5_common_mr.h | 4 ++++
 drivers/common/mlx5/version.map      | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 7b14b0c7bf..40ff9153bd 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -1059,7 +1059,7 @@ mr_lookup_caches(struct mlx5_mr_ctrl *mr_ctrl,
  * @return
  *   Searched LKey on success, UINT32_MAX on no match.
  */
-static uint32_t
+uint32_t
 mlx5_mr_addr2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, uintptr_t addr)
 {
 	uint32_t lkey;
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index 12def1585f..66623868a2 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -240,6 +240,10 @@ mlx5_mr_create(struct mlx5_common_device *cdev,
 	       struct mlx5_mr_share_cache *share_cache,
 	       struct mr_cache_entry *entry, uintptr_t addr);
 
+__rte_internal
+uint32_t
+mlx5_mr_addr2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, uintptr_t addr);
+
 /* mlx5_common_verbs.c */
 
 __rte_internal
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index e05e1aa8c5..f860b069de 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -122,6 +122,7 @@ INTERNAL {
 	mlx5_mr_ctrl_init;
 	mlx5_mr_flush_local_cache;
 	mlx5_mr_mb2mr_bh;
+	mlx5_mr_addr2mr_bh;
 
 	mlx5_nl_allmulti; # WINDOWS_NO_EXPORT
 	mlx5_nl_ifindex; # WINDOWS_NO_EXPORT
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 2/9] crypto/mlx5: split AES-XTS
  2023-06-20  1:23 ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 1/9] common/mlx5: export memory region lookup by address Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad; +Cc: rasland, dev, gakhil

As other crypto algorithms will be supported, this commit splits the
AES-XTS code into a separate *_xts.c file. The mlx5_crypto.c file will
contain only the common code.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/crypto/mlx5/meson.build       |   1 +
 drivers/crypto/mlx5/mlx5_crypto.c     | 642 ++------------------------
 drivers/crypto/mlx5/mlx5_crypto.h     |  33 ++
 drivers/crypto/mlx5/mlx5_crypto_xts.c | 594 ++++++++++++++++++++++++
 4 files changed, 667 insertions(+), 603 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c

diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index a2691ec0f0..045e8ce81d 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -15,6 +15,7 @@ endif
 
 sources = files(
         'mlx5_crypto.c',
+	'mlx5_crypto_xts.c',
         'mlx5_crypto_dek.c',
 )
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 5267f48c1e..2e6bcc6ddc 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -40,33 +40,6 @@ int mlx5_crypto_logtype;
 
 uint8_t mlx5_crypto_driver_id;
 
-const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
-	{		/* AES XTS */
-		.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
-		{.sym = {
-			.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
-			{.cipher = {
-				.algo = RTE_CRYPTO_CIPHER_AES_XTS,
-				.block_size = 16,
-				.key_size = {
-					.min = 32,
-					.max = 64,
-					.increment = 32
-				},
-				.iv_size = {
-					.min = 16,
-					.max = 16,
-					.increment = 0
-				},
-				.dataunit_set =
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES |
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_4096_BYTES |
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_1_MEGABYTES,
-			}, }
-		}, }
-	},
-};
-
 static const char mlx5_crypto_drv_name[] = RTE_STR(MLX5_CRYPTO_DRIVER_NAME);
 
 static const struct rte_driver mlx5_drv = {
@@ -76,21 +49,6 @@ static const struct rte_driver mlx5_drv = {
 
 static struct cryptodev_driver mlx5_cryptodev_driver;
 
-struct mlx5_crypto_session {
-	uint32_t bs_bpt_eo_es;
-	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
-	 * saved in big endian format.
-	 */
-	uint32_t bsp_res;
-	/**< crypto_block_size_pointer and reserved 24 bits saved in big
-	 * endian format.
-	 */
-	uint32_t iv_offset:16;
-	/**< Starting point for Initialisation Vector. */
-	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
-	uint32_t dek_id; /**< DEK ID */
-} __rte_packed;
-
 static void
 mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 			  struct rte_cryptodev_info *dev_info)
@@ -102,7 +60,7 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 		dev_info->driver_id = mlx5_crypto_driver_id;
 		dev_info->feature_flags =
 			MLX5_CRYPTO_FEATURE_FLAGS(priv->is_wrapped_mode);
-		dev_info->capabilities = mlx5_crypto_caps;
+		dev_info->capabilities = priv->caps;
 		dev_info->max_nb_queue_pairs = MLX5_CRYPTO_MAX_QPS;
 		dev_info->min_mbuf_headroom_req = 0;
 		dev_info->min_mbuf_tailroom_req = 0;
@@ -114,6 +72,38 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 	}
 }
 
+void
+mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp,
+				   uint16_t n)
+{
+	uint32_t i;
+
+	for (i = 0; i < n; i++)
+		if (qp->mkey[i])
+			claim_zero(mlx5_devx_cmd_destroy(qp->mkey[i]));
+}
+
+int
+mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
+				   struct mlx5_crypto_qp *qp,
+				   struct mlx5_devx_mkey_attr *attr,
+				   mlx5_crypto_mkey_update_t update_cb)
+{
+	uint32_t i;
+
+	for (i = 0; i < qp->entries_n; i++) {
+		attr->klm_array = update_cb(priv, qp, i);
+		qp->mkey[i] = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, attr);
+		if (!qp->mkey[i])
+			goto error;
+	}
+	return 0;
+error:
+	DRV_LOG(ERR, "Failed to allocate indirect mkey.");
+	mlx5_crypto_indirect_mkeys_release(qp, i);
+	return -1;
+}
+
 static int
 mlx5_crypto_dev_configure(struct rte_cryptodev *dev,
 			  struct rte_cryptodev_config *config)
@@ -168,72 +158,6 @@ mlx5_crypto_sym_session_get_size(struct rte_cryptodev *dev __rte_unused)
 	return sizeof(struct mlx5_crypto_session);
 }
 
-static int
-mlx5_crypto_sym_session_configure(struct rte_cryptodev *dev,
-				  struct rte_crypto_sym_xform *xform,
-				  struct rte_cryptodev_sym_session *session)
-{
-	struct mlx5_crypto_priv *priv = dev->data->dev_private;
-	struct mlx5_crypto_session *sess_private_data =
-		CRYPTODEV_GET_SYM_SESS_PRIV(session);
-	struct rte_crypto_cipher_xform *cipher;
-	uint8_t encryption_order;
-
-	if (unlikely(xform->next != NULL)) {
-		DRV_LOG(ERR, "Xform next is not supported.");
-		return -ENOTSUP;
-	}
-	if (unlikely((xform->type != RTE_CRYPTO_SYM_XFORM_CIPHER) ||
-		     (xform->cipher.algo != RTE_CRYPTO_CIPHER_AES_XTS))) {
-		DRV_LOG(ERR, "Only AES-XTS algorithm is supported.");
-		return -ENOTSUP;
-	}
-	cipher = &xform->cipher;
-	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
-	if (sess_private_data->dek == NULL) {
-		DRV_LOG(ERR, "Failed to prepare dek.");
-		return -ENOMEM;
-	}
-	if (cipher->op == RTE_CRYPTO_CIPHER_OP_ENCRYPT)
-		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_MEMORY;
-	else
-		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_WIRE;
-	sess_private_data->bs_bpt_eo_es = rte_cpu_to_be_32
-			(MLX5_BSF_SIZE_64B << MLX5_BSF_SIZE_OFFSET |
-			 MLX5_BSF_P_TYPE_CRYPTO << MLX5_BSF_P_TYPE_OFFSET |
-			 encryption_order << MLX5_ENCRYPTION_ORDER_OFFSET |
-			 MLX5_ENCRYPTION_STANDARD_AES_XTS);
-	switch (xform->cipher.dataunit_len) {
-	case 0:
-		sess_private_data->bsp_res = 0;
-		break;
-	case 512:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_512B <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	case 4096:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_4096B <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	case 1048576:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_1MB <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	default:
-		DRV_LOG(ERR, "Cipher data unit length is not supported.");
-		return -ENOTSUP;
-	}
-	sess_private_data->iv_offset = cipher->iv.offset;
-	sess_private_data->dek_id =
-			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
-					 0xffffff);
-	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
-	return 0;
-}
-
 static void
 mlx5_crypto_sym_session_clear(struct rte_cryptodev *dev,
 			      struct rte_cryptodev_sym_session *sess)
@@ -249,412 +173,6 @@ mlx5_crypto_sym_session_clear(struct rte_cryptodev *dev,
 	DRV_LOG(DEBUG, "Session %p was cleared.", spriv);
 }
 
-static void
-mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp, uint16_t n)
-{
-	uint16_t i;
-
-	for (i = 0; i < n; i++)
-		if (qp->mkey[i])
-			claim_zero(mlx5_devx_cmd_destroy(qp->mkey[i]));
-}
-
-static void
-mlx5_crypto_qp_release(struct mlx5_crypto_qp *qp)
-{
-	if (qp == NULL)
-		return;
-	mlx5_devx_qp_destroy(&qp->qp_obj);
-	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
-	mlx5_devx_cq_destroy(&qp->cq_obj);
-	rte_free(qp);
-}
-
-static int
-mlx5_crypto_queue_pair_release(struct rte_cryptodev *dev, uint16_t qp_id)
-{
-	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
-
-	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
-	mlx5_crypto_qp_release(qp);
-	dev->data->queue_pairs[qp_id] = NULL;
-	return 0;
-}
-
-static __rte_noinline uint32_t
-mlx5_crypto_get_block_size(struct rte_crypto_op *op)
-{
-	uint32_t bl = op->sym->cipher.data.length;
-
-	switch (bl) {
-	case (1 << 20):
-		return RTE_BE32(MLX5_BLOCK_SIZE_1MB << MLX5_BLOCK_SIZE_OFFSET);
-	case (1 << 12):
-		return RTE_BE32(MLX5_BLOCK_SIZE_4096B <<
-				MLX5_BLOCK_SIZE_OFFSET);
-	case (1 << 9):
-		return RTE_BE32(MLX5_BLOCK_SIZE_512B << MLX5_BLOCK_SIZE_OFFSET);
-	default:
-		DRV_LOG(ERR, "Unknown block size: %u.", bl);
-		return UINT32_MAX;
-	}
-}
-
-static __rte_always_inline uint32_t
-mlx5_crypto_klm_set(struct mlx5_crypto_qp *qp, struct rte_mbuf *mbuf,
-		    struct mlx5_wqe_dseg *klm, uint32_t offset,
-		    uint32_t *remain)
-{
-	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
-	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
-
-	if (data_len > *remain)
-		data_len = *remain;
-	*remain -= data_len;
-	klm->bcount = rte_cpu_to_be_32(data_len);
-	klm->pbuf = rte_cpu_to_be_64(addr);
-	klm->lkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
-	return klm->lkey;
-
-}
-
-static __rte_always_inline uint32_t
-mlx5_crypto_klms_set(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op,
-		     struct rte_mbuf *mbuf, struct mlx5_wqe_dseg *klm)
-{
-	uint32_t remain_len = op->sym->cipher.data.length;
-	uint32_t nb_segs = mbuf->nb_segs;
-	uint32_t klm_n = 1u;
-
-	/* First mbuf needs to take the cipher offset. */
-	if (unlikely(mlx5_crypto_klm_set(qp, mbuf, klm,
-		     op->sym->cipher.data.offset, &remain_len) == UINT32_MAX)) {
-		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-		return 0;
-	}
-	while (remain_len) {
-		nb_segs--;
-		mbuf = mbuf->next;
-		if (unlikely(mbuf == NULL || nb_segs == 0)) {
-			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
-			return 0;
-		}
-		if (unlikely(mlx5_crypto_klm_set(qp, mbuf, ++klm, 0,
-						 &remain_len) == UINT32_MAX)) {
-			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-			return 0;
-		}
-		klm_n++;
-	}
-	return klm_n;
-}
-
-static __rte_always_inline int
-mlx5_crypto_wqe_set(struct mlx5_crypto_priv *priv,
-			 struct mlx5_crypto_qp *qp,
-			 struct rte_crypto_op *op,
-			 struct mlx5_umr_wqe *umr)
-{
-	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
-	struct mlx5_wqe_cseg *cseg = &umr->ctr;
-	struct mlx5_wqe_mkey_cseg *mkc = &umr->mkc;
-	struct mlx5_wqe_dseg *klms = &umr->kseg[0];
-	struct mlx5_wqe_umr_bsf_seg *bsf = ((struct mlx5_wqe_umr_bsf_seg *)
-				      RTE_PTR_ADD(umr, priv->umr_wqe_size)) - 1;
-	uint32_t ds;
-	bool ipl = op->sym->m_dst == NULL || op->sym->m_dst == op->sym->m_src;
-	/* Set UMR WQE. */
-	uint32_t klm_n = mlx5_crypto_klms_set(qp, op,
-				   ipl ? op->sym->m_src : op->sym->m_dst, klms);
-
-	if (unlikely(klm_n == 0))
-		return 0;
-	bsf->bs_bpt_eo_es = sess->bs_bpt_eo_es;
-	if (unlikely(!sess->bsp_res)) {
-		bsf->bsp_res = mlx5_crypto_get_block_size(op);
-		if (unlikely(bsf->bsp_res == UINT32_MAX)) {
-			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
-			return 0;
-		}
-	} else {
-		bsf->bsp_res = sess->bsp_res;
-	}
-	bsf->raw_data_size = rte_cpu_to_be_32(op->sym->cipher.data.length);
-	memcpy(bsf->xts_initial_tweak,
-	       rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), 16);
-	bsf->res_dp = sess->dek_id;
-	mkc->len = rte_cpu_to_be_64(op->sym->cipher.data.length);
-	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) | MLX5_OPCODE_UMR);
-	qp->db_pi += priv->umr_wqe_stride;
-	/* Set RDMA_WRITE WQE. */
-	cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
-	klms = RTE_PTR_ADD(cseg, sizeof(struct mlx5_rdma_write_wqe));
-	if (!ipl) {
-		klm_n = mlx5_crypto_klms_set(qp, op, op->sym->m_src, klms);
-		if (unlikely(klm_n == 0))
-			return 0;
-	} else {
-		memcpy(klms, &umr->kseg[0], sizeof(*klms) * klm_n);
-	}
-	ds = 2 + klm_n;
-	cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
-	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
-							MLX5_OPCODE_RDMA_WRITE);
-	ds = RTE_ALIGN(ds, 4);
-	qp->db_pi += ds >> 2;
-	/* Set NOP WQE if needed. */
-	if (priv->max_rdmar_ds > ds) {
-		cseg += ds;
-		ds = priv->max_rdmar_ds - ds;
-		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
-		cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
-							       MLX5_OPCODE_NOP);
-		qp->db_pi += ds >> 2; /* Here, DS is 4 aligned for sure. */
-	}
-	qp->wqe = (uint8_t *)cseg;
-	return 1;
-}
-
-static uint16_t
-mlx5_crypto_enqueue_burst(void *queue_pair, struct rte_crypto_op **ops,
-			  uint16_t nb_ops)
-{
-	struct mlx5_crypto_qp *qp = queue_pair;
-	struct mlx5_crypto_priv *priv = qp->priv;
-	struct mlx5_umr_wqe *umr;
-	struct rte_crypto_op *op;
-	uint16_t mask = qp->entries_n - 1;
-	uint16_t remain = qp->entries_n - (qp->pi - qp->ci);
-	uint32_t idx;
-
-	if (remain < nb_ops)
-		nb_ops = remain;
-	else
-		remain = nb_ops;
-	if (unlikely(remain == 0))
-		return 0;
-	do {
-		idx = qp->pi & mask;
-		op = *ops++;
-		umr = RTE_PTR_ADD(qp->qp_obj.umem_buf,
-			priv->wqe_set_size * idx);
-		if (unlikely(mlx5_crypto_wqe_set(priv, qp, op, umr) == 0)) {
-			qp->stats.enqueue_err_count++;
-			if (remain != nb_ops) {
-				qp->stats.enqueued_count -= remain;
-				break;
-			}
-			return 0;
-		}
-		qp->ops[idx] = op;
-		qp->pi++;
-	} while (--remain);
-	qp->stats.enqueued_count += nb_ops;
-	mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
-			   qp->db_pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
-			   !priv->uar.dbnc);
-	return nb_ops;
-}
-
-static __rte_noinline void
-mlx5_crypto_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
-{
-	const uint32_t idx = qp->ci & (qp->entries_n - 1);
-	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
-							&qp->cq_obj.cqes[idx];
-
-	op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-	qp->stats.dequeue_err_count++;
-	DRV_LOG(ERR, "CQE ERR:%x.\n", rte_be_to_cpu_32(cqe->syndrome));
-}
-
-static uint16_t
-mlx5_crypto_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
-			  uint16_t nb_ops)
-{
-	struct mlx5_crypto_qp *qp = queue_pair;
-	volatile struct mlx5_cqe *restrict cqe;
-	struct rte_crypto_op *restrict op;
-	const unsigned int cq_size = qp->entries_n;
-	const unsigned int mask = cq_size - 1;
-	uint32_t idx;
-	uint32_t next_idx = qp->ci & mask;
-	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - qp->ci), nb_ops);
-	uint16_t i = 0;
-	int ret;
-
-	if (unlikely(max == 0))
-		return 0;
-	do {
-		idx = next_idx;
-		next_idx = (qp->ci + 1) & mask;
-		op = qp->ops[idx];
-		cqe = &qp->cq_obj.cqes[idx];
-		ret = check_cqe(cqe, cq_size, qp->ci);
-		rte_io_rmb();
-		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
-			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
-				mlx5_crypto_cqe_err_handle(qp, op);
-			break;
-		}
-		op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
-		ops[i++] = op;
-		qp->ci++;
-	} while (i < max);
-	if (likely(i != 0)) {
-		rte_io_wmb();
-		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->ci);
-		qp->stats.dequeued_count += i;
-	}
-	return i;
-}
-
-static void
-mlx5_crypto_qp_init(struct mlx5_crypto_priv *priv, struct mlx5_crypto_qp *qp)
-{
-	uint32_t i;
-
-	for (i = 0 ; i < qp->entries_n; i++) {
-		struct mlx5_wqe_cseg *cseg = RTE_PTR_ADD(qp->qp_obj.umem_buf,
-			i * priv->wqe_set_size);
-		struct mlx5_wqe_umr_cseg *ucseg = (struct mlx5_wqe_umr_cseg *)
-								     (cseg + 1);
-		struct mlx5_wqe_umr_bsf_seg *bsf =
-			(struct mlx5_wqe_umr_bsf_seg *)(RTE_PTR_ADD(cseg,
-						       priv->umr_wqe_size)) - 1;
-		struct mlx5_wqe_rseg *rseg;
-
-		/* Init UMR WQE. */
-		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) |
-					 (priv->umr_wqe_size / MLX5_WSEG_SIZE));
-		cseg->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-				       MLX5_COMP_MODE_OFFSET);
-		cseg->misc = rte_cpu_to_be_32(qp->mkey[i]->id);
-		ucseg->if_cf_toe_cq_res = RTE_BE32(1u << MLX5_UMRC_IF_OFFSET);
-		ucseg->mkey_mask = RTE_BE64(1u << 0); /* Mkey length bit. */
-		ucseg->ko_to_bs = rte_cpu_to_be_32
-			((MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size) <<
-			 MLX5_UMRC_KO_OFFSET) | (4 << MLX5_UMRC_TO_BS_OFFSET));
-		bsf->keytag = priv->keytag;
-		/* Init RDMA WRITE WQE. */
-		cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
-		cseg->flags = RTE_BE32((MLX5_COMP_ALWAYS <<
-				      MLX5_COMP_MODE_OFFSET) |
-				      MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
-		rseg = (struct mlx5_wqe_rseg *)(cseg + 1);
-		rseg->rkey = rte_cpu_to_be_32(qp->mkey[i]->id);
-	}
-}
-
-static int
-mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
-				  struct mlx5_crypto_qp *qp)
-{
-	struct mlx5_umr_wqe *umr;
-	uint32_t i;
-	struct mlx5_devx_mkey_attr attr = {
-		.pd = priv->cdev->pdn,
-		.umr_en = 1,
-		.crypto_en = 1,
-		.set_remote_rw = 1,
-		.klm_num = MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size),
-	};
-
-	for (umr = (struct mlx5_umr_wqe *)qp->qp_obj.umem_buf, i = 0;
-	   i < qp->entries_n; i++, umr = RTE_PTR_ADD(umr, priv->wqe_set_size)) {
-		attr.klm_array = (struct mlx5_klm *)&umr->kseg[0];
-		qp->mkey[i] = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &attr);
-		if (!qp->mkey[i])
-			goto error;
-	}
-	return 0;
-error:
-	DRV_LOG(ERR, "Failed to allocate indirect mkey.");
-	mlx5_crypto_indirect_mkeys_release(qp, i);
-	return -1;
-}
-
-static int
-mlx5_crypto_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
-			     const struct rte_cryptodev_qp_conf *qp_conf,
-			     int socket_id)
-{
-	struct mlx5_crypto_priv *priv = dev->data->dev_private;
-	struct mlx5_devx_qp_attr attr = {0};
-	struct mlx5_crypto_qp *qp;
-	uint16_t log_nb_desc = rte_log2_u32(qp_conf->nb_descriptors);
-	uint32_t ret;
-	uint32_t alloc_size = sizeof(*qp);
-	uint32_t log_wqbb_n;
-	struct mlx5_devx_cq_attr cq_attr = {
-		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
-	};
-
-	if (dev->data->queue_pairs[qp_id] != NULL)
-		mlx5_crypto_queue_pair_release(dev, qp_id);
-	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
-	alloc_size += (sizeof(struct rte_crypto_op *) +
-		       sizeof(struct mlx5_devx_obj *)) *
-		       RTE_BIT32(log_nb_desc);
-	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
-				socket_id);
-	if (qp == NULL) {
-		DRV_LOG(ERR, "Failed to allocate QP memory.");
-		rte_errno = ENOMEM;
-		return -rte_errno;
-	}
-	if (mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj, log_nb_desc,
-				&cq_attr, socket_id) != 0) {
-		DRV_LOG(ERR, "Failed to create CQ.");
-		goto error;
-	}
-	log_wqbb_n = rte_log2_u32(RTE_BIT32(log_nb_desc) *
-				(priv->wqe_set_size / MLX5_SEND_WQE_BB));
-	attr.pd = priv->cdev->pdn;
-	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
-	attr.cqn = qp->cq_obj.cq->id;
-	attr.num_of_receive_wqes = 0;
-	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
-	attr.ts_format =
-		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
-	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
-					attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
-					&attr, socket_id);
-	if (ret) {
-		DRV_LOG(ERR, "Failed to create QP.");
-		goto error;
-	}
-	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
-			      priv->dev_config.socket_id) != 0) {
-		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
-			(uint32_t)qp_id);
-		rte_errno = ENOMEM;
-		goto error;
-	}
-	/*
-	 * In Order to configure self loopback, when calling devx qp2rts the
-	 * remote QP id that is used is the id of the same QP.
-	 */
-	if (mlx5_devx_qp2rts(&qp->qp_obj, qp->qp_obj.qp->id))
-		goto error;
-	qp->mkey = (struct mlx5_devx_obj **)RTE_ALIGN((uintptr_t)(qp + 1),
-							   RTE_CACHE_LINE_SIZE);
-	qp->ops = (struct rte_crypto_op **)(qp->mkey + RTE_BIT32(log_nb_desc));
-	qp->entries_n = 1 << log_nb_desc;
-	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp)) {
-		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
-		rte_errno = ENOMEM;
-		goto error;
-	}
-	mlx5_crypto_qp_init(priv, qp);
-	qp->priv = priv;
-	dev->data->queue_pairs[qp_id] = qp;
-	return 0;
-error:
-	mlx5_crypto_qp_release(qp);
-	return -1;
-}
-
 static void
 mlx5_crypto_stats_get(struct rte_cryptodev *dev,
 		      struct rte_cryptodev_stats *stats)
@@ -691,10 +209,7 @@ static struct rte_cryptodev_ops mlx5_crypto_ops = {
 	.dev_infos_get			= mlx5_crypto_dev_infos_get,
 	.stats_get			= mlx5_crypto_stats_get,
 	.stats_reset			= mlx5_crypto_stats_reset,
-	.queue_pair_setup		= mlx5_crypto_queue_pair_setup,
-	.queue_pair_release		= mlx5_crypto_queue_pair_release,
 	.sym_session_get_size		= mlx5_crypto_sym_session_get_size,
-	.sym_session_configure		= mlx5_crypto_sym_session_configure,
 	.sym_session_clear		= mlx5_crypto_sym_session_clear,
 	.sym_get_raw_dp_ctx_size	= NULL,
 	.sym_configure_raw_dp_ctx	= NULL,
@@ -796,81 +311,6 @@ mlx5_crypto_parse_devargs(struct mlx5_kvargs_ctrl *mkvlist,
 	return 0;
 }
 
-/*
- * Calculate UMR WQE size and RDMA Write WQE size with the
- * following limitations:
- *	- Each WQE size is multiple of 64.
- *	- The summarize of both UMR WQE and RDMA_W WQE is a power of 2.
- *	- The number of entries in the UMR WQE's KLM list is multiple of 4.
- */
-static void
-mlx5_crypto_get_wqe_sizes(uint32_t segs_num, uint32_t *umr_size,
-			uint32_t *rdmaw_size)
-{
-	uint32_t diff, wqe_set_size;
-
-	*umr_size = MLX5_CRYPTO_UMR_WQE_STATIC_SIZE +
-			RTE_ALIGN(segs_num, 4) *
-			sizeof(struct mlx5_wqe_dseg);
-	/* Make sure UMR WQE size is multiple of WQBB. */
-	*umr_size = RTE_ALIGN(*umr_size, MLX5_SEND_WQE_BB);
-	*rdmaw_size = sizeof(struct mlx5_rdma_write_wqe) +
-			sizeof(struct mlx5_wqe_dseg) *
-			(segs_num <= 2 ? 2 : 2 +
-			RTE_ALIGN(segs_num - 2, 4));
-	/* Make sure RDMA_WRITE WQE size is multiple of WQBB. */
-	*rdmaw_size = RTE_ALIGN(*rdmaw_size, MLX5_SEND_WQE_BB);
-	wqe_set_size = *rdmaw_size + *umr_size;
-	diff = rte_align32pow2(wqe_set_size) - wqe_set_size;
-	/* Make sure wqe_set size is power of 2. */
-	if (diff)
-		*umr_size += diff;
-}
-
-static uint8_t
-mlx5_crypto_max_segs_num(uint16_t max_wqe_size)
-{
-	int klms_sizes = max_wqe_size - MLX5_CRYPTO_UMR_WQE_STATIC_SIZE;
-	uint32_t max_segs_cap = RTE_ALIGN_FLOOR(klms_sizes, MLX5_SEND_WQE_BB) /
-			sizeof(struct mlx5_wqe_dseg);
-
-	MLX5_ASSERT(klms_sizes >= MLX5_SEND_WQE_BB);
-	while (max_segs_cap) {
-		uint32_t umr_wqe_size, rdmw_wqe_size;
-
-		mlx5_crypto_get_wqe_sizes(max_segs_cap, &umr_wqe_size,
-						&rdmw_wqe_size);
-		if (umr_wqe_size <= max_wqe_size &&
-				rdmw_wqe_size <= max_wqe_size)
-			break;
-		max_segs_cap -= 4;
-	}
-	return max_segs_cap;
-}
-
-static int
-mlx5_crypto_configure_wqe_size(struct mlx5_crypto_priv *priv,
-				uint16_t max_wqe_size, uint32_t max_segs_num)
-{
-	uint32_t rdmw_wqe_size, umr_wqe_size;
-
-	mlx5_crypto_get_wqe_sizes(max_segs_num, &umr_wqe_size,
-					&rdmw_wqe_size);
-	priv->wqe_set_size = rdmw_wqe_size + umr_wqe_size;
-	if (umr_wqe_size > max_wqe_size ||
-				rdmw_wqe_size > max_wqe_size) {
-		DRV_LOG(ERR, "Invalid max_segs_num: %u. should be %u or lower.",
-			max_segs_num,
-			mlx5_crypto_max_segs_num(max_wqe_size));
-		rte_errno = EINVAL;
-		return -EINVAL;
-	}
-	priv->umr_wqe_size = (uint16_t)umr_wqe_size;
-	priv->umr_wqe_stride = priv->umr_wqe_size / MLX5_SEND_WQE_BB;
-	priv->max_rdmar_ds = rdmw_wqe_size / sizeof(struct mlx5_wqe_dseg);
-	return 0;
-}
-
 static int
 mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		      struct mlx5_kvargs_ctrl *mkvlist)
@@ -916,14 +356,18 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 	DRV_LOG(INFO,
 		"Crypto device %s was created successfully.", ibdev_name);
 	crypto_dev->dev_ops = &mlx5_crypto_ops;
-	crypto_dev->dequeue_burst = mlx5_crypto_dequeue_burst;
-	crypto_dev->enqueue_burst = mlx5_crypto_enqueue_burst;
 	crypto_dev->feature_flags = MLX5_CRYPTO_FEATURE_FLAGS(wrapped_mode);
 	crypto_dev->driver_id = mlx5_crypto_driver_id;
 	priv = crypto_dev->data->dev_private;
 	priv->cdev = cdev;
 	priv->crypto_dev = crypto_dev;
 	priv->is_wrapped_mode = wrapped_mode;
+	priv->max_segs_num = devarg_prms.max_segs_num;
+	ret = mlx5_crypto_xts_init(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
+		return -ENOTSUP;
+	}
 	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
 		return -1;
@@ -939,14 +383,6 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		}
 		priv->login_obj = login;
 	}
-	ret = mlx5_crypto_configure_wqe_size(priv,
-		cdev->config.hca_attr.max_wqe_sz_sq, devarg_prms.max_segs_num);
-	if (ret) {
-		claim_zero(mlx5_devx_cmd_destroy(priv->login_obj));
-		mlx5_devx_uar_release(&priv->uar);
-		rte_cryptodev_pmd_destroy(priv->crypto_dev);
-		return -1;
-	}
 	priv->keytag = rte_cpu_to_be_64(devarg_prms.keytag);
 	DRV_LOG(INFO, "Max number of segments: %u.",
 		(unsigned int)RTE_MIN(
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index a2771b3dab..05d8fe97fe 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -31,6 +31,7 @@ struct mlx5_crypto_priv {
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
 	struct mlx5_hlist *dek_hlist; /* Dek hash list. */
+	const struct rte_cryptodev_capabilities *caps;
 	struct rte_cryptodev_config dev_config;
 	struct mlx5_devx_obj *login_obj;
 	uint64_t keytag;
@@ -70,6 +71,35 @@ struct mlx5_crypto_devarg_params {
 	uint32_t max_segs_num;
 };
 
+struct mlx5_crypto_session {
+	uint32_t bs_bpt_eo_es;
+	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
+	 * saved in big endian format.
+	 */
+	uint32_t bsp_res;
+	/**< crypto_block_size_pointer and reserved 24 bits saved in big
+	 * endian format.
+	 */
+	uint32_t iv_offset:16;
+	/**< Starting point for Initialisation Vector. */
+	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
+	uint32_t dek_id; /**< DEK ID */
+} __rte_packed;
+
+typedef void *(*mlx5_crypto_mkey_update_t)(struct mlx5_crypto_priv *priv,
+					   struct mlx5_crypto_qp *qp,
+					   uint32_t idx);
+
+void
+mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp,
+				   uint16_t n);
+
+int
+mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
+				   struct mlx5_crypto_qp *qp,
+				   struct mlx5_devx_mkey_attr *attr,
+				   mlx5_crypto_mkey_update_t update_cb);
+
 int
 mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 			struct mlx5_crypto_dek *dek);
@@ -84,4 +114,7 @@ mlx5_crypto_dek_setup(struct mlx5_crypto_priv *priv);
 void
 mlx5_crypto_dek_unset(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_xts.c b/drivers/crypto/mlx5/mlx5_crypto_xts.c
new file mode 100644
index 0000000000..964d02e6ed
--- /dev/null
+++ b/drivers/crypto/mlx5/mlx5_crypto_xts.c
@@ -0,0 +1,594 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_malloc.h>
+#include <rte_mempool.h>
+#include <rte_eal_paging.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+#include <bus_pci_driver.h>
+#include <rte_memory.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_common_os.h>
+
+#include "mlx5_crypto_utils.h"
+#include "mlx5_crypto.h"
+
+const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
+	{		/* AES XTS */
+		.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+		{.sym = {
+			.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+			{.cipher = {
+				.algo = RTE_CRYPTO_CIPHER_AES_XTS,
+				.block_size = 16,
+				.key_size = {
+					.min = 32,
+					.max = 64,
+					.increment = 32
+				},
+				.iv_size = {
+					.min = 16,
+					.max = 16,
+					.increment = 0
+				},
+				.dataunit_set =
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES |
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_4096_BYTES |
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_1_MEGABYTES,
+			}, }
+		}, }
+	},
+};
+
+static int
+mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
+				      struct rte_crypto_sym_xform *xform,
+				      struct rte_cryptodev_sym_session *session)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_session *sess_private_data =
+		CRYPTODEV_GET_SYM_SESS_PRIV(session);
+	struct rte_crypto_cipher_xform *cipher;
+	uint8_t encryption_order;
+
+	if (unlikely(xform->next != NULL)) {
+		DRV_LOG(ERR, "Xform next is not supported.");
+		return -ENOTSUP;
+	}
+	if (unlikely((xform->type != RTE_CRYPTO_SYM_XFORM_CIPHER) ||
+		     (xform->cipher.algo != RTE_CRYPTO_CIPHER_AES_XTS))) {
+		DRV_LOG(ERR, "Only AES-XTS algorithm is supported.");
+		return -ENOTSUP;
+	}
+	cipher = &xform->cipher;
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
+	if (sess_private_data->dek == NULL) {
+		DRV_LOG(ERR, "Failed to prepare dek.");
+		return -ENOMEM;
+	}
+	if (cipher->op == RTE_CRYPTO_CIPHER_OP_ENCRYPT)
+		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_MEMORY;
+	else
+		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_WIRE;
+	sess_private_data->bs_bpt_eo_es = rte_cpu_to_be_32
+			(MLX5_BSF_SIZE_64B << MLX5_BSF_SIZE_OFFSET |
+			 MLX5_BSF_P_TYPE_CRYPTO << MLX5_BSF_P_TYPE_OFFSET |
+			 encryption_order << MLX5_ENCRYPTION_ORDER_OFFSET |
+			 MLX5_ENCRYPTION_STANDARD_AES_XTS);
+	switch (xform->cipher.dataunit_len) {
+	case 0:
+		sess_private_data->bsp_res = 0;
+		break;
+	case 512:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_512B <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	case 4096:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_4096B <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	case 1048576:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_1MB <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	default:
+		DRV_LOG(ERR, "Cipher data unit length is not supported.");
+		return -ENOTSUP;
+	}
+	sess_private_data->iv_offset = cipher->iv.offset;
+	sess_private_data->dek_id =
+			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
+					 0xffffff);
+	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
+	return 0;
+}
+
+static void
+mlx5_crypto_xts_qp_release(struct mlx5_crypto_qp *qp)
+{
+	if (qp == NULL)
+		return;
+	mlx5_devx_qp_destroy(&qp->qp_obj);
+	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
+	mlx5_devx_cq_destroy(&qp->cq_obj);
+	rte_free(qp);
+}
+
+static int
+mlx5_crypto_xts_queue_pair_release(struct rte_cryptodev *dev, uint16_t qp_id)
+{
+	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
+
+	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
+	mlx5_crypto_xts_qp_release(qp);
+	dev->data->queue_pairs[qp_id] = NULL;
+	return 0;
+}
+
+static __rte_noinline uint32_t
+mlx5_crypto_xts_get_block_size(struct rte_crypto_op *op)
+{
+	uint32_t bl = op->sym->cipher.data.length;
+
+	switch (bl) {
+	case (1 << 20):
+		return RTE_BE32(MLX5_BLOCK_SIZE_1MB << MLX5_BLOCK_SIZE_OFFSET);
+	case (1 << 12):
+		return RTE_BE32(MLX5_BLOCK_SIZE_4096B <<
+				MLX5_BLOCK_SIZE_OFFSET);
+	case (1 << 9):
+		return RTE_BE32(MLX5_BLOCK_SIZE_512B << MLX5_BLOCK_SIZE_OFFSET);
+	default:
+		DRV_LOG(ERR, "Unknown block size: %u.", bl);
+		return UINT32_MAX;
+	}
+}
+
+static __rte_always_inline uint32_t
+mlx5_crypto_xts_klm_set(struct mlx5_crypto_qp *qp, struct rte_mbuf *mbuf,
+			struct mlx5_wqe_dseg *klm, uint32_t offset,
+			uint32_t *remain)
+{
+	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
+	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
+
+	if (data_len > *remain)
+		data_len = *remain;
+	*remain -= data_len;
+	klm->bcount = rte_cpu_to_be_32(data_len);
+	klm->pbuf = rte_cpu_to_be_64(addr);
+	klm->lkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
+	return klm->lkey;
+
+}
+
+static __rte_always_inline uint32_t
+mlx5_crypto_xts_klms_set(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op,
+			 struct rte_mbuf *mbuf, struct mlx5_wqe_dseg *klm)
+{
+	uint32_t remain_len = op->sym->cipher.data.length;
+	uint32_t nb_segs = mbuf->nb_segs;
+	uint32_t klm_n = 1u;
+
+	/* First mbuf needs to take the cipher offset. */
+	if (unlikely(mlx5_crypto_xts_klm_set(qp, mbuf, klm,
+		     op->sym->cipher.data.offset, &remain_len) == UINT32_MAX)) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		return 0;
+	}
+	while (remain_len) {
+		nb_segs--;
+		mbuf = mbuf->next;
+		if (unlikely(mbuf == NULL || nb_segs == 0)) {
+			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
+			return 0;
+		}
+		if (unlikely(mlx5_crypto_xts_klm_set(qp, mbuf, ++klm, 0,
+						&remain_len) == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+			return 0;
+		}
+		klm_n++;
+	}
+	return klm_n;
+}
+
+static __rte_always_inline int
+mlx5_crypto_xts_wqe_set(struct mlx5_crypto_priv *priv,
+			 struct mlx5_crypto_qp *qp,
+			 struct rte_crypto_op *op,
+			 struct mlx5_umr_wqe *umr)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_wqe_cseg *cseg = &umr->ctr;
+	struct mlx5_wqe_mkey_cseg *mkc = &umr->mkc;
+	struct mlx5_wqe_dseg *klms = &umr->kseg[0];
+	struct mlx5_wqe_umr_bsf_seg *bsf = ((struct mlx5_wqe_umr_bsf_seg *)
+				      RTE_PTR_ADD(umr, priv->umr_wqe_size)) - 1;
+	uint32_t ds;
+	bool ipl = op->sym->m_dst == NULL || op->sym->m_dst == op->sym->m_src;
+	/* Set UMR WQE. */
+	uint32_t klm_n = mlx5_crypto_xts_klms_set(qp, op,
+				   ipl ? op->sym->m_src : op->sym->m_dst, klms);
+
+	if (unlikely(klm_n == 0))
+		return 0;
+	bsf->bs_bpt_eo_es = sess->bs_bpt_eo_es;
+	if (unlikely(!sess->bsp_res)) {
+		bsf->bsp_res = mlx5_crypto_xts_get_block_size(op);
+		if (unlikely(bsf->bsp_res == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
+			return 0;
+		}
+	} else {
+		bsf->bsp_res = sess->bsp_res;
+	}
+	bsf->raw_data_size = rte_cpu_to_be_32(op->sym->cipher.data.length);
+	memcpy(bsf->xts_initial_tweak,
+	       rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), 16);
+	bsf->res_dp = sess->dek_id;
+	mkc->len = rte_cpu_to_be_64(op->sym->cipher.data.length);
+	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) | MLX5_OPCODE_UMR);
+	qp->db_pi += priv->umr_wqe_stride;
+	/* Set RDMA_WRITE WQE. */
+	cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
+	klms = RTE_PTR_ADD(cseg, sizeof(struct mlx5_rdma_write_wqe));
+	if (!ipl) {
+		klm_n = mlx5_crypto_xts_klms_set(qp, op, op->sym->m_src, klms);
+		if (unlikely(klm_n == 0))
+			return 0;
+	} else {
+		memcpy(klms, &umr->kseg[0], sizeof(*klms) * klm_n);
+	}
+	ds = 2 + klm_n;
+	cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
+	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
+							MLX5_OPCODE_RDMA_WRITE);
+	ds = RTE_ALIGN(ds, 4);
+	qp->db_pi += ds >> 2;
+	/* Set NOP WQE if needed. */
+	if (priv->max_rdmar_ds > ds) {
+		cseg += ds;
+		ds = priv->max_rdmar_ds - ds;
+		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
+		cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
+							       MLX5_OPCODE_NOP);
+		qp->db_pi += ds >> 2; /* Here, DS is 4 aligned for sure. */
+	}
+	qp->wqe = (uint8_t *)cseg;
+	return 1;
+}
+
+static uint16_t
+mlx5_crypto_xts_enqueue_burst(void *queue_pair, struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_umr_wqe *umr;
+	struct rte_crypto_op *op;
+	uint16_t mask = qp->entries_n - 1;
+	uint16_t remain = qp->entries_n - (qp->pi - qp->ci);
+	uint32_t idx;
+
+	if (remain < nb_ops)
+		nb_ops = remain;
+	else
+		remain = nb_ops;
+	if (unlikely(remain == 0))
+		return 0;
+	do {
+		idx = qp->pi & mask;
+		op = *ops++;
+		umr = RTE_PTR_ADD(qp->qp_obj.umem_buf,
+			priv->wqe_set_size * idx);
+		if (unlikely(mlx5_crypto_xts_wqe_set(priv, qp, op, umr) == 0)) {
+			qp->stats.enqueue_err_count++;
+			if (remain != nb_ops) {
+				qp->stats.enqueued_count -= remain;
+				break;
+			}
+			return 0;
+		}
+		qp->ops[idx] = op;
+		qp->pi++;
+	} while (--remain);
+	qp->stats.enqueued_count += nb_ops;
+	mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
+			   qp->db_pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+			   !priv->uar.dbnc);
+	return nb_ops;
+}
+
+static __rte_noinline void
+mlx5_crypto_xts_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
+{
+	const uint32_t idx = qp->ci & (qp->entries_n - 1);
+	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
+							&qp->cq_obj.cqes[idx];
+
+	op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+	qp->stats.dequeue_err_count++;
+	DRV_LOG(ERR, "CQE ERR:%x.\n", rte_be_to_cpu_32(cqe->syndrome));
+}
+
+static uint16_t
+mlx5_crypto_xts_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
+			  uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	volatile struct mlx5_cqe *restrict cqe;
+	struct rte_crypto_op *restrict op;
+	const unsigned int cq_size = qp->entries_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = qp->ci & mask;
+	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - qp->ci), nb_ops);
+	uint16_t i = 0;
+	int ret;
+
+	if (unlikely(max == 0))
+		return 0;
+	do {
+		idx = next_idx;
+		next_idx = (qp->ci + 1) & mask;
+		op = qp->ops[idx];
+		cqe = &qp->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, qp->ci);
+		rte_io_rmb();
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
+				mlx5_crypto_xts_cqe_err_handle(qp, op);
+			break;
+		}
+		op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		ops[i++] = op;
+		qp->ci++;
+	} while (i < max);
+	if (likely(i != 0)) {
+		rte_io_wmb();
+		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->ci);
+		qp->stats.dequeued_count += i;
+	}
+	return i;
+}
+
+static void
+mlx5_crypto_xts_qp_init(struct mlx5_crypto_priv *priv, struct mlx5_crypto_qp *qp)
+{
+	uint32_t i;
+
+	for (i = 0 ; i < qp->entries_n; i++) {
+		struct mlx5_wqe_cseg *cseg = RTE_PTR_ADD(qp->qp_obj.umem_buf,
+			i * priv->wqe_set_size);
+		struct mlx5_wqe_umr_cseg *ucseg = (struct mlx5_wqe_umr_cseg *)
+								     (cseg + 1);
+		struct mlx5_wqe_umr_bsf_seg *bsf =
+			(struct mlx5_wqe_umr_bsf_seg *)(RTE_PTR_ADD(cseg,
+						       priv->umr_wqe_size)) - 1;
+		struct mlx5_wqe_rseg *rseg;
+
+		/* Init UMR WQE. */
+		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) |
+					 (priv->umr_wqe_size / MLX5_WSEG_SIZE));
+		cseg->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				       MLX5_COMP_MODE_OFFSET);
+		cseg->misc = rte_cpu_to_be_32(qp->mkey[i]->id);
+		ucseg->if_cf_toe_cq_res = RTE_BE32(1u << MLX5_UMRC_IF_OFFSET);
+		ucseg->mkey_mask = RTE_BE64(1u << 0); /* Mkey length bit. */
+		ucseg->ko_to_bs = rte_cpu_to_be_32
+			((MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size) <<
+			 MLX5_UMRC_KO_OFFSET) | (4 << MLX5_UMRC_TO_BS_OFFSET));
+		bsf->keytag = priv->keytag;
+		/* Init RDMA WRITE WQE. */
+		cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
+		cseg->flags = RTE_BE32((MLX5_COMP_ALWAYS <<
+				      MLX5_COMP_MODE_OFFSET) |
+				      MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
+		rseg = (struct mlx5_wqe_rseg *)(cseg + 1);
+		rseg->rkey = rte_cpu_to_be_32(qp->mkey[i]->id);
+	}
+}
+
+static void *
+mlx5_crypto_xts_mkey_klm_update(struct mlx5_crypto_priv *priv,
+				struct mlx5_crypto_qp *qp,
+				uint32_t idx)
+{
+	return RTE_PTR_ADD(qp->qp_obj.umem_buf, priv->wqe_set_size * idx);
+}
+
+static int
+mlx5_crypto_xts_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
+				 const struct rte_cryptodev_qp_conf *qp_conf,
+				 int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_qp_attr attr = {0};
+	struct mlx5_crypto_qp *qp;
+	uint16_t log_nb_desc = rte_log2_u32(qp_conf->nb_descriptors);
+	uint32_t ret;
+	uint32_t alloc_size = sizeof(*qp);
+	uint32_t log_wqbb_n;
+	struct mlx5_devx_cq_attr cq_attr = {
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+	};
+	struct mlx5_devx_mkey_attr mkey_attr = {
+		.pd = priv->cdev->pdn,
+		.umr_en = 1,
+		.crypto_en = 1,
+		.set_remote_rw = 1,
+		.klm_num = MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size),
+	};
+
+	if (dev->data->queue_pairs[qp_id] != NULL)
+		mlx5_crypto_xts_queue_pair_release(dev, qp_id);
+	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
+	alloc_size += (sizeof(struct rte_crypto_op *) +
+		       sizeof(struct mlx5_devx_obj *)) *
+		       RTE_BIT32(log_nb_desc);
+	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
+				socket_id);
+	if (qp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate QP memory.");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	if (mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj, log_nb_desc,
+				&cq_attr, socket_id) != 0) {
+		DRV_LOG(ERR, "Failed to create CQ.");
+		goto error;
+	}
+	log_wqbb_n = rte_log2_u32(RTE_BIT32(log_nb_desc) *
+				(priv->wqe_set_size / MLX5_SEND_WQE_BB));
+	attr.pd = priv->cdev->pdn;
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
+	attr.cqn = qp->cq_obj.cq->id;
+	attr.num_of_receive_wqes = 0;
+	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
+	attr.ts_format =
+		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
+					attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
+					&attr, socket_id);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create QP.");
+		goto error;
+	}
+	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
+			      priv->dev_config.socket_id) != 0) {
+		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
+			(uint32_t)qp_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/*
+	 * In Order to configure self loopback, when calling devx qp2rts the
+	 * remote QP id that is used is the id of the same QP.
+	 */
+	if (mlx5_devx_qp2rts(&qp->qp_obj, qp->qp_obj.qp->id))
+		goto error;
+	qp->mkey = (struct mlx5_devx_obj **)RTE_ALIGN((uintptr_t)(qp + 1),
+							   RTE_CACHE_LINE_SIZE);
+	qp->ops = (struct rte_crypto_op **)(qp->mkey + RTE_BIT32(log_nb_desc));
+	qp->entries_n = 1 << log_nb_desc;
+	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp, &mkey_attr,
+					       mlx5_crypto_xts_mkey_klm_update)) {
+		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	mlx5_crypto_xts_qp_init(priv, qp);
+	qp->priv = priv;
+	dev->data->queue_pairs[qp_id] = qp;
+	return 0;
+error:
+	mlx5_crypto_xts_qp_release(qp);
+	return -1;
+}
+
+/*
+ * Calculate UMR WQE size and RDMA Write WQE size with the
+ * following limitations:
+ *	- Each WQE size is multiple of 64.
+ *	- The summarize of both UMR WQE and RDMA_W WQE is a power of 2.
+ *	- The number of entries in the UMR WQE's KLM list is multiple of 4.
+ */
+static void
+mlx5_crypto_xts_get_wqe_sizes(uint32_t segs_num, uint32_t *umr_size,
+			      uint32_t *rdmaw_size)
+{
+	uint32_t diff, wqe_set_size;
+
+	*umr_size = MLX5_CRYPTO_UMR_WQE_STATIC_SIZE +
+			RTE_ALIGN(segs_num, 4) *
+			sizeof(struct mlx5_wqe_dseg);
+	/* Make sure UMR WQE size is multiple of WQBB. */
+	*umr_size = RTE_ALIGN(*umr_size, MLX5_SEND_WQE_BB);
+	*rdmaw_size = sizeof(struct mlx5_rdma_write_wqe) +
+			sizeof(struct mlx5_wqe_dseg) *
+			(segs_num <= 2 ? 2 : 2 +
+			RTE_ALIGN(segs_num - 2, 4));
+	/* Make sure RDMA_WRITE WQE size is multiple of WQBB. */
+	*rdmaw_size = RTE_ALIGN(*rdmaw_size, MLX5_SEND_WQE_BB);
+	wqe_set_size = *rdmaw_size + *umr_size;
+	diff = rte_align32pow2(wqe_set_size) - wqe_set_size;
+	/* Make sure wqe_set size is power of 2. */
+	if (diff)
+		*umr_size += diff;
+}
+
+static uint8_t
+mlx5_crypto_xts_max_segs_num(uint16_t max_wqe_size)
+{
+	int klms_sizes = max_wqe_size - MLX5_CRYPTO_UMR_WQE_STATIC_SIZE;
+	uint32_t max_segs_cap = RTE_ALIGN_FLOOR(klms_sizes, MLX5_SEND_WQE_BB) /
+			sizeof(struct mlx5_wqe_dseg);
+
+	MLX5_ASSERT(klms_sizes >= MLX5_SEND_WQE_BB);
+	while (max_segs_cap) {
+		uint32_t umr_wqe_size, rdmw_wqe_size;
+
+		mlx5_crypto_xts_get_wqe_sizes(max_segs_cap, &umr_wqe_size,
+						&rdmw_wqe_size);
+		if (umr_wqe_size <= max_wqe_size &&
+				rdmw_wqe_size <= max_wqe_size)
+			break;
+		max_segs_cap -= 4;
+	}
+	return max_segs_cap;
+}
+
+static int
+mlx5_crypto_xts_configure_wqe_size(struct mlx5_crypto_priv *priv,
+				   uint16_t max_wqe_size, uint32_t max_segs_num)
+{
+	uint32_t rdmw_wqe_size, umr_wqe_size;
+
+	mlx5_crypto_xts_get_wqe_sizes(max_segs_num, &umr_wqe_size,
+			&rdmw_wqe_size);
+	priv->wqe_set_size = rdmw_wqe_size + umr_wqe_size;
+	if (umr_wqe_size > max_wqe_size ||
+				rdmw_wqe_size > max_wqe_size) {
+		DRV_LOG(ERR, "Invalid max_segs_num: %u. should be %u or lower.",
+			max_segs_num,
+			mlx5_crypto_xts_max_segs_num(max_wqe_size));
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+	priv->umr_wqe_size = (uint16_t)umr_wqe_size;
+	priv->umr_wqe_stride = priv->umr_wqe_size / MLX5_SEND_WQE_BB;
+	priv->max_rdmar_ds = rdmw_wqe_size / sizeof(struct mlx5_wqe_dseg);
+	return 0;
+}
+
+int
+mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv)
+{
+	struct mlx5_common_device *cdev = priv->cdev;
+	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
+	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+	int ret;
+
+	ret = mlx5_crypto_xts_configure_wqe_size(priv,
+		cdev->config.hca_attr.max_wqe_sz_sq, priv->max_segs_num);
+	if (ret)
+		return -EINVAL;
+	/* Override AES-XTS specific ops. */
+	dev_ops->sym_session_configure = mlx5_crypto_xts_sym_session_configure;
+	dev_ops->queue_pair_setup = mlx5_crypto_xts_queue_pair_setup;
+	dev_ops->queue_pair_release = mlx5_crypto_xts_queue_pair_release;
+	crypto_dev->dequeue_burst = mlx5_crypto_xts_dequeue_burst;
+	crypto_dev->enqueue_burst = mlx5_crypto_xts_enqueue_burst;
+	priv->caps = mlx5_crypto_caps;
+	return 0;
+}
+
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 3/9] crypto/mlx5: add AES-GCM query and initialization
  2023-06-20  1:23 ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 1/9] common/mlx5: export memory region lookup by address Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 2/9] crypto/mlx5: split AES-XTS Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev, gakhil

AES-GCM provides both authenticated encryption and the ability to check
the integrity and authentication of additional authenticated data (AAD)
that is sent in the clear.

This commit adds the AES-GCM attributes query and initialization function.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
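For illustration, a minimal sketch of how the capability bits queried below
could gate engine initialization at probe time. This is a sketch only:
mlx5_crypto_engine_select() is a hypothetical helper name, and the real
devarg-driven selection is wired up over the following patches of the series.

	static int
	mlx5_crypto_engine_select(struct mlx5_crypto_priv *priv, int want_gcm)
	{
		const struct mlx5_hca_attr *attr = &priv->cdev->config.hca_attr;

		if (want_gcm) {
			/* GCM needs the crypto MMO QP and at least one GCM direction. */
			if (!attr->crypto_mmo.crypto_mmo_qp ||
			    (!attr->crypto_mmo.gcm_128_encrypt &&
			     !attr->crypto_mmo.gcm_256_encrypt &&
			     !attr->crypto_mmo.gcm_128_decrypt &&
			     !attr->crypto_mmo.gcm_256_decrypt))
				return -ENOTSUP;
			return mlx5_crypto_gcm_init(priv);
		}
		if (!attr->aes_xts)
			return -ENOTSUP;
		return mlx5_crypto_xts_init(priv);
	}
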
 drivers/common/mlx5/mlx5_devx_cmds.c  | 15 +++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h  | 13 ++++++++++
 drivers/common/mlx5/mlx5_prm.h        | 19 +++++++++++---
 drivers/crypto/mlx5/meson.build       |  1 +
 drivers/crypto/mlx5/mlx5_crypto.c     |  4 ++-
 drivers/crypto/mlx5/mlx5_crypto.h     |  3 +++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 36 +++++++++++++++++++++++++++
 7 files changed, 87 insertions(+), 4 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1e418a0353..4332081165 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1117,6 +1117,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		attr->crypto_wrapped_import_method = !!(MLX5_GET(crypto_caps,
 						hcattr, wrapped_import_method)
 						& 1 << 2);
+		attr->crypto_mmo.crypto_mmo_qp = MLX5_GET(crypto_caps, hcattr, crypto_mmo_qp);
+		attr->crypto_mmo.gcm_256_encrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_256_encrypt);
+		attr->crypto_mmo.gcm_128_encrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_128_encrypt);
+		attr->crypto_mmo.gcm_256_decrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_256_decrypt);
+		attr->crypto_mmo.gcm_128_decrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_128_decrypt);
+		attr->crypto_mmo.gcm_auth_tag_128 =
+			MLX5_GET(crypto_caps, hcattr, gcm_auth_tag_128);
+		attr->crypto_mmo.gcm_auth_tag_96 =
+			MLX5_GET(crypto_caps, hcattr, gcm_auth_tag_96);
+		attr->crypto_mmo.log_crypto_mmo_max_size =
+			MLX5_GET(crypto_caps, hcattr, log_crypto_mmo_max_size);
 	}
 	if (hca_cap_2_sup) {
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index dc3359268d..cb3f3a211b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -125,6 +125,18 @@ struct mlx5_hca_flex_attr {
 	uint8_t  header_length_mask_width;
 };
 
+__extension__
+struct mlx5_hca_crypto_mmo_attr {
+	uint32_t crypto_mmo_qp:1;
+	uint32_t gcm_256_encrypt:1;
+	uint32_t gcm_128_encrypt:1;
+	uint32_t gcm_256_decrypt:1;
+	uint32_t gcm_128_decrypt:1;
+	uint32_t gcm_auth_tag_128:1;
+	uint32_t gcm_auth_tag_96:1;
+	uint32_t log_crypto_mmo_max_size:6;
+};
+
 /* ISO C restricts enumerator values to range of 'int' */
 __extension__
 enum {
@@ -250,6 +262,7 @@ struct mlx5_hca_attr {
 	struct mlx5_hca_vdpa_attr vdpa;
 	struct mlx5_hca_flow_attr flow;
 	struct mlx5_hca_flex_attr flex;
+	struct mlx5_hca_crypto_mmo_attr crypto_mmo;
 	int log_max_qp_sz;
 	int log_max_cq_sz;
 	int log_max_qp;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index d67c4336e6..755bd73275 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -4581,7 +4581,9 @@ struct mlx5_ifc_crypto_caps_bits {
 	u8 synchronize_dek[0x1];
 	u8 int_kek_manual[0x1];
 	u8 int_kek_auto[0x1];
-	u8 reserved_at_6[0x12];
+	u8 reserved_at_6[0xd];
+	u8 sw_wrapped_dek_key_purpose[0x1];
+	u8 reserved_at_14[0x4];
 	u8 wrapped_import_method[0x8];
 	u8 reserved_at_20[0x3];
 	u8 log_dek_max_alloc[0x5];
@@ -4598,8 +4600,19 @@ struct mlx5_ifc_crypto_caps_bits {
 	u8 log_dek_granularity[0x5];
 	u8 reserved_at_68[0x3];
 	u8 log_max_num_int_kek[0x5];
-	u8 reserved_at_70[0x10];
-	u8 reserved_at_80[0x780];
+	u8 sw_wrapped_dek_new[0x10];
+	u8 reserved_at_80[0x80];
+	u8 crypto_mmo_qp[0x1];
+	u8 crypto_aes_gcm_256_encrypt[0x1];
+	u8 crypto_aes_gcm_128_encrypt[0x1];
+	u8 crypto_aes_gcm_256_decrypt[0x1];
+	u8 crypto_aes_gcm_128_decrypt[0x1];
+	u8 gcm_auth_tag_128[0x1];
+	u8 gcm_auth_tag_96[0x1];
+	u8 reserved_at_107[0x3];
+	u8 log_crypto_mmo_max_size[0x6];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x6e0];
 };
 
 struct mlx5_ifc_crypto_commissioning_register_bits {
diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index 045e8ce81d..17ffce89f0 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -16,6 +16,7 @@ endif
 sources = files(
         'mlx5_crypto.c',
 	'mlx5_crypto_xts.c',
+	'mlx5_crypto_gcm.c',
         'mlx5_crypto_dek.c',
 )
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 2e6bcc6ddc..ff632cd69a 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -335,7 +335,9 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	if (!cdev->config.hca_attr.crypto || !cdev->config.hca_attr.aes_xts) {
+	if (!cdev->config.hca_attr.crypto ||
+	   (!cdev->config.hca_attr.aes_xts &&
+	    !cdev->config.hca_attr.crypto_mmo.crypto_mmo_qp)) {
 		DRV_LOG(ERR, "Not enough capabilities to support crypto "
 			"operations, maybe old FW/OFED version?");
 		rte_errno = ENOTSUP;
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 05d8fe97fe..76f368ee91 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -117,4 +117,7 @@ mlx5_crypto_dek_unset(struct mlx5_crypto_priv *priv);
 int
 mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
new file mode 100644
index 0000000000..bd78c6d66b
--- /dev/null
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_malloc.h>
+#include <rte_mempool.h>
+#include <rte_eal_paging.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+#include <bus_pci_driver.h>
+#include <rte_memory.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_common_os.h>
+
+#include "mlx5_crypto_utils.h"
+#include "mlx5_crypto.h"
+
+static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
+	{
+		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+	},
+	{
+		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+	}
+};
+
+int
+mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
+{
+	priv->caps = mlx5_crypto_gcm_caps;
+	return 0;
+}
+
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 4/9] crypto/mlx5: add AES-GCM encryption key
  2023-06-20  1:23 ` Suanming Mou
                     ` (2 preceding siblings ...)
  2023-06-20  1:23   ` [PATCH v3 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad; +Cc: rasland, dev, gakhil

The crypto device requires a DEK (data encryption key) object for
data encryption/decryption operations.

This commit adds the AES-GCM DEK object management support.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
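For illustration only, a sketch of the key-sharing behaviour the hash-list
management below provides: two sessions carrying the same AES-GCM key resolve
to a single DEK object, since the list keys on a checksum of the key bytes and
the match callback compares the full key. dek_sharing_example() and the two
xforms are hypothetical; both xforms are assumed to hold identical keys.

	static void
	dek_sharing_example(struct mlx5_crypto_priv *priv,
			    struct rte_crypto_sym_xform *xform_a,
			    struct rte_crypto_sym_xform *xform_b)
	{
		/* Both xforms carry the same 16B or 32B AES-GCM key. */
		struct mlx5_crypto_dek *dek_a =
			mlx5_crypto_dek_prepare(priv, xform_a);
		struct mlx5_crypto_dek *dek_b =
			mlx5_crypto_dek_prepare(priv, xform_b);

		/* The second lookup hits the entry registered by the first. */
		RTE_VERIFY(dek_a != NULL && dek_a == dek_b);
	}
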
 drivers/crypto/mlx5/mlx5_crypto.h     |  17 ++++-
 drivers/crypto/mlx5/mlx5_crypto_dek.c | 102 +++++++++++++-------------
 drivers/crypto/mlx5/mlx5_crypto_gcm.c |  33 +++++++++
 drivers/crypto/mlx5/mlx5_crypto_xts.c |  53 ++++++++++++-
 4 files changed, 150 insertions(+), 55 deletions(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 76f368ee91..bb5a557a38 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -86,6 +86,11 @@ struct mlx5_crypto_session {
 	uint32_t dek_id; /**< DEK ID */
 } __rte_packed;
 
+struct mlx5_crypto_dek_ctx {
+	struct rte_crypto_sym_xform *xform;
+	struct mlx5_crypto_priv *priv;
+};
+
 typedef void *(*mlx5_crypto_mkey_update_t)(struct mlx5_crypto_priv *priv,
 					   struct mlx5_crypto_qp *qp,
 					   uint32_t idx);
@@ -106,7 +111,7 @@ mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 
 struct mlx5_crypto_dek *
 mlx5_crypto_dek_prepare(struct mlx5_crypto_priv *priv,
-			struct rte_crypto_cipher_xform *cipher);
+			struct rte_crypto_sym_xform *xform);
 
 int
 mlx5_crypto_dek_setup(struct mlx5_crypto_priv *priv);
@@ -120,4 +125,14 @@ mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_dek_fill_xts_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx);
+
+int
+mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_dek.c b/drivers/crypto/mlx5/mlx5_crypto_dek.c
index 7339ef2bd9..716bcc0545 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_dek.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_dek.c
@@ -13,10 +13,24 @@
 #include "mlx5_crypto_utils.h"
 #include "mlx5_crypto.h"
 
-struct mlx5_crypto_dek_ctx {
-	struct rte_crypto_cipher_xform *cipher;
-	struct mlx5_crypto_priv *priv;
-};
+static int
+mlx5_crypto_dek_get_key(struct rte_crypto_sym_xform *xform,
+			const uint8_t **key,
+			uint16_t *key_len)
+{
+	if (xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+		*key = xform->cipher.key.data;
+		*key_len = xform->cipher.key.length;
+	} else if (xform->type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+		*key = xform->aead.key.data;
+		*key_len = xform->aead.key.length;
+	} else {
+		DRV_LOG(ERR, "Xform dek type not supported.");
+		rte_errno = EINVAL;
+		return -1;
+	}
+	return 0;
+}
 
 int
 mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
@@ -27,19 +41,22 @@ mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 
 struct mlx5_crypto_dek *
 mlx5_crypto_dek_prepare(struct mlx5_crypto_priv *priv,
-			struct rte_crypto_cipher_xform *cipher)
+			struct rte_crypto_sym_xform *xform)
 {
+	const uint8_t *key;
+	uint16_t key_len;
 	struct mlx5_hlist *dek_hlist = priv->dek_hlist;
 	struct mlx5_crypto_dek_ctx dek_ctx = {
-		.cipher = cipher,
+		.xform = xform,
 		.priv = priv,
 	};
-	struct rte_crypto_cipher_xform *cipher_ctx = cipher;
-	uint64_t key64 = __rte_raw_cksum(cipher_ctx->key.data,
-					 cipher_ctx->key.length, 0);
-	struct mlx5_list_entry *entry = mlx5_hlist_register(dek_hlist,
-							     key64, &dek_ctx);
+	uint64_t key64;
+	struct mlx5_list_entry *entry;
 
+	if (mlx5_crypto_dek_get_key(xform, &key, &key_len))
+		return NULL;
+	key64 = __rte_raw_cksum(key, key_len, 0);
+	entry = mlx5_hlist_register(dek_hlist, key64, &dek_ctx);
 	return entry == NULL ? NULL :
 			     container_of(entry, struct mlx5_crypto_dek, entry);
 }
@@ -76,76 +93,55 @@ mlx5_crypto_dek_match_cb(void *tool_ctx __rte_unused,
 			 struct mlx5_list_entry *entry, void *cb_ctx)
 {
 	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
-	struct rte_crypto_cipher_xform *cipher_ctx = ctx->cipher;
+	struct rte_crypto_sym_xform *xform = ctx->xform;
 	struct mlx5_crypto_dek *dek =
 			container_of(entry, typeof(*dek), entry);
 	uint32_t key_len = dek->size;
+	uint16_t xkey_len;
+	const uint8_t *key;
 
-	if (key_len != cipher_ctx->key.length)
+	if (mlx5_crypto_dek_get_key(xform, &key, &xkey_len))
+		return -1;
+	if (key_len != xkey_len)
 		return -1;
-	return memcmp(cipher_ctx->key.data, dek->data, cipher_ctx->key.length);
+	return memcmp(key, dek->data, xkey_len);
 }
 
 static struct mlx5_list_entry *
 mlx5_crypto_dek_create_cb(void *tool_ctx __rte_unused, void *cb_ctx)
 {
 	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
-	struct rte_crypto_cipher_xform *cipher_ctx = ctx->cipher;
+	struct rte_crypto_sym_xform *xform = ctx->xform;
 	struct mlx5_crypto_dek *dek = rte_zmalloc(__func__, sizeof(*dek),
 						  RTE_CACHE_LINE_SIZE);
 	struct mlx5_devx_dek_attr dek_attr = {
 		.pd = ctx->priv->cdev->pdn,
-		.key_purpose = MLX5_CRYPTO_KEY_PURPOSE_AES_XTS,
-		.has_keytag = 1,
 	};
-	bool is_wrapped = ctx->priv->is_wrapped_mode;
+	int ret = -1;
 
 	if (dek == NULL) {
 		DRV_LOG(ERR, "Failed to allocate dek memory.");
 		return NULL;
 	}
-	if (is_wrapped) {
-		switch (cipher_ctx->key.length) {
-		case 48:
-			dek->size = 48;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_128b;
-			break;
-		case 80:
-			dek->size = 80;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_256b;
-			break;
-		default:
-			DRV_LOG(ERR, "Wrapped key size not supported.");
-			return NULL;
-		}
-	} else {
-		switch (cipher_ctx->key.length) {
-		case 32:
-			dek->size = 40;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_128b;
-			break;
-		case 64:
-			dek->size = 72;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_256b;
-			break;
-		default:
-			DRV_LOG(ERR, "Key size not supported.");
-			return NULL;
-		}
-		memcpy(&dek_attr.key[cipher_ctx->key.length],
-						&ctx->priv->keytag, 8);
-	}
-	memcpy(&dek_attr.key, cipher_ctx->key.data, cipher_ctx->key.length);
+	if (xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER)
+		ret = mlx5_crypto_dek_fill_xts_attr(dek, &dek_attr, cb_ctx);
+	else if (xform->type == RTE_CRYPTO_SYM_XFORM_AEAD)
+		ret = mlx5_crypto_dek_fill_gcm_attr(dek, &dek_attr, cb_ctx);
+	if (ret)
+		goto fail;
 	dek->obj = mlx5_devx_cmd_create_dek_obj(ctx->priv->cdev->ctx,
 						&dek_attr);
 	if (dek->obj == NULL) {
-		rte_free(dek);
-		return NULL;
+		DRV_LOG(ERR, "Failed to create dek obj.");
+		goto fail;
 	}
-	memcpy(&dek->data, cipher_ctx->key.data, cipher_ctx->key.length);
 	return &dek->entry;
+fail:
+	rte_free(dek);
+	return NULL;
 }
 
+
 static void
 mlx5_crypto_dek_remove_cb(void *tool_ctx __rte_unused,
 			  struct mlx5_list_entry *entry)
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index bd78c6d66b..5b315ef42c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -27,6 +27,39 @@ static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	}
 };
 
+int
+mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx)
+{
+	uint32_t offset = 0;
+	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
+	struct rte_crypto_aead_xform *aead_ctx = &ctx->xform->aead;
+
+	if (aead_ctx->algo != RTE_CRYPTO_AEAD_AES_GCM) {
+		DRV_LOG(ERR, "Only AES-GCM algo supported.");
+		return -EINVAL;
+	}
+	dek_attr->key_purpose = MLX5_CRYPTO_KEY_PURPOSE_GCM;
+	switch (aead_ctx->key.length) {
+	case 16:
+		offset = 16;
+		dek->size = 16;
+		dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+		break;
+	case 32:
+		dek->size = 32;
+		dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+		break;
+	default:
+		DRV_LOG(ERR, "Wrapped key size not supported.");
+		return -EINVAL;
+	}
+	memcpy(&dek_attr->key[offset], aead_ctx->key.data, aead_ctx->key.length);
+	memcpy(&dek->data, aead_ctx->key.data, aead_ctx->key.length);
+	return 0;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_xts.c b/drivers/crypto/mlx5/mlx5_crypto_xts.c
index 964d02e6ed..661da5f589 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_xts.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_xts.c
@@ -45,6 +45,57 @@ const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
 	},
 };
 
+int
+mlx5_crypto_dek_fill_xts_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx)
+{
+	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
+	struct rte_crypto_cipher_xform *cipher_ctx = &ctx->xform->cipher;
+	bool is_wrapped = ctx->priv->is_wrapped_mode;
+
+	if (cipher_ctx->algo != RTE_CRYPTO_CIPHER_AES_XTS) {
+		DRV_LOG(ERR, "Only AES-XTS algo supported.");
+		return -EINVAL;
+	}
+	dek_attr->key_purpose = MLX5_CRYPTO_KEY_PURPOSE_AES_XTS;
+	dek_attr->has_keytag = 1;
+	if (is_wrapped) {
+		switch (cipher_ctx->key.length) {
+		case 48:
+			dek->size = 48;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+			break;
+		case 80:
+			dek->size = 80;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+			break;
+		default:
+			DRV_LOG(ERR, "Wrapped key size not supported.");
+			return -EINVAL;
+		}
+	} else {
+		switch (cipher_ctx->key.length) {
+		case 32:
+			dek->size = 40;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+			break;
+		case 64:
+			dek->size = 72;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+			break;
+		default:
+			DRV_LOG(ERR, "Key size not supported.");
+			return -EINVAL;
+		}
+		memcpy(&dek_attr->key[cipher_ctx->key.length],
+						&ctx->priv->keytag, 8);
+	}
+	memcpy(&dek_attr->key, cipher_ctx->key.data, cipher_ctx->key.length);
+	memcpy(&dek->data, cipher_ctx->key.data, cipher_ctx->key.length);
+	return 0;
+}
+
 static int
 mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
 				      struct rte_crypto_sym_xform *xform,
@@ -66,7 +117,7 @@ mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
 		return -ENOTSUP;
 	}
 	cipher = &xform->cipher;
-	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, xform);
 	if (sess_private_data->dek == NULL) {
 		DRV_LOG(ERR, "Failed to prepare dek.");
 		return -ENOMEM;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 5/9] crypto/mlx5: add AES-GCM session configure
  2023-06-20  1:23 ` Suanming Mou
                     ` (3 preceding siblings ...)
  2023-06-20  1:23   ` [PATCH v3 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev, gakhil

Sessions are used in symmetric transformations in order to prepare
objects and data for packet processing stage.

The AES-GCM session includes IV, AAD, digest(tag), DEK, operation
mode information.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h        | 12 +++++++
 drivers/crypto/mlx5/mlx5_crypto.h     | 40 ++++++++++++++++++-----
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 47 +++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 755bd73275..6b48c6ca32 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -523,11 +523,23 @@ enum {
 	MLX5_BLOCK_SIZE_4048B	= 0x6,
 };
 
+enum {
+	MLX5_ENCRYPTION_TYPE_AES_GCM = 0x3,
+};
+
+enum {
+	MLX5_CRYPTO_OP_TYPE_ENCRYPTION = 0x0,
+	MLX5_CRYPTO_OP_TYPE_DECRYPTION = 0x1,
+};
+
 #define MLX5_BSF_SIZE_OFFSET		30
 #define MLX5_BSF_P_TYPE_OFFSET		24
 #define MLX5_ENCRYPTION_ORDER_OFFSET	16
 #define MLX5_BLOCK_SIZE_OFFSET		24
 
+#define MLX5_CRYPTO_MMO_TYPE_OFFSET 24
+#define MLX5_CRYPTO_MMO_OP_OFFSET 20
+
 struct mlx5_wqe_umr_bsf_seg {
 	/*
 	 * bs_bpt_eo_es contains:
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index bb5a557a38..6cb4d4ddec 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -72,16 +72,40 @@ struct mlx5_crypto_devarg_params {
 };
 
 struct mlx5_crypto_session {
-	uint32_t bs_bpt_eo_es;
-	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
-	 * saved in big endian format.
-	 */
-	uint32_t bsp_res;
-	/**< crypto_block_size_pointer and reserved 24 bits saved in big
-	 * endian format.
-	 */
+	union {
+		/**< AES-XTS configuration. */
+		struct {
+			uint32_t bs_bpt_eo_es;
+			/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
+			 * saved in big endian format.
+			 */
+			uint32_t bsp_res;
+			/**< crypto_block_size_pointer and reserved 24 bits saved in big
+			 * endian format.
+			 */
+		};
+		/**< AES-GCM configuration. */
+		struct {
+			uint32_t mmo_ctrl;
+			/**< Crypto control fields with algo type and op type in big
+			 * endian format.
+			 */
+			uint32_t wqe_aad_len;
+			/**< Crypto AAD length field in big endian format. */
+			uint32_t wqe_tag_len;
+			/**< Crypto tag length field in big endian format. */
+			uint16_t tag_len;
+			/**< AES-GCM crypto digest size in bytes. */
+			uint16_t aad_len;
+			/**< The length of the additional authenticated data (AAD) in bytes. */
+			uint32_t op_type;
+			/**< Operation type. */
+		};
+	};
 	uint32_t iv_offset:16;
 	/**< Starting point for Initialisation Vector. */
+	uint32_t iv_len;
+	/**< Initialisation Vector length. */
 	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
 	uint32_t dek_id; /**< DEK ID */
 } __rte_packed;
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 5b315ef42c..5f55314382 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -60,9 +60,56 @@ mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
 	return 0;
 }
 
+static int
+mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
+				  struct rte_crypto_sym_xform *xform,
+				  struct rte_cryptodev_sym_session *session)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_session *sess_private_data = CRYPTODEV_GET_SYM_SESS_PRIV(session);
+	struct rte_crypto_aead_xform *aead = &xform->aead;
+	uint32_t op_type;
+
+	if (unlikely(xform->next != NULL)) {
+		DRV_LOG(ERR, "Xform next is not supported.");
+		return -ENOTSUP;
+	}
+	if (aead->algo != RTE_CRYPTO_AEAD_AES_GCM) {
+		DRV_LOG(ERR, "Only AES-GCM algorithm is supported.");
+		return -ENOTSUP;
+	}
+	if (aead->op == RTE_CRYPTO_AEAD_OP_ENCRYPT)
+		op_type = MLX5_CRYPTO_OP_TYPE_ENCRYPTION;
+	else
+		op_type = MLX5_CRYPTO_OP_TYPE_DECRYPTION;
+	sess_private_data->op_type = op_type;
+	sess_private_data->mmo_ctrl = rte_cpu_to_be_32
+			(op_type << MLX5_CRYPTO_MMO_OP_OFFSET |
+			 MLX5_ENCRYPTION_TYPE_AES_GCM << MLX5_CRYPTO_MMO_TYPE_OFFSET);
+	sess_private_data->aad_len = aead->aad_length;
+	sess_private_data->tag_len = aead->digest_length;
+	sess_private_data->iv_offset = aead->iv.offset;
+	sess_private_data->iv_len = aead->iv.length;
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, xform);
+	if (sess_private_data->dek == NULL) {
+		DRV_LOG(ERR, "Failed to prepare dek.");
+		return -ENOMEM;
+	}
+	sess_private_data->dek_id =
+			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
+					 0xffffff);
+	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
+	return 0;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
+	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
+	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+
+	/* Override AES-GCM specified ops. */
+	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 6/9] common/mlx5: add WQE-based QP synchronous basics
  2023-06-20  1:23 ` Suanming Mou
                     ` (4 preceding siblings ...)
  2023-06-20  1:23   ` [PATCH v3 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev, gakhil

NVIDIA HW provides a synchronization mechanism between QPs. When
creating the QPs, the user can set one as primary and another as
follower. The follower QP's WQE execution can then be controlled
by the primary QP via SEND_EN WQEs.

This commit introduces the SEND_EN WQE to improve the WQE execution
synchronization between the primary and follower QPs.
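
For illustration only (not part of this patch), a leader/follower pair
could be requested through the new attribute bits roughly as below;
error handling and the remaining QP attributes are omitted:

    /* The leader posts SEND_EN WQEs; the follower's send WQEs wait
     * for them before executing.
     */
    struct mlx5_devx_qp_attr leader_attr = { .cd_master = 1 };
    struct mlx5_devx_qp_attr follower_attr = { .cd_slave_send = 1 };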

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c |  6 ++++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  3 +++
 drivers/common/mlx5/mlx5_prm.h       | 11 +++++++++++
 3 files changed, 20 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 4332081165..ef87862a6d 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2475,6 +2475,12 @@ mlx5_devx_cmd_create_qp(void *ctx,
 				 attr->dbr_umem_valid);
 			MLX5_SET(qpc, qpc, dbr_umem_id, attr->dbr_umem_id);
 		}
+		if (attr->cd_master)
+			MLX5_SET(qpc, qpc, cd_master, attr->cd_master);
+		if (attr->cd_slave_send)
+			MLX5_SET(qpc, qpc, cd_slave_send, attr->cd_slave_send);
+		if (attr->cd_slave_recv)
+			MLX5_SET(qpc, qpc, cd_slave_receive, attr->cd_slave_recv);
 		MLX5_SET64(qpc, qpc, dbr_addr, attr->dbr_address);
 		MLX5_SET64(create_qp_in, in, wq_umem_offset,
 			   attr->wq_umem_offset);
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index cb3f3a211b..e071cd841f 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -559,6 +559,9 @@ struct mlx5_devx_qp_attr {
 	uint64_t wq_umem_offset;
 	uint32_t user_index:24;
 	uint32_t mmo:1;
+	uint32_t cd_master:1;
+	uint32_t cd_slave_send:1;
+	uint32_t cd_slave_recv:1;
 };
 
 struct mlx5_devx_virtio_q_couners_attr {
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 6b48c6ca32..2d0a34ffbc 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -589,6 +589,17 @@ struct mlx5_rdma_write_wqe {
 	struct mlx5_wqe_dseg dseg[];
 } __rte_packed;
 
+struct mlx5_wqe_send_en_seg {
+	uint32_t reserve[2];
+	uint32_t sqnpc;
+	uint32_t qpn;
+} __rte_packed;
+
+struct mlx5_wqe_send_en_wqe {
+	struct mlx5_wqe_cseg ctr;
+	struct mlx5_wqe_send_en_seg sseg;
+} __rte_packed;
+
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 7/9] crypto/mlx5: add queue pair setup for GCM
  2023-06-20  1:23 ` Suanming Mou
                     ` (5 preceding siblings ...)
  2023-06-20  1:23   ` [PATCH v3 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev, gakhil

The crypto queue pair handles the encryption/decryption operations.

The AES-GCM AEAD API provides the AAD, mbuf and digest separately,
while the low-level FW only accepts the data in a single contiguous
memory region. Therefore two internal QPs are created for each AES-GCM
queue pair: one organizes the memory to be contiguous when it is not,
and the other performs the crypto operations.

If the buffers are already contiguous, they are sent directly to the
crypto QP for encryption/decryption. If not, the buffers are first
handled by the UMR QP, which converts them into one contiguous buffer.
The well-organized "new" buffer can then be handled by the crypto QP.

The crypto QP is initialized as the follower and the UMR QP as the
leader. When a crypto operation's input buffers require address space
conversion by the UMR QP, the crypto QP processing is triggered by the
UMR QP. Otherwise, the crypto QP doorbell is rung directly.

The existing max_segs_num devarg defines how many segments a chained
mbuf may contain, the same as for AES-XTS.
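
For illustration only (not part of this patch), an application reaches
this setup through the standard cryptodev API; dev_id and the session
mempool below are placeholders:

    struct rte_cryptodev_qp_conf qp_conf = {
        .nb_descriptors = 1024,     /* crypto operations in flight */
        .mp_session = session_pool, /* placeholder session mempool */
    };

    /* Creates both the internal crypto QP and the UMR QP described above. */
    int ret = rte_cryptodev_queue_pair_setup(dev_id, 0, &qp_conf,
                                             rte_socket_id());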

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_common_mr.h  |   1 +
 drivers/common/mlx5/mlx5_prm.h        |  22 +++
 drivers/common/mlx5/version.map       |   2 +
 drivers/crypto/mlx5/mlx5_crypto.h     |  15 ++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 230 ++++++++++++++++++++++++++
 5 files changed, 270 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index 66623868a2..8789d403b1 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -254,6 +254,7 @@ __rte_internal
 void
 mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
 
+__rte_internal
 void
 mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 2d0a34ffbc..60dff9dcda 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -470,6 +470,15 @@ struct mlx5_wqe_rseg {
 #define MLX5_UMRC_KO_OFFSET 16u
 #define MLX5_UMRC_TO_BS_OFFSET 0u
 
+/*
+ * As PRM describes, the address of the UMR pointer must be
+ * aligned to 2KB.
+ */
+#define MLX5_UMR_KLM_PTR_ALIGN (1 << 11)
+
+#define MLX5_UMR_KLM_NUM_ALIGN \
+	(MLX5_UMR_KLM_PTR_ALIGN / sizeof(struct mlx5_klm))
+
 struct mlx5_wqe_umr_cseg {
 	uint32_t if_cf_toe_cq_res;
 	uint32_t ko_to_bs;
@@ -674,6 +683,19 @@ union mlx5_gga_compress_opaque {
 	uint32_t data[64];
 };
 
+union mlx5_gga_crypto_opaque {
+	struct {
+		uint32_t syndrome;
+		uint32_t reserved0[2];
+		struct {
+			uint32_t iv[3];
+			uint32_t tag_size;
+			uint32_t aad_size;
+		} cp __rte_packed;
+	} __rte_packed;
+	uint8_t data[64];
+};
+
 struct mlx5_ifc_regexp_mmo_control_bits {
 	uint8_t reserved_at_31[0x2];
 	uint8_t le[0x1];
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index f860b069de..0758ba76de 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -159,5 +159,7 @@ INTERNAL {
 
 	mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
 	mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
+
+	mlx5_os_set_reg_mr_cb;
 	local: *;
 };
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 6cb4d4ddec..88a09a6b1c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -28,8 +28,11 @@ struct mlx5_crypto_priv {
 	TAILQ_ENTRY(mlx5_crypto_priv) next;
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	struct rte_cryptodev *crypto_dev;
+	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
+	uint32_t max_klm_num; /* Maximum supported klm. */
 	struct mlx5_hlist *dek_hlist; /* Dek hash list. */
 	const struct rte_cryptodev_capabilities *caps;
 	struct rte_cryptodev_config dev_config;
@@ -46,15 +49,27 @@ struct mlx5_crypto_qp {
 	struct mlx5_crypto_priv *priv;
 	struct mlx5_devx_cq cq_obj;
 	struct mlx5_devx_qp qp_obj;
+	struct mlx5_devx_qp umr_qp_obj;
 	struct rte_cryptodev_stats stats;
 	struct rte_crypto_op **ops;
 	struct mlx5_devx_obj **mkey; /* WQE's indirect mekys. */
+	struct mlx5_klm *klm_array;
+	union mlx5_gga_crypto_opaque *opaque_addr;
 	struct mlx5_mr_ctrl mr_ctrl;
+	struct mlx5_pmd_mr mr;
+	/* Crypto QP. */
 	uint8_t *wqe;
 	uint16_t entries_n;
+	uint16_t cq_entries_n;
 	uint16_t pi;
 	uint16_t ci;
 	uint16_t db_pi;
+	/* UMR QP. */
+	uint8_t *umr_wqe;
+	uint16_t umr_wqbbs;
+	uint16_t umr_pi;
+	uint16_t umr_ci;
+	uint32_t umr_errors;
 };
 
 struct mlx5_crypto_dek {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 5f55314382..c3859547ee 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -18,6 +18,20 @@
 #include "mlx5_crypto_utils.h"
 #include "mlx5_crypto.h"
 
+/*
+ * AES-GCM uses indirect KLM mode. The UMR WQE comprises of WQE control +
+ * UMR control + mkey context + indirect KLM. The WQE size is aligned to
+ * be 3 WQEBBS.
+ */
+#define MLX5_UMR_GCM_WQE_SIZE \
+	(RTE_ALIGN(sizeof(struct mlx5_umr_wqe) + sizeof(struct mlx5_wqe_dseg), \
+			MLX5_SEND_WQE_BB))
+
+#define MLX5_UMR_GCM_WQE_SET_SIZE \
+	(MLX5_UMR_GCM_WQE_SIZE + \
+	 RTE_ALIGN(sizeof(struct mlx5_wqe_send_en_wqe), \
+	 MLX5_SEND_WQE_BB))
+
 static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	{
 		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
@@ -86,6 +100,8 @@ mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 	sess_private_data->mmo_ctrl = rte_cpu_to_be_32
 			(op_type << MLX5_CRYPTO_MMO_OP_OFFSET |
 			 MLX5_ENCRYPTION_TYPE_AES_GCM << MLX5_CRYPTO_MMO_TYPE_OFFSET);
+	sess_private_data->wqe_aad_len = rte_cpu_to_be_32((uint32_t)aead->aad_length);
+	sess_private_data->wqe_tag_len = rte_cpu_to_be_32((uint32_t)aead->digest_length);
 	sess_private_data->aad_len = aead->aad_length;
 	sess_private_data->tag_len = aead->digest_length;
 	sess_private_data->iv_offset = aead->iv.offset;
@@ -102,6 +118,216 @@ mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 	return 0;
 }
 
+static void *
+mlx5_crypto_gcm_mkey_klm_update(struct mlx5_crypto_priv *priv,
+				struct mlx5_crypto_qp *qp __rte_unused,
+				uint32_t idx)
+{
+	return &qp->klm_array[idx * priv->max_klm_num];
+}
+
+static int
+mlx5_crypto_gcm_qp_release(struct rte_cryptodev *dev, uint16_t qp_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
+
+	if (qp->umr_qp_obj.qp != NULL)
+		mlx5_devx_qp_destroy(&qp->umr_qp_obj);
+	if (qp->qp_obj.qp != NULL)
+		mlx5_devx_qp_destroy(&qp->qp_obj);
+	if (qp->cq_obj.cq != NULL)
+		mlx5_devx_cq_destroy(&qp->cq_obj);
+	if (qp->mr.obj != NULL) {
+		void *opaq = qp->mr.addr;
+
+		priv->dereg_mr_cb(&qp->mr);
+		rte_free(opaq);
+	}
+	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
+	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
+	rte_free(qp);
+	dev->data->queue_pairs[qp_id] = NULL;
+	return 0;
+}
+
+static void
+mlx5_crypto_gcm_init_qp(struct mlx5_crypto_qp *qp)
+{
+	volatile struct mlx5_gga_wqe *restrict wqe =
+				    (volatile struct mlx5_gga_wqe *)qp->qp_obj.wqes;
+	volatile union mlx5_gga_crypto_opaque *opaq = qp->opaque_addr;
+	const uint32_t sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | 4u);
+	const uint32_t flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+					MLX5_COMP_MODE_OFFSET);
+	const uint32_t opaq_lkey = rte_cpu_to_be_32(qp->mr.lkey);
+	int i;
+
+	/* All the next fields state should stay constant. */
+	for (i = 0; i < qp->entries_n; ++i, ++wqe) {
+		wqe->sq_ds = sq_ds;
+		wqe->flags = flags;
+		wqe->opaque_lkey = opaq_lkey;
+		wqe->opaque_vaddr = rte_cpu_to_be_64((uint64_t)(uintptr_t)&opaq[i]);
+	}
+}
+
+static inline int
+mlx5_crypto_gcm_umr_qp_setup(struct rte_cryptodev *dev, struct mlx5_crypto_qp *qp,
+			     int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_qp_attr attr = {0};
+	uint32_t ret;
+	uint32_t log_wqbb_n;
+
+	/* Set UMR + SEND_EN WQE as maximum same with crypto. */
+	log_wqbb_n = rte_log2_u32(qp->entries_n *
+			(MLX5_UMR_GCM_WQE_SET_SIZE / MLX5_SEND_WQE_BB));
+	attr.pd = priv->cdev->pdn;
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
+	attr.cqn = qp->cq_obj.cq->id;
+	attr.num_of_receive_wqes = 0;
+	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
+	attr.ts_format =
+		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
+	attr.cd_master = 1;
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->umr_qp_obj,
+				  attr.num_of_send_wqbbs * MLX5_SEND_WQE_BB,
+				  &attr, socket_id);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create UMR QP.");
+		return -1;
+	}
+	if (mlx5_devx_qp2rts(&qp->umr_qp_obj, qp->umr_qp_obj.qp->id)) {
+		DRV_LOG(ERR, "Failed to change UMR QP state to RTS.");
+		return -1;
+	}
+	/* Save the UMR WQEBBS for checking the WQE boundary. */
+	qp->umr_wqbbs = attr.num_of_send_wqbbs;
+	return 0;
+}
+
+static int
+mlx5_crypto_gcm_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
+			 const struct rte_cryptodev_qp_conf *qp_conf,
+			 int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_hca_attr *attr = &priv->cdev->config.hca_attr;
+	struct mlx5_crypto_qp *qp;
+	struct mlx5_devx_cq_attr cq_attr = {
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+	};
+	struct mlx5_devx_qp_attr qp_attr = {
+		.pd = priv->cdev->pdn,
+		.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+		.user_index = qp_id,
+	};
+	struct mlx5_devx_mkey_attr mkey_attr = {
+		.pd = priv->cdev->pdn,
+		.umr_en = 1,
+		.klm_num = priv->max_klm_num,
+	};
+	uint32_t log_ops_n = rte_log2_u32(qp_conf->nb_descriptors);
+	uint32_t entries = RTE_BIT32(log_ops_n);
+	uint32_t alloc_size = sizeof(*qp);
+	size_t mr_size, opaq_size;
+	void *mr_buf;
+	int ret;
+
+	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
+	alloc_size += (sizeof(struct rte_crypto_op *) +
+		       sizeof(struct mlx5_devx_obj *)) * entries;
+	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
+				socket_id);
+	if (qp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate qp memory.");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	qp->priv = priv;
+	qp->entries_n = entries;
+	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
+				  priv->dev_config.socket_id)) {
+		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
+			(uint32_t)qp_id);
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	/*
+	 * The following KLM pointer must be aligned with
+	 * MLX5_UMR_KLM_PTR_ALIGN. Aligned opaq_size here
+	 * to make the KLM pointer with offset be aligned.
+	 */
+	opaq_size = RTE_ALIGN(sizeof(union mlx5_gga_crypto_opaque) * entries,
+			      MLX5_UMR_KLM_PTR_ALIGN);
+	mr_size = (priv->max_klm_num * sizeof(struct mlx5_klm) * entries) + opaq_size;
+	mr_buf = rte_calloc(__func__, (size_t)1, mr_size, MLX5_UMR_KLM_PTR_ALIGN);
+	if (mr_buf == NULL) {
+		DRV_LOG(ERR, "Failed to allocate mr memory.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	if (priv->reg_mr_cb(priv->cdev->pd, mr_buf, mr_size, &qp->mr) != 0) {
+		rte_free(mr_buf);
+		DRV_LOG(ERR, "Failed to register opaque MR.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	qp->opaque_addr = qp->mr.addr;
+	qp->klm_array = RTE_PTR_ADD(qp->opaque_addr, opaq_size);
+	/*
+	 * Triple the CQ size as UMR QP which contains UMR and SEND_EN WQE
+	 * will share this CQ .
+	 */
+	qp->cq_entries_n = rte_align32pow2(entries * 3);
+	ret = mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj,
+				  rte_log2_u32(qp->cq_entries_n),
+				  &cq_attr, socket_id);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create CQ.");
+		goto err;
+	}
+	qp_attr.cqn = qp->cq_obj.cq->id;
+	qp_attr.ts_format = mlx5_ts_format_conv(attr->qp_ts_format);
+	qp_attr.num_of_receive_wqes = 0;
+	qp_attr.num_of_send_wqbbs = entries;
+	qp_attr.mmo = attr->crypto_mmo.crypto_mmo_qp;
+	/* Set MMO QP as follower as the input data may depend on UMR. */
+	qp_attr.cd_slave_send = 1;
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
+				  qp_attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
+				  &qp_attr, socket_id);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create QP.");
+		goto err;
+	}
+	mlx5_crypto_gcm_init_qp(qp);
+	ret = mlx5_devx_qp2rts(&qp->qp_obj, 0);
+	if (ret)
+		goto err;
+	qp->ops = (struct rte_crypto_op **)(qp + 1);
+	qp->mkey = (struct mlx5_devx_obj **)(qp->ops + entries);
+	if (mlx5_crypto_gcm_umr_qp_setup(dev, qp, socket_id)) {
+		DRV_LOG(ERR, "Failed to setup UMR QP.");
+		goto err;
+	}
+	DRV_LOG(INFO, "QP %u: SQN=0x%X CQN=0x%X entries num = %u",
+		(uint32_t)qp_id, qp->qp_obj.qp->id, qp->cq_obj.cq->id, entries);
+	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp, &mkey_attr,
+					       mlx5_crypto_gcm_mkey_klm_update)) {
+		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	dev->data->queue_pairs[qp_id] = qp;
+	return 0;
+err:
+	mlx5_crypto_gcm_qp_release(dev, qp_id);
+	return -1;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
@@ -110,6 +336,10 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
+	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
+	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
+	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
+	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 8/9] crypto/mlx5: add enqueue and dequeue operations
  2023-06-20  1:23 ` Suanming Mou
                     ` (6 preceding siblings ...)
  2023-06-20  1:23   ` [PATCH v3 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  1:23   ` [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
  2023-06-20  9:55   ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev, gakhil

The crypto operations are performed with the crypto WQE. If the input
buffers (AAD, mbuf, digest) are not contiguous and there is not enough
headroom/tailroom for copying the AAD/digest, a UMR WQE is needed, as
required by the FW, to generate a contiguous address space for the
crypto WQE. The UMR WQE and crypto WQE are handled in two different QPs.

A crypto operation with non-contiguous buffers has its own UMR WQE,
while an operation with contiguous buffers does not need one. Once all
the operations' WQEs in the enqueue burst have been built, and if any
UMR WQEs were built, an additional SEND_EN WQE is added as the final
WQE of the burst in the UMR QP. The purpose of that SEND_EN WQE is to
trigger the crypto QP processing with the UMR-prepared contiguous
input address space buffers.

The QP for crypto operations contains only crypto WQEs, whose fixed
parts are pre-built at QP setup. The QP processing is triggered either
by a doorbell ring or by the SEND_EN WQE from the UMR QP.
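
For illustration only (not part of this patch), these callbacks are
reached through the standard burst API; dev_id, qp_id and the ops
array are placeholders:

    uint16_t enq = rte_cryptodev_enqueue_burst(dev_id, qp_id, ops, nb_ops);
    uint16_t deq = 0;

    /* Poll until every successfully enqueued operation is retrieved. */
    while (deq < enq)
        deq += rte_cryptodev_dequeue_burst(dev_id, qp_id,
                                           &ops[deq], enq - deq);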

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 drivers/common/mlx5/mlx5_prm.h        |   1 +
 drivers/crypto/mlx5/mlx5_crypto.c     |   9 +-
 drivers/crypto/mlx5/mlx5_crypto.h     |   8 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 588 ++++++++++++++++++++++++++
 4 files changed, 604 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 60dff9dcda..0d84d92af5 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -617,6 +617,7 @@ struct mlx5_wqe_send_en_wqe {
 /* MMO metadata segment */
 
 #define	MLX5_OPCODE_MMO	0x2fu
+#define	MLX5_OPC_MOD_MMO_CRYPTO 0x6u
 #define	MLX5_OPC_MOD_MMO_REGEX 0x4u
 #define	MLX5_OPC_MOD_MMO_COMP 0x2u
 #define	MLX5_OPC_MOD_MMO_DECOMP 0x3u
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index ff632cd69a..4d7d3ef2a3 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -62,8 +62,13 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 			MLX5_CRYPTO_FEATURE_FLAGS(priv->is_wrapped_mode);
 		dev_info->capabilities = priv->caps;
 		dev_info->max_nb_queue_pairs = MLX5_CRYPTO_MAX_QPS;
-		dev_info->min_mbuf_headroom_req = 0;
-		dev_info->min_mbuf_tailroom_req = 0;
+		if (priv->caps->sym.xform_type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+			dev_info->min_mbuf_headroom_req = MLX5_CRYPTO_GCM_MAX_AAD;
+			dev_info->min_mbuf_tailroom_req = MLX5_CRYPTO_GCM_MAX_DIGEST;
+		} else {
+			dev_info->min_mbuf_headroom_req = 0;
+			dev_info->min_mbuf_tailroom_req = 0;
+		}
 		dev_info->sym.max_nb_sessions = 0;
 		/*
 		 * If 0, the device does not have any limitation in number of
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 88a09a6b1c..6dcb41b27c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -23,6 +23,8 @@
 #define MLX5_CRYPTO_KLM_SEGS_NUM(umr_wqe_sz) ((umr_wqe_sz -\
 					MLX5_CRYPTO_UMR_WQE_STATIC_SIZE) /\
 					MLX5_WSEG_SIZE)
+#define MLX5_CRYPTO_GCM_MAX_AAD 64
+#define MLX5_CRYPTO_GCM_MAX_DIGEST 16
 
 struct mlx5_crypto_priv {
 	TAILQ_ENTRY(mlx5_crypto_priv) next;
@@ -61,6 +63,9 @@ struct mlx5_crypto_qp {
 	uint8_t *wqe;
 	uint16_t entries_n;
 	uint16_t cq_entries_n;
+	uint16_t reported_ci;
+	uint16_t qp_ci;
+	uint16_t cq_ci;
 	uint16_t pi;
 	uint16_t ci;
 	uint16_t db_pi;
@@ -70,6 +75,9 @@ struct mlx5_crypto_qp {
 	uint16_t umr_pi;
 	uint16_t umr_ci;
 	uint32_t umr_errors;
+	uint16_t last_gga_pi;
+	bool has_umr;
+	uint16_t cpy_tag_op;
 };
 
 struct mlx5_crypto_dek {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index c3859547ee..8389c03c91 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -9,6 +9,7 @@
 #include <rte_log.h>
 #include <bus_pci_driver.h>
 #include <rte_memory.h>
+#include <rte_io.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
@@ -32,6 +33,40 @@
 	 RTE_ALIGN(sizeof(struct mlx5_wqe_send_en_wqe), \
 	 MLX5_SEND_WQE_BB))
 
+#define MLX5_UMR_GCM_WQE_STRIDE \
+	(MLX5_UMR_GCM_WQE_SIZE / MLX5_SEND_WQE_BB)
+
+#define MLX5_MMO_CRYPTO_OPC (MLX5_OPCODE_MMO | \
+	(MLX5_OPC_MOD_MMO_CRYPTO << WQE_CSEG_OPC_MOD_OFFSET))
+
+/*
+ * The status default value is RTE_CRYPTO_OP_STATUS_SUCCESS.
+ * Copy tag should fill different value to status.
+ */
+#define MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY (RTE_CRYPTO_OP_STATUS_SUCCESS + 1)
+
+struct mlx5_crypto_gcm_op_info {
+	bool need_umr;
+	bool is_oop;
+	bool is_enc;
+	void *digest;
+	void *src_addr;
+};
+
+struct mlx5_crypto_gcm_data {
+	void *src_addr;
+	uint32_t src_bytes;
+	void *dst_addr;
+	uint32_t dst_bytes;
+	uint32_t src_mkey;
+	uint32_t dst_mkey;
+};
+
+struct mlx5_crypto_gcm_tag_cpy_info {
+	void *digest;
+	uint8_t tag_len;
+} __rte_packed;
+
 static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	{
 		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
@@ -328,6 +363,557 @@ mlx5_crypto_gcm_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	return -1;
 }
 
+static __rte_always_inline void
+mlx5_crypto_gcm_get_op_info(struct mlx5_crypto_qp *qp,
+			    struct rte_crypto_op *op,
+			    struct mlx5_crypto_gcm_op_info *op_info)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct rte_mbuf *m_src = op->sym->m_src;
+	void *aad_addr = op->sym->aead.aad.data;
+	void *tag_addr = op->sym->aead.digest.data;
+	void *src_addr = rte_pktmbuf_mtod_offset(m_src, void *, op->sym->aead.data.offset);
+	struct rte_mbuf *m_dst = m_src;
+	void *dst_addr = src_addr;
+	void *expected_aad = NULL;
+	void *expected_tag = NULL;
+	bool is_enc = sess->op_type == MLX5_CRYPTO_OP_TYPE_ENCRYPTION;
+	bool cp_aad = false;
+	bool cp_tag = false;
+
+	op_info->is_oop = false;
+	op_info->need_umr = false;
+	op_info->is_enc = is_enc;
+	op_info->digest = NULL;
+	op_info->src_addr = aad_addr;
+	if (op->sym->m_dst && op->sym->m_dst != m_src) {
+		op_info->is_oop = true;
+		m_dst = op->sym->m_dst;
+		dst_addr = rte_pktmbuf_mtod_offset(m_dst, void *, op->sym->aead.data.offset);
+		if (m_dst->nb_segs > 1) {
+			op_info->need_umr = true;
+			return;
+		}
+		/*
+		 * If the op's mbuf has extra data offset, don't copy AAD to
+		 * this area.
+		 */
+		if (rte_pktmbuf_headroom(m_dst) < sess->aad_len ||
+		    op->sym->aead.data.offset) {
+			op_info->need_umr = true;
+			return;
+		}
+	}
+	if (m_src->nb_segs > 1) {
+		op_info->need_umr = true;
+		return;
+	}
+	expected_aad = RTE_PTR_SUB(src_addr, sess->aad_len);
+	if (expected_aad != aad_addr) {
+		/*
+		 * If the op's mbuf has extra data offset, don't copy AAD to
+		 * this area.
+		 */
+		if (sess->aad_len > MLX5_CRYPTO_GCM_MAX_AAD ||
+		    sess->aad_len > rte_pktmbuf_headroom(m_src) ||
+		    op->sym->aead.data.offset) {
+			op_info->need_umr = true;
+			return;
+		}
+		cp_aad = true;
+		op_info->src_addr = expected_aad;
+	}
+	expected_tag = RTE_PTR_ADD(is_enc ? dst_addr : src_addr, op->sym->aead.data.length);
+	if (expected_tag != tag_addr) {
+		struct rte_mbuf *mbuf = is_enc ? m_dst : m_src;
+
+		/*
+		 * If op's mbuf is not fully set as payload, don't copy digest to
+		 * the left area.
+		 */
+		if (rte_pktmbuf_tailroom(mbuf) < sess->tag_len ||
+		    rte_pktmbuf_data_len(mbuf) != op->sym->aead.data.length) {
+			op_info->need_umr = true;
+			return;
+		}
+		if (is_enc) {
+			op_info->digest = expected_tag;
+			qp->cpy_tag_op++;
+		} else {
+			cp_tag = true;
+		}
+	}
+	if (cp_aad)
+		memcpy(expected_aad, aad_addr, sess->aad_len);
+	if (cp_tag)
+		memcpy(expected_tag, tag_addr, sess->tag_len);
+}
+
+static __rte_always_inline uint32_t
+_mlx5_crypto_gcm_umr_build_mbuf_klm(struct mlx5_crypto_qp *qp,
+				    struct rte_mbuf *mbuf,
+				    struct mlx5_klm *klm,
+				    uint32_t offset,
+				    uint32_t *remain)
+{
+	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
+	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
+
+	if (data_len > *remain)
+		data_len = *remain;
+	*remain -= data_len;
+	klm->byte_count = rte_cpu_to_be_32(data_len);
+	klm->address = rte_cpu_to_be_64(addr);
+	klm->mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
+	return klm->mkey;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_mbuf_chain_klms(struct mlx5_crypto_qp *qp,
+				      struct rte_crypto_op *op,
+				      struct rte_mbuf *mbuf,
+				      struct mlx5_klm *klm)
+{
+	uint32_t remain_len = op->sym->aead.data.length;
+	__rte_unused uint32_t nb_segs = mbuf->nb_segs;
+	uint32_t klm_n = 0;
+
+	/* mbuf seg num should be less than max_segs_num. */
+	MLX5_ASSERT(nb_segs <= qp->priv->max_segs_num);
+	/* First mbuf needs to take the data offset. */
+	if (unlikely(_mlx5_crypto_gcm_umr_build_mbuf_klm(qp, mbuf, klm,
+		     op->sym->aead.data.offset, &remain_len) == UINT32_MAX)) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		return 0;
+	}
+	klm++;
+	klm_n++;
+	while (remain_len) {
+		nb_segs--;
+		mbuf = mbuf->next;
+		MLX5_ASSERT(mbuf && nb_segs);
+		if (unlikely(_mlx5_crypto_gcm_umr_build_mbuf_klm(qp, mbuf, klm,
+						0, &remain_len) == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+			return 0;
+		}
+		klm++;
+		klm_n++;
+	}
+	return klm_n;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_klm_by_addr(struct mlx5_crypto_qp *qp,
+				  struct mlx5_klm *klm,
+				  void *addr,
+				  uint32_t len)
+{
+	klm->byte_count = rte_cpu_to_be_32(len);
+	klm->address = rte_cpu_to_be_64((uintptr_t)addr);
+	klm->mkey = mlx5_mr_addr2mr_bh(&qp->mr_ctrl, (uintptr_t)addr);
+	if (klm->mkey == UINT32_MAX)
+		return 0;
+	return 1;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_op_klm(struct mlx5_crypto_qp *qp,
+			     struct rte_crypto_op *op,
+			     struct mlx5_crypto_gcm_op_info *op_info,
+			     struct mlx5_klm *klm,
+			     uint32_t *len)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_klm *digest = NULL, *aad = NULL;
+	uint32_t total_len = op->sym->aead.data.length + sess->aad_len + sess->tag_len;
+	uint32_t klm_n = 0, klm_src = 0, klm_dst = 0;
+
+	/* Build AAD KLM. */
+	aad = klm;
+	if (!mlx5_crypto_gcm_build_klm_by_addr(qp, aad, op->sym->aead.aad.data, sess->aad_len))
+		return 0;
+	klm_n++;
+	/* Build src mubf KLM. */
+	klm_src = mlx5_crypto_gcm_build_mbuf_chain_klms(qp, op, op->sym->m_src, &klm[klm_n]);
+	if (!klm_src)
+		return 0;
+	klm_n += klm_src;
+	/* Reserve digest KLM if needed. */
+	if (!op_info->is_oop ||
+	    sess->op_type == MLX5_CRYPTO_OP_TYPE_DECRYPTION) {
+		digest = &klm[klm_n];
+		klm_n++;
+	}
+	/* Build dst mbuf KLM. */
+	if (op_info->is_oop) {
+		klm[klm_n] = *aad;
+		klm_n++;
+		klm_dst = mlx5_crypto_gcm_build_mbuf_chain_klms(qp, op, op->sym->m_dst,
+								&klm[klm_n]);
+		if (!klm_dst)
+			return 0;
+		klm_n += klm_dst;
+		total_len += (op->sym->aead.data.length + sess->aad_len);
+	}
+	/* Update digest at the end if it is not set. */
+	if (!digest) {
+		digest = &klm[klm_n];
+		klm_n++;
+	}
+	/* Build digest KLM. */
+	if (!mlx5_crypto_gcm_build_klm_by_addr(qp, digest, op->sym->aead.digest.data,
+					       sess->tag_len))
+		return 0;
+	*len = total_len;
+	return klm_n;
+}
+
+static __rte_always_inline struct mlx5_wqe_cseg *
+mlx5_crypto_gcm_get_umr_wqe(struct mlx5_crypto_qp *qp)
+{
+	uint32_t wqe_offset = qp->umr_pi & (qp->umr_wqbbs - 1);
+	uint32_t left_wqbbs = qp->umr_wqbbs - wqe_offset;
+	struct mlx5_wqe_cseg *wqe;
+
+	/* If UMR WQE is near the boundary. */
+	if (left_wqbbs < MLX5_UMR_GCM_WQE_STRIDE) {
+		/* Append NOP WQE as the left WQEBBS is not enough for UMR. */
+		wqe = RTE_PTR_ADD(qp->umr_qp_obj.umem_buf, wqe_offset * MLX5_SEND_WQE_BB);
+		wqe->opcode = rte_cpu_to_be_32(MLX5_OPCODE_NOP | ((uint32_t)qp->umr_pi << 8));
+		wqe->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | (left_wqbbs << 2));
+		wqe->flags = RTE_BE32(0);
+		wqe->misc = RTE_BE32(0);
+		qp->umr_pi += left_wqbbs;
+		wqe_offset = qp->umr_pi & (qp->umr_wqbbs - 1);
+	}
+	wqe_offset *= MLX5_SEND_WQE_BB;
+	return RTE_PTR_ADD(qp->umr_qp_obj.umem_buf, wqe_offset);
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_umr(struct mlx5_crypto_qp *qp,
+			  struct rte_crypto_op *op,
+			  uint32_t idx,
+			  struct mlx5_crypto_gcm_op_info *op_info,
+			  struct mlx5_crypto_gcm_data *data)
+{
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_wqe_cseg *wqe;
+	struct mlx5_wqe_umr_cseg *ucseg;
+	struct mlx5_wqe_mkey_cseg *mkc;
+	struct mlx5_klm *iklm;
+	struct mlx5_klm *klm = &qp->klm_array[idx * priv->max_klm_num];
+	uint16_t klm_size, klm_align;
+	uint32_t total_len;
+
+	/* Build KLM base on the op. */
+	klm_size = mlx5_crypto_gcm_build_op_klm(qp, op, op_info, klm, &total_len);
+	if (!klm_size)
+		return -EINVAL;
+	klm_align = RTE_ALIGN(klm_size, 4);
+	/* Get UMR WQE memory. */
+	wqe = mlx5_crypto_gcm_get_umr_wqe(qp);
+	memset(wqe, 0, MLX5_UMR_GCM_WQE_SIZE);
+	/* Set WQE control seg. Non-inline KLM UMR WQE size must be 9 WQE_DS. */
+	wqe->opcode = rte_cpu_to_be_32(MLX5_OPCODE_UMR | ((uint32_t)qp->umr_pi << 8));
+	wqe->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | 9);
+	wqe->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET);
+	wqe->misc = rte_cpu_to_be_32(qp->mkey[idx]->id);
+	/* Set UMR WQE control seg. */
+	ucseg = (struct mlx5_wqe_umr_cseg *)(wqe + 1);
+	ucseg->mkey_mask |= RTE_BE64(1u << 0);
+	ucseg->ko_to_bs = rte_cpu_to_be_32(klm_align << MLX5_UMRC_KO_OFFSET);
+	/* Set mkey context seg. */
+	mkc = (struct mlx5_wqe_mkey_cseg *)(ucseg + 1);
+	mkc->len = rte_cpu_to_be_64(total_len);
+	mkc->qpn_mkey = rte_cpu_to_be_32(0xffffff00 | (qp->mkey[idx]->id & 0xff));
+	/* Set UMR pointer to data seg. */
+	iklm = (struct mlx5_klm *)(mkc + 1);
+	iklm->address = rte_cpu_to_be_64((uintptr_t)((char *)klm));
+	iklm->mkey = rte_cpu_to_be_32(qp->mr.lkey);
+	data->src_mkey = rte_cpu_to_be_32(qp->mkey[idx]->id);
+	data->dst_mkey = data->src_mkey;
+	data->src_addr = 0;
+	data->src_bytes = sess->aad_len + op->sym->aead.data.length;
+	data->dst_bytes = data->src_bytes;
+	if (op_info->is_enc)
+		data->dst_bytes += sess->tag_len;
+	else
+		data->src_bytes += sess->tag_len;
+	if (op_info->is_oop)
+		data->dst_addr = (void *)(uintptr_t)(data->src_bytes);
+	else
+		data->dst_addr = 0;
+	/* Clear the padding memory. */
+	memset(&klm[klm_size], 0, sizeof(struct mlx5_klm) * (klm_align - klm_size));
+	/* Update PI and WQE */
+	qp->umr_pi += MLX5_UMR_GCM_WQE_STRIDE;
+	qp->umr_wqe = (uint8_t *)wqe;
+	return 0;
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_build_send_en(struct mlx5_crypto_qp *qp)
+{
+	uint32_t wqe_offset = (qp->umr_pi & (qp->umr_wqbbs - 1)) * MLX5_SEND_WQE_BB;
+	struct mlx5_wqe_cseg *cs = RTE_PTR_ADD(qp->umr_qp_obj.wqes, wqe_offset);
+	struct mlx5_wqe_qseg *qs = RTE_PTR_ADD(cs, sizeof(struct mlx5_wqe_cseg));
+
+	cs->opcode = rte_cpu_to_be_32(MLX5_OPCODE_SEND_EN | ((uint32_t)qp->umr_pi << 8));
+	cs->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | 2);
+	/*
+	 * No need to generate the SEND_EN CQE as we want only GGA CQE
+	 * in the CQ normally. We can compare qp->last_send_gga_pi with
+	 * qp->pi to know if all SEND_EN be consumed.
+	 */
+	cs->flags = RTE_BE32((MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET) |
+			MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
+	cs->misc = RTE_BE32(0);
+	qs->max_index = rte_cpu_to_be_32(qp->pi);
+	qs->qpn_cqn = rte_cpu_to_be_32(qp->qp_obj.qp->id);
+	qp->umr_wqe = (uint8_t *)cs;
+	qp->umr_pi += 1;
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_wqe_set(struct mlx5_crypto_qp *qp,
+			struct rte_crypto_op *op,
+			uint32_t idx,
+			struct mlx5_crypto_gcm_data *data)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_gga_wqe *wqe = &((struct mlx5_gga_wqe *)qp->qp_obj.wqes)[idx];
+	union mlx5_gga_crypto_opaque *opaq = qp->opaque_addr;
+
+	memcpy(opaq[idx].cp.iv,
+		rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), sess->iv_len);
+	opaq[idx].cp.tag_size = sess->wqe_tag_len;
+	opaq[idx].cp.aad_size = sess->wqe_aad_len;
+	/* Update control seg. */
+	wqe->opcode = rte_cpu_to_be_32(MLX5_MMO_CRYPTO_OPC + (qp->pi << 8));
+	wqe->gga_ctrl1 = sess->mmo_ctrl;
+	wqe->gga_ctrl2 = sess->dek_id;
+	wqe->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET);
+	/* Update op_info seg. */
+	wqe->gather.bcount = rte_cpu_to_be_32(data->src_bytes);
+	wqe->gather.lkey = data->src_mkey;
+	wqe->gather.pbuf = rte_cpu_to_be_64((uintptr_t)data->src_addr);
+	/* Update output seg. */
+	wqe->scatter.bcount = rte_cpu_to_be_32(data->dst_bytes);
+	wqe->scatter.lkey = data->dst_mkey;
+	wqe->scatter.pbuf = rte_cpu_to_be_64((uintptr_t)data->dst_addr);
+	qp->wqe = (uint8_t *)wqe;
+}
+
+static uint16_t
+mlx5_crypto_gcm_enqueue_burst(void *queue_pair,
+			      struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	struct mlx5_crypto_session *sess;
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_crypto_gcm_tag_cpy_info *tag;
+	struct mlx5_crypto_gcm_data gcm_data;
+	struct rte_crypto_op *op;
+	struct mlx5_crypto_gcm_op_info op_info;
+	uint16_t mask = qp->entries_n - 1;
+	uint16_t remain = qp->entries_n - (qp->pi - qp->qp_ci);
+	uint32_t idx;
+	uint16_t umr_cnt = 0;
+
+	if (remain < nb_ops)
+		nb_ops = remain;
+	else
+		remain = nb_ops;
+	if (unlikely(remain == 0))
+		return 0;
+	do {
+		op = *ops++;
+		sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+		idx = qp->pi & mask;
+		mlx5_crypto_gcm_get_op_info(qp, op, &op_info);
+		if (!op_info.need_umr) {
+			gcm_data.src_addr = op_info.src_addr;
+			gcm_data.src_bytes = op->sym->aead.data.length + sess->aad_len;
+			gcm_data.src_mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, op->sym->m_src);
+			if (op_info.is_oop) {
+				gcm_data.dst_addr = RTE_PTR_SUB
+					(rte_pktmbuf_mtod_offset(op->sym->m_dst,
+					 void *, op->sym->aead.data.offset), sess->aad_len);
+				gcm_data.dst_mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, op->sym->m_dst);
+			} else {
+				gcm_data.dst_addr = gcm_data.src_addr;
+				gcm_data.dst_mkey = gcm_data.src_mkey;
+			}
+			gcm_data.dst_bytes = gcm_data.src_bytes;
+			if (op_info.is_enc)
+				gcm_data.dst_bytes += sess->tag_len;
+			else
+				gcm_data.src_bytes += sess->tag_len;
+		} else {
+			if (unlikely(mlx5_crypto_gcm_build_umr(qp, op, idx,
+							&op_info, &gcm_data))) {
+				qp->stats.enqueue_err_count++;
+				if (remain != nb_ops) {
+					qp->stats.enqueued_count -= remain;
+					break;
+				}
+				return 0;
+			}
+			umr_cnt++;
+		}
+		mlx5_crypto_gcm_wqe_set(qp, op, idx, &gcm_data);
+		if (op_info.digest) {
+			tag = (struct mlx5_crypto_gcm_tag_cpy_info *)op->sym->aead.digest.data;
+			tag->digest = op_info.digest;
+			tag->tag_len = sess->tag_len;
+			op->status = MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY;
+		} else {
+			op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		}
+		qp->ops[idx] = op;
+		qp->pi++;
+	} while (--remain);
+	qp->stats.enqueued_count += nb_ops;
+	/* Update the last GGA cseg with COMP. */
+	((struct mlx5_wqe_cseg *)qp->wqe)->flags =
+		RTE_BE32(MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET);
+	/* Only when there are no pending SEND_EN WQEs in background. */
+	if (!umr_cnt && !qp->has_umr) {
+		mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
+				   qp->pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+				   !priv->uar.dbnc);
+	} else {
+		mlx5_crypto_gcm_build_send_en(qp);
+		mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->umr_wqe,
+				   qp->umr_pi, &qp->umr_qp_obj.db_rec[MLX5_SND_DBR],
+				   !priv->uar.dbnc);
+		qp->last_gga_pi = qp->pi;
+		qp->has_umr = true;
+	}
+	return nb_ops;
+}
+
+static __rte_noinline void
+mlx5_crypto_gcm_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
+{
+	uint8_t op_code;
+	const uint32_t idx = qp->cq_ci & (qp->entries_n - 1);
+	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
+							&qp->cq_obj.cqes[idx];
+
+	op_code = rte_be_to_cpu_32(cqe->s_wqe_opcode_qpn) >> MLX5_CQ_INDEX_WIDTH;
+	DRV_LOG(ERR, "CQE ERR:0x%x, Vendor_ERR:0x%x, OP:0x%x, QPN:0x%x, WQE_CNT:0x%x",
+		cqe->syndrome, cqe->vendor_err_synd, op_code,
+		(rte_be_to_cpu_32(cqe->s_wqe_opcode_qpn) & 0xffffff),
+		rte_be_to_cpu_16(cqe->wqe_counter));
+	if (op && op_code == MLX5_OPCODE_MMO) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		qp->stats.dequeue_err_count++;
+	}
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_fill_op(struct mlx5_crypto_qp *qp,
+			struct rte_crypto_op **ops,
+			uint16_t orci,
+			uint16_t rci,
+			uint16_t op_mask)
+{
+	uint16_t n;
+
+	orci &= op_mask;
+	rci &= op_mask;
+	if (unlikely(orci > rci)) {
+		n = op_mask - orci + 1;
+		memcpy(ops, &qp->ops[orci], n * sizeof(*ops));
+		orci = 0;
+	} else {
+		n = 0;
+	}
+	/* rci can be 0 here, memcpy will skip that. */
+	memcpy(&ops[n], &qp->ops[orci], (rci - orci) * sizeof(*ops));
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_cpy_tag(struct mlx5_crypto_qp *qp,
+			uint16_t orci,
+			uint16_t rci,
+			uint16_t op_mask)
+{
+	struct rte_crypto_op *op;
+	struct mlx5_crypto_gcm_tag_cpy_info *tag;
+
+	while (qp->cpy_tag_op && orci != rci) {
+		op = qp->ops[orci & op_mask];
+		if (op->status == MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY) {
+			tag = (struct mlx5_crypto_gcm_tag_cpy_info *)op->sym->aead.digest.data;
+			memcpy(op->sym->aead.digest.data, tag->digest, tag->tag_len);
+			op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+			qp->cpy_tag_op--;
+		}
+		orci++;
+	}
+}
+
+static uint16_t
+mlx5_crypto_gcm_dequeue_burst(void *queue_pair,
+			      struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = qp->cq_entries_n;
+	const unsigned int mask = cq_size - 1;
+	const unsigned int op_mask = qp->entries_n - 1;
+	uint32_t idx;
+	uint32_t next_idx = qp->cq_ci & mask;
+	uint16_t reported_ci = qp->reported_ci;
+	uint16_t qp_ci = qp->qp_ci;
+	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - reported_ci), nb_ops);
+	uint16_t op_num = 0;
+	int ret;
+
+	if (unlikely(max == 0))
+		return 0;
+	while (qp_ci - reported_ci < max) {
+		idx = next_idx;
+		next_idx = (qp->cq_ci + 1) & mask;
+		cqe = &qp->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, qp->cq_ci);
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
+				mlx5_crypto_gcm_cqe_err_handle(qp,
+						qp->ops[reported_ci & op_mask]);
+			break;
+		}
+		qp_ci = rte_be_to_cpu_16(cqe->wqe_counter) + 1;
+		if (qp->has_umr &&
+		    (qp->last_gga_pi + 1) == qp_ci)
+			qp->has_umr = false;
+		qp->cq_ci++;
+	}
+	/* If wqe_counter changed, means CQE handled. */
+	if (likely(qp->qp_ci != qp_ci)) {
+		qp->qp_ci = qp_ci;
+		rte_io_wmb();
+		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->cq_ci);
+	}
+	/* If reported_ci is not same with qp_ci, means op retrieved. */
+	if (qp_ci != reported_ci) {
+		op_num = RTE_MIN((uint16_t)(qp_ci - reported_ci), max);
+		reported_ci += op_num;
+		mlx5_crypto_gcm_cpy_tag(qp, qp->reported_ci, reported_ci, op_mask);
+		mlx5_crypto_gcm_fill_op(qp, ops, qp->reported_ci, reported_ci, op_mask);
+		qp->stats.dequeued_count += op_num;
+		qp->reported_ci = reported_ci;
+	}
+	return op_num;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
@@ -339,6 +925,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
 	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
 	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
+	crypto_dev->dequeue_burst = mlx5_crypto_gcm_dequeue_burst;
+	crypto_dev->enqueue_burst = mlx5_crypto_gcm_enqueue_burst;
 	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability
  2023-06-20  1:23 ` Suanming Mou
                     ` (7 preceding siblings ...)
  2023-06-20  1:23   ` [PATCH v3 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
@ 2023-06-20  1:23   ` Suanming Mou
  2023-06-20  9:25     ` [EXT] " Akhil Goyal
  2023-06-20  9:55   ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
  9 siblings, 1 reply; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  1:23 UTC (permalink / raw)
  To: Matan Azrad; +Cc: rasland, dev, gakhil

This commit generates the AES-GCM capability based on the NIC
attributes and enables the AES-GCM algorithm.

A new devarg "algo" is added to select whether the crypto PMD is
initialized for AES-GCM (algo=1) or AES-XTS (algo=0, default).
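
For illustration only (not part of this patch), the PMD could be probed
in AES-GCM mode by appending the devarg to the device's EAL allowlist
entry, where <PCI_BDF> is a placeholder for the device address:

    -a <PCI_BDF>,class=crypto,algo=1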

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
---
 doc/guides/cryptodevs/mlx5.rst         | 48 +++++++++++++++++++-
 doc/guides/rel_notes/release_23_07.rst |  1 +
 drivers/crypto/mlx5/mlx5_crypto.c      | 26 +++++++++--
 drivers/crypto/mlx5/mlx5_crypto.h      |  1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 63 ++++++++++++++++++++++++++
 5 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/doc/guides/cryptodevs/mlx5.rst b/doc/guides/cryptodevs/mlx5.rst
index b35ac5f5f2..9a0ae8b0d2 100644
--- a/doc/guides/cryptodevs/mlx5.rst
+++ b/doc/guides/cryptodevs/mlx5.rst
@@ -21,6 +21,11 @@ and **NVIDIA BlueField-3** family adapters.
 Overview
 --------
 
+The NVIDIA MLX5 crypto driver supports AES-XTS and AES-GCM encryption.
+
+AES-XTS
+-------
+
 The device can provide disk encryption services,
 allowing data encryption and decryption towards a disk.
 Having all encryption/decryption operations done in a single device
@@ -38,13 +43,19 @@ The encryption does not require text to be aligned to the AES block size (128b).
 
 See :doc:`../../platform/mlx5` guide for more design details.
 
+AES-GCM
+-------
+The encryption and decryption process the traffic as the standard RTE crypto
+API defines. The supported AAD/digest/key sizes can be read from dev_info.
+
+
 Configuration
 -------------
 
 See the :ref:`mlx5 common configuration <mlx5_common_env>`.
 
 A device comes out of NVIDIA factory with pre-defined import methods.
-There are two possible import methods: wrapped or plaintext.
+There are two possible import methods: wrapped or plaintext (valid for AES-XTS only).
 
 In case the device is in wrapped mode, it needs to be moved to crypto operational mode.
 In order to move the device to crypto operational mode, credential and KEK
@@ -120,24 +131,36 @@ Driver options
 Please refer to :ref:`mlx5 common options <mlx5_common_driver_options>`
 for an additional list of options shared with other mlx5 drivers.
 
+- ``algo`` parameter [int]
+
+  - 0. AES-XTS crypto.
+
+  - 1. AES-GCM crypto.
+
+  Set to zero (AES-XTS) by default.
+
 - ``wcs_file`` parameter [string] - mandatory in wrapped mode
 
   File path including only the wrapped credential in string format of hexadecimal
   numbers, represent 48 bytes (8 bytes IV added by the AES key wrap algorithm).
+  This option is valid only for AES-XTS.
 
 - ``import_kek_id`` parameter [int]
 
   The identifier of the KEK, default value is 0 represents the operational
   register import_kek..
+  This option is valid only for AES-XTS.
 
 - ``credential_id`` parameter [int]
 
   The identifier of the credential, default value is 0 represents the operational
   register credential.
+  This option is valid only for AES-XTS.
 
 - ``keytag`` parameter [int]
 
   The plaintext of the keytag appended to the AES-XTS keys, default value is 0.
+  This option is valid only for AES-XTS.
 
 - ``max_segs_num`` parameter [int]
 
@@ -161,6 +184,8 @@ Limitations
 - The supported data-unit lengths are 512B and 4KB and 1MB. In case the `dataunit_len`
   is not provided in the cipher xform, the OP length is limited to the above
   values.
+- AES-GCM is only supported on BlueField-3.
+- AES-GCM supports only the plaintext key import mode.
 
 
 Prerequisites
@@ -172,6 +197,7 @@ FW Prerequisites
 - xx.31.0328 for ConnectX-6.
 - xx.32.0108 for ConnectX-6 Dx and BlueField-2.
 - xx.36.xxxx for ConnectX-7 and BlueField-3.
+- xx.37.3010 for BlueField-3 and newer for AES-GCM.
 
 Linux Prerequisites
 ~~~~~~~~~~~~~~~~~~~
@@ -186,3 +212,23 @@ Windows Prerequisites
 
 - NVIDIA WINOF-2 version: **2.60** or higher.
   See :ref:`mlx5 common prerequisites <mlx5_windows_prerequisites>` for more details.
+
+
+Notes for rte_crypto AES-GCM
+----------------------------
+
+In AES-GCM mode, the HW requires contiguous input and output of Additional
+Authenticated Data (AAD), payload, and digest (if needed). However, the RTE
+API only provides a single AAD input, which means that in out-of-place mode
+the same AAD is used for both input and output. This reuse of the AAD in
+out-of-place mode breaks the contiguous output, which degrades performance
+and introduces an extra UMR WQE. A digest that is not contiguous after the
+payload also leads to an extra UMR WQE.
+
+To address this issue, the current RTE API provides min_mbuf_headroom_req and
+min_mbuf_tailroom_req in rte_cryptodev_info as a hint from the PMD. They
+indicate that the PMD may use the buffer before and after the mbuf payload as
+AAD and digest space. With this hint, the PMD uses the buffer before and
+after the mbuf payload directly by copying the AAD and digest there. The
+application must ensure that enough headroom and tailroom are reserved in
+the mbuf. Otherwise, for non-contiguous operations, an extra UMR WQE is used.
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index 027ae7bd2d..bbb8eddbca 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -131,6 +131,7 @@ New Features
   * Added support for CQE compression on Windows.
   * Added support for enhanced multi-packet write on Windows.
   * Added support for quota flow action and item.
+  * Added support for AES-GCM crypto.
 
 * **Added vmxnet3 version 7 support.**
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 4d7d3ef2a3..081e96ad4d 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -269,6 +269,14 @@ mlx5_crypto_args_check_handler(const char *key, const char *val, void *opaque)
 		attr->credential_pointer = (uint32_t)tmp;
 	} else if (strcmp(key, "keytag") == 0) {
 		devarg_prms->keytag = tmp;
+	} else if (strcmp(key, "algo") == 0) {
+		if (tmp == 1) {
+			devarg_prms->is_aes_gcm = 1;
+		} else if (tmp > 1) {
+			DRV_LOG(ERR, "Invalid algo.");
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
 	}
 	return 0;
 }
@@ -285,6 +293,7 @@ mlx5_crypto_parse_devargs(struct mlx5_kvargs_ctrl *mkvlist,
 		"keytag",
 		"max_segs_num",
 		"wcs_file",
+		"algo",
 		NULL,
 	};
 
@@ -370,10 +379,19 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 	priv->crypto_dev = crypto_dev;
 	priv->is_wrapped_mode = wrapped_mode;
 	priv->max_segs_num = devarg_prms.max_segs_num;
-	ret = mlx5_crypto_xts_init(priv);
-	if (ret) {
-		DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
-		return -ENOTSUP;
+	/* Init and override AES-GCM configuration. */
+	if (devarg_prms.is_aes_gcm) {
+		ret = mlx5_crypto_gcm_init(priv);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to init AES-GCM crypto.");
+			return -ENOTSUP;
+		}
+	} else {
+		ret = mlx5_crypto_xts_init(priv);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
+			return -ENOTSUP;
+		}
 	}
 	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 6dcb41b27c..36dacdcda4 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -92,6 +92,7 @@ struct mlx5_crypto_devarg_params {
 	struct mlx5_devx_crypto_login_attr login_attr;
 	uint64_t keytag;
 	uint32_t max_segs_num;
+	uint32_t is_aes_gcm:1;
 };
 
 struct mlx5_crypto_session {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 8389c03c91..e26a338365 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -109,6 +109,60 @@ mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
 	return 0;
 }
 
+static int
+mlx5_crypto_generate_gcm_cap(struct mlx5_hca_crypto_mmo_attr *mmo_attr,
+			     struct rte_cryptodev_capabilities *cap)
+{
+	/* Init key size. */
+	if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt &&
+		mmo_attr->gcm_256_encrypt && mmo_attr->gcm_256_decrypt) {
+		cap->sym.aead.key_size.min = 16;
+		cap->sym.aead.key_size.max = 32;
+		cap->sym.aead.key_size.increment = 16;
+	} else if (mmo_attr->gcm_256_encrypt && mmo_attr->gcm_256_decrypt) {
+		cap->sym.aead.key_size.min = 32;
+		cap->sym.aead.key_size.max = 32;
+		cap->sym.aead.key_size.increment = 0;
+	} else if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt) {
+		cap->sym.aead.key_size.min = 16;
+		cap->sym.aead.key_size.max = 16;
+		cap->sym.aead.key_size.increment = 0;
+	} else {
+		DRV_LOG(ERR, "No available AES-GCM encryption/decryption supported.");
+		return -1;
+	}
+	/* Init tag size. */
+	if (mmo_attr->gcm_auth_tag_128 && mmo_attr->gcm_auth_tag_96) {
+		cap->sym.aead.digest_size.min = 12;
+		cap->sym.aead.digest_size.max = 16;
+		cap->sym.aead.digest_size.increment = 4;
+	} else if (mmo_attr->gcm_auth_tag_96) {
+		cap->sym.aead.digest_size.min = 12;
+		cap->sym.aead.digest_size.max = 12;
+		cap->sym.aead.digest_size.increment = 0;
+	} else if (mmo_attr->gcm_auth_tag_128) {
+		cap->sym.aead.digest_size.min = 16;
+		cap->sym.aead.digest_size.max = 16;
+		cap->sym.aead.digest_size.increment = 0;
+	} else {
+		DRV_LOG(ERR, "No available AES-GCM tag size supported.");
+		return -1;
+	}
+	/* Init AAD size. */
+	cap->sym.aead.aad_size.min = 0;
+	cap->sym.aead.aad_size.max = UINT16_MAX;
+	cap->sym.aead.aad_size.increment = 1;
+	/* Init IV size. */
+	cap->sym.aead.iv_size.min = 12;
+	cap->sym.aead.iv_size.max = 12;
+	cap->sym.aead.iv_size.increment = 0;
+	/* Init left items. */
+	cap->op = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
+	cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_AEAD;
+	cap->sym.aead.algo = RTE_CRYPTO_AEAD_AES_GCM;
+	return 0;
+}
+
 static int
 mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 				  struct rte_crypto_sym_xform *xform,
@@ -917,8 +971,10 @@ mlx5_crypto_gcm_dequeue_burst(void *queue_pair,
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
+	struct mlx5_common_device *cdev = priv->cdev;
 	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
 	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+	int ret;
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
@@ -928,6 +984,13 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 	crypto_dev->dequeue_burst = mlx5_crypto_gcm_dequeue_burst;
 	crypto_dev->enqueue_burst = mlx5_crypto_gcm_enqueue_burst;
 	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
+	/* Generate GCM capability. */
+	ret = mlx5_crypto_generate_gcm_cap(&cdev->config.hca_attr.crypto_mmo,
+					   mlx5_crypto_gcm_caps);
+	if (ret) {
+		DRV_LOG(ERR, "No enough AES-GCM cap.");
+		return -1;
+	}
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
-- 
2.25.1
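
As a side note (not part of the patch), an application can verify the
capability generated above through the standard cryptodev API; a minimal
sketch, where the 32/16/64/12 values are only example session parameters:

/* Check that the PMD advertises AES-GCM with the intended parameters. */
#include <rte_cryptodev.h>

static int
check_gcm_cap(uint8_t dev_id)
{
	const struct rte_cryptodev_symmetric_capability *cap;
	struct rte_cryptodev_sym_capability_idx idx = {
		.type = RTE_CRYPTO_SYM_XFORM_AEAD,
		.algo.aead = RTE_CRYPTO_AEAD_AES_GCM,
	};

	cap = rte_cryptodev_sym_capability_get(dev_id, &idx);
	if (cap == NULL)
		return -1;
	/* key 32B, digest 16B, AAD 64B, IV 12B. Returns 0 when supported. */
	return rte_cryptodev_sym_capability_check_aead(cap, 32, 16, 64, 12);
}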


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability
  2023-06-20  1:23   ` [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
@ 2023-06-20  9:25     ` Akhil Goyal
  2023-06-20  9:42       ` Suanming Mou
  0 siblings, 1 reply; 54+ messages in thread
From: Akhil Goyal @ 2023-06-20  9:25 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: rasland, dev

> This commit generates AES-GCM capability based on the NIC
> attributes and enables AES-GCM algo.
> 
> An new devarg "algo" is added to identify if the crypto PMD will
> be initialized as AES-GCM(algo=1) or AES-XTS(algo=0, default).
> 
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>

You should mention the changelog for the changes done in the current patchset over the previous one.
It helps in review.

Also, the update to "doc/guides/cryptodevs/features/mlx5.ini" is missing in this patch.

Also, please get an ack from the mlx5 crypto maintainer.


> ---
>  doc/guides/cryptodevs/mlx5.rst         | 48 +++++++++++++++++++-
>  doc/guides/rel_notes/release_23_07.rst |  1 +
>  drivers/crypto/mlx5/mlx5_crypto.c      | 26 +++++++++--
>  drivers/crypto/mlx5/mlx5_crypto.h      |  1 +
>  drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 63 ++++++++++++++++++++++++++
>  5 files changed, 134 insertions(+), 5 deletions(-)


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability
  2023-06-20  9:25     ` [EXT] " Akhil Goyal
@ 2023-06-20  9:42       ` Suanming Mou
  2023-06-20  9:48         ` Akhil Goyal
  0 siblings, 1 reply; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  9:42 UTC (permalink / raw)
  To: Akhil Goyal, Matan Azrad; +Cc: Raslan Darawsheh, dev



> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Tuesday, June 20, 2023 5:25 PM
> To: Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>
> Cc: Raslan Darawsheh <rasland@nvidia.com>; dev@dpdk.org
> Subject: RE: [EXT] [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability
> 
> > This commit generates AES-GCM capability based on the NIC attributes
> > and enables AES-GCM algo.
> >
> > An new devarg "algo" is added to identify if the crypto PMD will be
> > initialized as AES-GCM(algo=1) or AES-XTS(algo=0, default).
> >
> > Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
> 
> You should mention changelog for the changes done in the current patchset over
> previous one.
> It helps in review.

V1 was the RFC; the change from V1 to V2 was the XTS and GCM file split. V3 just did a minor fix in AES-GCM 256 key creation.
I put the change log in the cover-letter, but thanks for the suggestion; I will update the change log in the individual patches as well.

> 
> Also update to "doc/guides/cryptodevs/features/mlx5.ini" is missing in this patch.

You are right, it is missing here.

> 
> Also get an ack from mlx5 crypto maintainer.
> 
> 
> > ---
> >  doc/guides/cryptodevs/mlx5.rst         | 48 +++++++++++++++++++-
> >  doc/guides/rel_notes/release_23_07.rst |  1 +
> >  drivers/crypto/mlx5/mlx5_crypto.c      | 26 +++++++++--
> >  drivers/crypto/mlx5/mlx5_crypto.h      |  1 +
> >  drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 63
> > ++++++++++++++++++++++++++
> >  5 files changed, 134 insertions(+), 5 deletions(-)


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability
  2023-06-20  9:42       ` Suanming Mou
@ 2023-06-20  9:48         ` Akhil Goyal
  2023-06-20  9:56           ` Suanming Mou
  0 siblings, 1 reply; 54+ messages in thread
From: Akhil Goyal @ 2023-06-20  9:48 UTC (permalink / raw)
  To: Suanming Mou, Matan Azrad; +Cc: Raslan Darawsheh, dev

> > You should mention changelog for the changes done in the current patchset
> over
> > previous one.
> > It helps in review.
> 
> V1 is RFC, V1 compared to V2 was xts and gcm file split. V3 just did a minor fix in
> aes-gcm 256 key creation.
> I put the change log in the cover-letter, but thanks for the suggestion, will
> update the change log in single patch as well.
> 
I do not see the cover-letter for v3 in the mailing list.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v2 0/9] crypto/mlx5: support AES-GCM
  2023-06-20  1:23 ` Suanming Mou
                     ` (8 preceding siblings ...)
  2023-06-20  1:23   ` [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
@ 2023-06-20  9:55   ` Suanming Mou
  2023-06-20  9:58     ` Akhil Goyal
  9 siblings, 1 reply; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  9:55 UTC (permalink / raw)
  To: gakhil; +Cc: Raslan Darawsheh, dev

Hi Akhil,

Maybe because the "To" field was empty, it was not collected correctly by the ML. But it was in my inbox, and you were CCed.

Thanks,
Suanming

> -----Original Message-----
> From: Suanming Mou <suanmingm@nvidia.com>
> Sent: Tuesday, June 20, 2023 9:23 AM
> Cc: Raslan Darawsheh <rasland@nvidia.com>; dev@dpdk.org;
> gakhil@marvell.com
> Subject: [PATCH v2 0/9] crypto/mlx5: support AES-GCM
> 
> AES-GCM provides both authenticated encryption and the ability to check the
> integrity and authentication of additional authenticated data (AAD) that is sent in
> the clear.
> 
> The crypto operations are performed with crypto WQE. If the input buffers(AAD,
> mbuf, digest) are not contiguous and there is no enough headroom or tailroom for
> AAD or digest, as the requirement from FW, an UMR WQE is needed to generate
> contiguous address space for crypto WQE.
> The UMR WQE and crypto WQE are handled in two different QPs.
> 
> The QP for UMR operation contains two types of WQE, UMR and SEND_EN WQE.
> The WQEs are built dynamically according to the crypto operation buffer address.
> Crypto operation with non-contiguous buffers will
> have its own UMR WQE, while the operation with contiguous buffers
> doesn't need the UMR WQE. Once the all the operations WQE in the enqueue
> burst built finishes, if any UMR WQEs are built, additional SEND_EN WQE will be
> as the final WQE of the burst in the UMR QP.
> The purpose of that SEND_EN WQE is to trigger the crypto QP processing with the
> UMR ready input memory address space buffers.
> 
> The QP for crypto operations contains only the crypto WQE and the QP WQEs are
> built as fixed in QP setup. The QP processing is triggered by doorbell ring or the
> SEND_EN WQE from UMR QP.
> 
> v2:
>   - split XTS and GCM code to different file.
>   - add headroom and tailroom optimize.
> 
> v3:
>  - fix AES-GCM 128b key creation.
> 
> Suanming Mou (9):
>   common/mlx5: export memory region lookup by address
>   crypto/mlx5: split AES-XTS
>   crypto/mlx5: add AES-GCM query and initialization
>   crypto/mlx5: add AES-GCM encryption key
>   crypto/mlx5: add AES-GCM session configure
>   common/mlx5: add WQE-based QP synchronous basics
>   crypto/mlx5: add queue pair setup for GCM
>   crypto/mlx5: add enqueue and dequeue operations
>   crypto/mlx5: enable AES-GCM capability
> 
>  doc/guides/cryptodevs/mlx5.rst         |  48 +-
>  doc/guides/rel_notes/release_23_07.rst |   1 +
>  drivers/common/mlx5/mlx5_common_mr.c   |   2 +-
>  drivers/common/mlx5/mlx5_common_mr.h   |   5 +
>  drivers/common/mlx5/mlx5_devx_cmds.c   |  21 +
>  drivers/common/mlx5/mlx5_devx_cmds.h   |  16 +
>  drivers/common/mlx5/mlx5_prm.h         |  65 +-
>  drivers/common/mlx5/version.map        |   3 +
>  drivers/crypto/mlx5/meson.build        |   2 +
>  drivers/crypto/mlx5/mlx5_crypto.c      | 673 ++---------------
>  drivers/crypto/mlx5/mlx5_crypto.h      | 101 ++-
>  drivers/crypto/mlx5/mlx5_crypto_dek.c  | 102 ++-
> drivers/crypto/mlx5/mlx5_crypto_gcm.c  | 997 +++++++++++++++++++++++++
> drivers/crypto/mlx5/mlx5_crypto_xts.c  | 645 ++++++++++++++++
>  14 files changed, 2016 insertions(+), 665 deletions(-)  create mode 100644
> drivers/crypto/mlx5/mlx5_crypto_gcm.c
>  create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c
> 
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability
  2023-06-20  9:48         ` Akhil Goyal
@ 2023-06-20  9:56           ` Suanming Mou
  0 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20  9:56 UTC (permalink / raw)
  To: Akhil Goyal, Matan Azrad; +Cc: Raslan Darawsheh, dev



> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Tuesday, June 20, 2023 5:48 PM
> To: Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>
> Cc: Raslan Darawsheh <rasland@nvidia.com>; dev@dpdk.org
> Subject: RE: [EXT] [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability
> 
> > > You should mention changelog for the changes done in the current
> > > patchset
> > over
> > > previous one.
> > > It helps in review.
> >
> > V1 is RFC, V1 compared to V2 was xts and gcm file split. V3 just did a
> > minor fix in aes-gcm 256 key creation.
> > I put the change log in the cover-letter, but thanks for the
> > suggestion, will update the change log in single patch as well.
> >
> I do not see the cover-letter for v3 in mailing list.

Please see my reply in the v3 cover-letter.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v2 0/9] crypto/mlx5: support AES-GCM
  2023-06-20  9:55   ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
@ 2023-06-20  9:58     ` Akhil Goyal
  2023-06-20 10:03       ` Suanming Mou
  0 siblings, 1 reply; 54+ messages in thread
From: Akhil Goyal @ 2023-06-20  9:58 UTC (permalink / raw)
  To: Suanming Mou; +Cc: Raslan Darawsheh, dev

Hi Suanming,
> Hi Akhil,
> 
> Maybe due to "To" is empty, it was not collected correctly to the ML. But it was
> in my inbox, and you were cced.
> 
This is a v2 cover-letter as per the title.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v2 0/9] crypto/mlx5: support AES-GCM
  2023-06-20  9:58     ` Akhil Goyal
@ 2023-06-20 10:03       ` Suanming Mou
  2023-06-20 13:52         ` Matan Azrad
  0 siblings, 1 reply; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 10:03 UTC (permalink / raw)
  To: Akhil Goyal; +Cc: Raslan Darawsheh, dev



> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Tuesday, June 20, 2023 5:59 PM
> To: Suanming Mou <suanmingm@nvidia.com>
> Cc: Raslan Darawsheh <rasland@nvidia.com>; dev@dpdk.org
> Subject: RE: [PATCH v2 0/9] crypto/mlx5: support AES-GCM
> 
> Hi Suanming,
> > Hi Akhil,
> >
> > Maybe due to "To" is empty, it was not collected correctly to the ML.
> > But it was in my inbox, and you were cced.
> >
> This is a v2 cover-letter as per the title.

Sorry, v2 is a typo here; it is v3 in fact. So I understand why the cover-letter was "missing".

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v2 0/9] crypto/mlx5: support AES-GCM
  2023-06-20 10:03       ` Suanming Mou
@ 2023-06-20 13:52         ` Matan Azrad
  0 siblings, 0 replies; 54+ messages in thread
From: Matan Azrad @ 2023-06-20 13:52 UTC (permalink / raw)
  To: Suanming Mou, Akhil Goyal; +Cc: Raslan Darawsheh, dev



From: Suanming Mou
> > -----Original Message-----
> > From: Akhil Goyal <gakhil@marvell.com>
> > Sent: Tuesday, June 20, 2023 5:59 PM
> > To: Suanming Mou <suanmingm@nvidia.com>
> > Cc: Raslan Darawsheh <rasland@nvidia.com>; dev@dpdk.org
> > Subject: RE: [PATCH v2 0/9] crypto/mlx5: support AES-GCM
> >
> > Hi Suanming,
> > > Hi Akhil,
> > >
> > > Maybe due to "To" is empty, it was not collected correctly to the ML.
> > > But it was in my inbox, and you were cced.
> > >
> > This is a v2 cover-letter as per the title.
> 
> Sorry, v2 is typo here, it is v3 in fact. So I understand why the cover-letter was
> "missing".

For v3 series:
Series-acked-by: Matan Azrad <matan@nvidia.com>



^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 0/9] crypto/mlx5: support AES-GCM
  2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
                   ` (6 preceding siblings ...)
  2023-06-20  1:23 ` Suanming Mou
@ 2023-06-20 14:11 ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 1/9] common/mlx5: export memory region lookup by address Suanming Mou
                     ` (9 more replies)
  7 siblings, 10 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil; +Cc: rasland, dev

AES-GCM provides both authenticated encryption and the ability to check
the integrity and authentication of additional authenticated data (AAD)
that is sent in the clear.

The crypto operations are performed with crypto WQEs. If the input
buffers (AAD, mbuf, digest) are not contiguous and there is not enough
headroom or tailroom for the AAD or digest, then, as required by the FW,
a UMR WQE is needed to generate a contiguous address space for the crypto
WQE. The UMR WQEs and crypto WQEs are handled in two different QPs.

The QP for UMR operations contains two types of WQE: UMR and SEND_EN
WQEs. The WQEs are built dynamically according to the crypto operation
buffer addresses. A crypto operation with non-contiguous buffers will
have its own UMR WQE, while an operation with contiguous buffers
doesn't need one. Once all the operation WQEs of the enqueue burst
have been built, and if any UMR WQEs were built, an additional
SEND_EN WQE is posted as the final WQE of the burst in the UMR QP.
The purpose of that SEND_EN WQE is to trigger the crypto QP processing
with the UMR-prepared contiguous input buffers.

The QP for crypto operations contains only crypto WQEs, and these
WQEs are pre-built at QP setup time. The QP processing is triggered
by a doorbell ring or by the SEND_EN WQE from the UMR QP.
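
For readability, here is a condensed C sketch of the burst flow described
above. All helper names (op_is_contiguous, build_umr_wqe, fill_crypto_wqe,
post_send_en, ring_doorbell) are hypothetical stand-ins, not driver symbols:

#include <stdbool.h>
#include <stdint.h>

struct op;   /* stand-in for rte_crypto_op */
struct qp;   /* stand-in for the UMR / crypto QPs */

bool op_is_contiguous(const struct op *op);
void build_umr_wqe(struct qp *umr_qp, const struct op *op);
void fill_crypto_wqe(struct qp *crypto_qp, const struct op *op);
void post_send_en(struct qp *umr_qp);
void ring_doorbell(struct qp *qp);

static uint16_t
enqueue_burst(struct qp *umr_qp, struct qp *crypto_qp,
	      struct op **ops, uint16_t nb_ops)
{
	bool umr_used = false;
	uint16_t i;

	for (i = 0; i < nb_ops; i++) {
		if (!op_is_contiguous(ops[i])) {
			/* Non-contiguous AAD/mbuf/digest: build a UMR WQE so
			 * the crypto WQE sees one contiguous address space. */
			build_umr_wqe(umr_qp, ops[i]);
			umr_used = true;
		}
		/* Crypto WQEs are pre-built at QP setup; only per-operation
		 * fields are filled here. */
		fill_crypto_wqe(crypto_qp, ops[i]);
	}
	if (umr_used) {
		/* SEND_EN closes the UMR burst and triggers the crypto QP
		 * once the UMR work has completed. */
		post_send_en(umr_qp);
		ring_doorbell(umr_qp);
	} else {
		ring_doorbell(crypto_qp);
	}
	return nb_ops;
}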

v2:
  - split XTS and GCM code to different file.
  - add headroom and tailroom optimize.

v3:
 - fix AES-GCM 128b key creation.

v4:
 - add missing feature cap in mlx5.ini 

Suanming Mou (9):
  common/mlx5: export memory region lookup by address
  crypto/mlx5: split AES-XTS
  crypto/mlx5: add AES-GCM query and initialization
  crypto/mlx5: add AES-GCM encryption key
  crypto/mlx5: add AES-GCM session configure
  common/mlx5: add WQE-based QP synchronous basics
  crypto/mlx5: add queue pair setup for GCM
  crypto/mlx5: add enqueue and dequeue operations
  crypto/mlx5: enable AES-GCM capability

 doc/guides/cryptodevs/features/mlx5.ini |   2 +
 doc/guides/cryptodevs/mlx5.rst          |  48 +-
 doc/guides/rel_notes/release_23_07.rst  |   1 +
 drivers/common/mlx5/mlx5_common_mr.c    |   2 +-
 drivers/common/mlx5/mlx5_common_mr.h    |   5 +
 drivers/common/mlx5/mlx5_devx_cmds.c    |  21 +
 drivers/common/mlx5/mlx5_devx_cmds.h    |  16 +
 drivers/common/mlx5/mlx5_prm.h          |  65 +-
 drivers/common/mlx5/version.map         |   3 +
 drivers/crypto/mlx5/meson.build         |   2 +
 drivers/crypto/mlx5/mlx5_crypto.c       | 673 ++--------------
 drivers/crypto/mlx5/mlx5_crypto.h       | 101 ++-
 drivers/crypto/mlx5/mlx5_crypto_dek.c   | 102 ++-
 drivers/crypto/mlx5/mlx5_crypto_gcm.c   | 997 ++++++++++++++++++++++++
 drivers/crypto/mlx5/mlx5_crypto_xts.c   | 645 +++++++++++++++
 15 files changed, 2018 insertions(+), 665 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 1/9] common/mlx5: export memory region lookup by address
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 2/9] crypto/mlx5: split AES-XTS Suanming Mou
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev

In case the user provides an address without a mempool, the function
that looks up the address without a mempool needs to be exported.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/common/mlx5/mlx5_common_mr.c | 2 +-
 drivers/common/mlx5/mlx5_common_mr.h | 4 ++++
 drivers/common/mlx5/version.map      | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)
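
As a hedged usage sketch (the real consumer is the GCM data path in a later
patch; "qp" and "addr" are assumed to be in scope), the exported lookup is a
small fragment on the fast path:

	/* Resolve the LKey for a raw address that carries no mbuf/mempool
	 * context; UINT32_MAX means no matching MR was found. */
	uint32_t lkey = mlx5_mr_addr2mr_bh(&qp->mr_ctrl, (uintptr_t)addr);

	if (lkey == UINT32_MAX)
		return -1;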

diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 7b14b0c7bf..40ff9153bd 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -1059,7 +1059,7 @@ mr_lookup_caches(struct mlx5_mr_ctrl *mr_ctrl,
  * @return
  *   Searched LKey on success, UINT32_MAX on no match.
  */
-static uint32_t
+uint32_t
 mlx5_mr_addr2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, uintptr_t addr)
 {
 	uint32_t lkey;
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index 12def1585f..66623868a2 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -240,6 +240,10 @@ mlx5_mr_create(struct mlx5_common_device *cdev,
 	       struct mlx5_mr_share_cache *share_cache,
 	       struct mr_cache_entry *entry, uintptr_t addr);
 
+__rte_internal
+uint32_t
+mlx5_mr_addr2mr_bh(struct mlx5_mr_ctrl *mr_ctrl, uintptr_t addr);
+
 /* mlx5_common_verbs.c */
 
 __rte_internal
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index e05e1aa8c5..f860b069de 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -122,6 +122,7 @@ INTERNAL {
 	mlx5_mr_ctrl_init;
 	mlx5_mr_flush_local_cache;
 	mlx5_mr_mb2mr_bh;
+	mlx5_mr_addr2mr_bh;
 
 	mlx5_nl_allmulti; # WINDOWS_NO_EXPORT
 	mlx5_nl_ifindex; # WINDOWS_NO_EXPORT
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 2/9] crypto/mlx5: split AES-XTS
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 1/9] common/mlx5: export memory region lookup by address Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad; +Cc: rasland, dev

As other crypto algorithms will be supported, this commit splits the
AES-XTS code into a separate *_xts.c file. The mlx5_crypto.c file will
contain only the common code.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/crypto/mlx5/meson.build       |   1 +
 drivers/crypto/mlx5/mlx5_crypto.c     | 642 ++------------------------
 drivers/crypto/mlx5/mlx5_crypto.h     |  33 ++
 drivers/crypto/mlx5/mlx5_crypto_xts.c | 594 ++++++++++++++++++++++++
 4 files changed, 667 insertions(+), 603 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_xts.c

diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index a2691ec0f0..045e8ce81d 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -15,6 +15,7 @@ endif
 
 sources = files(
         'mlx5_crypto.c',
+	'mlx5_crypto_xts.c',
         'mlx5_crypto_dek.c',
 )
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 5267f48c1e..2e6bcc6ddc 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -40,33 +40,6 @@ int mlx5_crypto_logtype;
 
 uint8_t mlx5_crypto_driver_id;
 
-const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
-	{		/* AES XTS */
-		.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
-		{.sym = {
-			.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
-			{.cipher = {
-				.algo = RTE_CRYPTO_CIPHER_AES_XTS,
-				.block_size = 16,
-				.key_size = {
-					.min = 32,
-					.max = 64,
-					.increment = 32
-				},
-				.iv_size = {
-					.min = 16,
-					.max = 16,
-					.increment = 0
-				},
-				.dataunit_set =
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES |
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_4096_BYTES |
-				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_1_MEGABYTES,
-			}, }
-		}, }
-	},
-};
-
 static const char mlx5_crypto_drv_name[] = RTE_STR(MLX5_CRYPTO_DRIVER_NAME);
 
 static const struct rte_driver mlx5_drv = {
@@ -76,21 +49,6 @@ static const struct rte_driver mlx5_drv = {
 
 static struct cryptodev_driver mlx5_cryptodev_driver;
 
-struct mlx5_crypto_session {
-	uint32_t bs_bpt_eo_es;
-	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
-	 * saved in big endian format.
-	 */
-	uint32_t bsp_res;
-	/**< crypto_block_size_pointer and reserved 24 bits saved in big
-	 * endian format.
-	 */
-	uint32_t iv_offset:16;
-	/**< Starting point for Initialisation Vector. */
-	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
-	uint32_t dek_id; /**< DEK ID */
-} __rte_packed;
-
 static void
 mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 			  struct rte_cryptodev_info *dev_info)
@@ -102,7 +60,7 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 		dev_info->driver_id = mlx5_crypto_driver_id;
 		dev_info->feature_flags =
 			MLX5_CRYPTO_FEATURE_FLAGS(priv->is_wrapped_mode);
-		dev_info->capabilities = mlx5_crypto_caps;
+		dev_info->capabilities = priv->caps;
 		dev_info->max_nb_queue_pairs = MLX5_CRYPTO_MAX_QPS;
 		dev_info->min_mbuf_headroom_req = 0;
 		dev_info->min_mbuf_tailroom_req = 0;
@@ -114,6 +72,38 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 	}
 }
 
+void
+mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp,
+				   uint16_t n)
+{
+	uint32_t i;
+
+	for (i = 0; i < n; i++)
+		if (qp->mkey[i])
+			claim_zero(mlx5_devx_cmd_destroy(qp->mkey[i]));
+}
+
+int
+mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
+				   struct mlx5_crypto_qp *qp,
+				   struct mlx5_devx_mkey_attr *attr,
+				   mlx5_crypto_mkey_update_t update_cb)
+{
+	uint32_t i;
+
+	for (i = 0; i < qp->entries_n; i++) {
+		attr->klm_array = update_cb(priv, qp, i);
+		qp->mkey[i] = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, attr);
+		if (!qp->mkey[i])
+			goto error;
+	}
+	return 0;
+error:
+	DRV_LOG(ERR, "Failed to allocate indirect mkey.");
+	mlx5_crypto_indirect_mkeys_release(qp, i);
+	return -1;
+}
+
 static int
 mlx5_crypto_dev_configure(struct rte_cryptodev *dev,
 			  struct rte_cryptodev_config *config)
@@ -168,72 +158,6 @@ mlx5_crypto_sym_session_get_size(struct rte_cryptodev *dev __rte_unused)
 	return sizeof(struct mlx5_crypto_session);
 }
 
-static int
-mlx5_crypto_sym_session_configure(struct rte_cryptodev *dev,
-				  struct rte_crypto_sym_xform *xform,
-				  struct rte_cryptodev_sym_session *session)
-{
-	struct mlx5_crypto_priv *priv = dev->data->dev_private;
-	struct mlx5_crypto_session *sess_private_data =
-		CRYPTODEV_GET_SYM_SESS_PRIV(session);
-	struct rte_crypto_cipher_xform *cipher;
-	uint8_t encryption_order;
-
-	if (unlikely(xform->next != NULL)) {
-		DRV_LOG(ERR, "Xform next is not supported.");
-		return -ENOTSUP;
-	}
-	if (unlikely((xform->type != RTE_CRYPTO_SYM_XFORM_CIPHER) ||
-		     (xform->cipher.algo != RTE_CRYPTO_CIPHER_AES_XTS))) {
-		DRV_LOG(ERR, "Only AES-XTS algorithm is supported.");
-		return -ENOTSUP;
-	}
-	cipher = &xform->cipher;
-	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
-	if (sess_private_data->dek == NULL) {
-		DRV_LOG(ERR, "Failed to prepare dek.");
-		return -ENOMEM;
-	}
-	if (cipher->op == RTE_CRYPTO_CIPHER_OP_ENCRYPT)
-		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_MEMORY;
-	else
-		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_WIRE;
-	sess_private_data->bs_bpt_eo_es = rte_cpu_to_be_32
-			(MLX5_BSF_SIZE_64B << MLX5_BSF_SIZE_OFFSET |
-			 MLX5_BSF_P_TYPE_CRYPTO << MLX5_BSF_P_TYPE_OFFSET |
-			 encryption_order << MLX5_ENCRYPTION_ORDER_OFFSET |
-			 MLX5_ENCRYPTION_STANDARD_AES_XTS);
-	switch (xform->cipher.dataunit_len) {
-	case 0:
-		sess_private_data->bsp_res = 0;
-		break;
-	case 512:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_512B <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	case 4096:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_4096B <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	case 1048576:
-		sess_private_data->bsp_res = rte_cpu_to_be_32
-					     ((uint32_t)MLX5_BLOCK_SIZE_1MB <<
-					     MLX5_BLOCK_SIZE_OFFSET);
-		break;
-	default:
-		DRV_LOG(ERR, "Cipher data unit length is not supported.");
-		return -ENOTSUP;
-	}
-	sess_private_data->iv_offset = cipher->iv.offset;
-	sess_private_data->dek_id =
-			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
-					 0xffffff);
-	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
-	return 0;
-}
-
 static void
 mlx5_crypto_sym_session_clear(struct rte_cryptodev *dev,
 			      struct rte_cryptodev_sym_session *sess)
@@ -249,412 +173,6 @@ mlx5_crypto_sym_session_clear(struct rte_cryptodev *dev,
 	DRV_LOG(DEBUG, "Session %p was cleared.", spriv);
 }
 
-static void
-mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp, uint16_t n)
-{
-	uint16_t i;
-
-	for (i = 0; i < n; i++)
-		if (qp->mkey[i])
-			claim_zero(mlx5_devx_cmd_destroy(qp->mkey[i]));
-}
-
-static void
-mlx5_crypto_qp_release(struct mlx5_crypto_qp *qp)
-{
-	if (qp == NULL)
-		return;
-	mlx5_devx_qp_destroy(&qp->qp_obj);
-	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
-	mlx5_devx_cq_destroy(&qp->cq_obj);
-	rte_free(qp);
-}
-
-static int
-mlx5_crypto_queue_pair_release(struct rte_cryptodev *dev, uint16_t qp_id)
-{
-	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
-
-	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
-	mlx5_crypto_qp_release(qp);
-	dev->data->queue_pairs[qp_id] = NULL;
-	return 0;
-}
-
-static __rte_noinline uint32_t
-mlx5_crypto_get_block_size(struct rte_crypto_op *op)
-{
-	uint32_t bl = op->sym->cipher.data.length;
-
-	switch (bl) {
-	case (1 << 20):
-		return RTE_BE32(MLX5_BLOCK_SIZE_1MB << MLX5_BLOCK_SIZE_OFFSET);
-	case (1 << 12):
-		return RTE_BE32(MLX5_BLOCK_SIZE_4096B <<
-				MLX5_BLOCK_SIZE_OFFSET);
-	case (1 << 9):
-		return RTE_BE32(MLX5_BLOCK_SIZE_512B << MLX5_BLOCK_SIZE_OFFSET);
-	default:
-		DRV_LOG(ERR, "Unknown block size: %u.", bl);
-		return UINT32_MAX;
-	}
-}
-
-static __rte_always_inline uint32_t
-mlx5_crypto_klm_set(struct mlx5_crypto_qp *qp, struct rte_mbuf *mbuf,
-		    struct mlx5_wqe_dseg *klm, uint32_t offset,
-		    uint32_t *remain)
-{
-	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
-	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
-
-	if (data_len > *remain)
-		data_len = *remain;
-	*remain -= data_len;
-	klm->bcount = rte_cpu_to_be_32(data_len);
-	klm->pbuf = rte_cpu_to_be_64(addr);
-	klm->lkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
-	return klm->lkey;
-
-}
-
-static __rte_always_inline uint32_t
-mlx5_crypto_klms_set(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op,
-		     struct rte_mbuf *mbuf, struct mlx5_wqe_dseg *klm)
-{
-	uint32_t remain_len = op->sym->cipher.data.length;
-	uint32_t nb_segs = mbuf->nb_segs;
-	uint32_t klm_n = 1u;
-
-	/* First mbuf needs to take the cipher offset. */
-	if (unlikely(mlx5_crypto_klm_set(qp, mbuf, klm,
-		     op->sym->cipher.data.offset, &remain_len) == UINT32_MAX)) {
-		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-		return 0;
-	}
-	while (remain_len) {
-		nb_segs--;
-		mbuf = mbuf->next;
-		if (unlikely(mbuf == NULL || nb_segs == 0)) {
-			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
-			return 0;
-		}
-		if (unlikely(mlx5_crypto_klm_set(qp, mbuf, ++klm, 0,
-						 &remain_len) == UINT32_MAX)) {
-			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-			return 0;
-		}
-		klm_n++;
-	}
-	return klm_n;
-}
-
-static __rte_always_inline int
-mlx5_crypto_wqe_set(struct mlx5_crypto_priv *priv,
-			 struct mlx5_crypto_qp *qp,
-			 struct rte_crypto_op *op,
-			 struct mlx5_umr_wqe *umr)
-{
-	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
-	struct mlx5_wqe_cseg *cseg = &umr->ctr;
-	struct mlx5_wqe_mkey_cseg *mkc = &umr->mkc;
-	struct mlx5_wqe_dseg *klms = &umr->kseg[0];
-	struct mlx5_wqe_umr_bsf_seg *bsf = ((struct mlx5_wqe_umr_bsf_seg *)
-				      RTE_PTR_ADD(umr, priv->umr_wqe_size)) - 1;
-	uint32_t ds;
-	bool ipl = op->sym->m_dst == NULL || op->sym->m_dst == op->sym->m_src;
-	/* Set UMR WQE. */
-	uint32_t klm_n = mlx5_crypto_klms_set(qp, op,
-				   ipl ? op->sym->m_src : op->sym->m_dst, klms);
-
-	if (unlikely(klm_n == 0))
-		return 0;
-	bsf->bs_bpt_eo_es = sess->bs_bpt_eo_es;
-	if (unlikely(!sess->bsp_res)) {
-		bsf->bsp_res = mlx5_crypto_get_block_size(op);
-		if (unlikely(bsf->bsp_res == UINT32_MAX)) {
-			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
-			return 0;
-		}
-	} else {
-		bsf->bsp_res = sess->bsp_res;
-	}
-	bsf->raw_data_size = rte_cpu_to_be_32(op->sym->cipher.data.length);
-	memcpy(bsf->xts_initial_tweak,
-	       rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), 16);
-	bsf->res_dp = sess->dek_id;
-	mkc->len = rte_cpu_to_be_64(op->sym->cipher.data.length);
-	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) | MLX5_OPCODE_UMR);
-	qp->db_pi += priv->umr_wqe_stride;
-	/* Set RDMA_WRITE WQE. */
-	cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
-	klms = RTE_PTR_ADD(cseg, sizeof(struct mlx5_rdma_write_wqe));
-	if (!ipl) {
-		klm_n = mlx5_crypto_klms_set(qp, op, op->sym->m_src, klms);
-		if (unlikely(klm_n == 0))
-			return 0;
-	} else {
-		memcpy(klms, &umr->kseg[0], sizeof(*klms) * klm_n);
-	}
-	ds = 2 + klm_n;
-	cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
-	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
-							MLX5_OPCODE_RDMA_WRITE);
-	ds = RTE_ALIGN(ds, 4);
-	qp->db_pi += ds >> 2;
-	/* Set NOP WQE if needed. */
-	if (priv->max_rdmar_ds > ds) {
-		cseg += ds;
-		ds = priv->max_rdmar_ds - ds;
-		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
-		cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
-							       MLX5_OPCODE_NOP);
-		qp->db_pi += ds >> 2; /* Here, DS is 4 aligned for sure. */
-	}
-	qp->wqe = (uint8_t *)cseg;
-	return 1;
-}
-
-static uint16_t
-mlx5_crypto_enqueue_burst(void *queue_pair, struct rte_crypto_op **ops,
-			  uint16_t nb_ops)
-{
-	struct mlx5_crypto_qp *qp = queue_pair;
-	struct mlx5_crypto_priv *priv = qp->priv;
-	struct mlx5_umr_wqe *umr;
-	struct rte_crypto_op *op;
-	uint16_t mask = qp->entries_n - 1;
-	uint16_t remain = qp->entries_n - (qp->pi - qp->ci);
-	uint32_t idx;
-
-	if (remain < nb_ops)
-		nb_ops = remain;
-	else
-		remain = nb_ops;
-	if (unlikely(remain == 0))
-		return 0;
-	do {
-		idx = qp->pi & mask;
-		op = *ops++;
-		umr = RTE_PTR_ADD(qp->qp_obj.umem_buf,
-			priv->wqe_set_size * idx);
-		if (unlikely(mlx5_crypto_wqe_set(priv, qp, op, umr) == 0)) {
-			qp->stats.enqueue_err_count++;
-			if (remain != nb_ops) {
-				qp->stats.enqueued_count -= remain;
-				break;
-			}
-			return 0;
-		}
-		qp->ops[idx] = op;
-		qp->pi++;
-	} while (--remain);
-	qp->stats.enqueued_count += nb_ops;
-	mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
-			   qp->db_pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
-			   !priv->uar.dbnc);
-	return nb_ops;
-}
-
-static __rte_noinline void
-mlx5_crypto_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
-{
-	const uint32_t idx = qp->ci & (qp->entries_n - 1);
-	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
-							&qp->cq_obj.cqes[idx];
-
-	op->status = RTE_CRYPTO_OP_STATUS_ERROR;
-	qp->stats.dequeue_err_count++;
-	DRV_LOG(ERR, "CQE ERR:%x.\n", rte_be_to_cpu_32(cqe->syndrome));
-}
-
-static uint16_t
-mlx5_crypto_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
-			  uint16_t nb_ops)
-{
-	struct mlx5_crypto_qp *qp = queue_pair;
-	volatile struct mlx5_cqe *restrict cqe;
-	struct rte_crypto_op *restrict op;
-	const unsigned int cq_size = qp->entries_n;
-	const unsigned int mask = cq_size - 1;
-	uint32_t idx;
-	uint32_t next_idx = qp->ci & mask;
-	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - qp->ci), nb_ops);
-	uint16_t i = 0;
-	int ret;
-
-	if (unlikely(max == 0))
-		return 0;
-	do {
-		idx = next_idx;
-		next_idx = (qp->ci + 1) & mask;
-		op = qp->ops[idx];
-		cqe = &qp->cq_obj.cqes[idx];
-		ret = check_cqe(cqe, cq_size, qp->ci);
-		rte_io_rmb();
-		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
-			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
-				mlx5_crypto_cqe_err_handle(qp, op);
-			break;
-		}
-		op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
-		ops[i++] = op;
-		qp->ci++;
-	} while (i < max);
-	if (likely(i != 0)) {
-		rte_io_wmb();
-		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->ci);
-		qp->stats.dequeued_count += i;
-	}
-	return i;
-}
-
-static void
-mlx5_crypto_qp_init(struct mlx5_crypto_priv *priv, struct mlx5_crypto_qp *qp)
-{
-	uint32_t i;
-
-	for (i = 0 ; i < qp->entries_n; i++) {
-		struct mlx5_wqe_cseg *cseg = RTE_PTR_ADD(qp->qp_obj.umem_buf,
-			i * priv->wqe_set_size);
-		struct mlx5_wqe_umr_cseg *ucseg = (struct mlx5_wqe_umr_cseg *)
-								     (cseg + 1);
-		struct mlx5_wqe_umr_bsf_seg *bsf =
-			(struct mlx5_wqe_umr_bsf_seg *)(RTE_PTR_ADD(cseg,
-						       priv->umr_wqe_size)) - 1;
-		struct mlx5_wqe_rseg *rseg;
-
-		/* Init UMR WQE. */
-		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) |
-					 (priv->umr_wqe_size / MLX5_WSEG_SIZE));
-		cseg->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
-				       MLX5_COMP_MODE_OFFSET);
-		cseg->misc = rte_cpu_to_be_32(qp->mkey[i]->id);
-		ucseg->if_cf_toe_cq_res = RTE_BE32(1u << MLX5_UMRC_IF_OFFSET);
-		ucseg->mkey_mask = RTE_BE64(1u << 0); /* Mkey length bit. */
-		ucseg->ko_to_bs = rte_cpu_to_be_32
-			((MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size) <<
-			 MLX5_UMRC_KO_OFFSET) | (4 << MLX5_UMRC_TO_BS_OFFSET));
-		bsf->keytag = priv->keytag;
-		/* Init RDMA WRITE WQE. */
-		cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
-		cseg->flags = RTE_BE32((MLX5_COMP_ALWAYS <<
-				      MLX5_COMP_MODE_OFFSET) |
-				      MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
-		rseg = (struct mlx5_wqe_rseg *)(cseg + 1);
-		rseg->rkey = rte_cpu_to_be_32(qp->mkey[i]->id);
-	}
-}
-
-static int
-mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
-				  struct mlx5_crypto_qp *qp)
-{
-	struct mlx5_umr_wqe *umr;
-	uint32_t i;
-	struct mlx5_devx_mkey_attr attr = {
-		.pd = priv->cdev->pdn,
-		.umr_en = 1,
-		.crypto_en = 1,
-		.set_remote_rw = 1,
-		.klm_num = MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size),
-	};
-
-	for (umr = (struct mlx5_umr_wqe *)qp->qp_obj.umem_buf, i = 0;
-	   i < qp->entries_n; i++, umr = RTE_PTR_ADD(umr, priv->wqe_set_size)) {
-		attr.klm_array = (struct mlx5_klm *)&umr->kseg[0];
-		qp->mkey[i] = mlx5_devx_cmd_mkey_create(priv->cdev->ctx, &attr);
-		if (!qp->mkey[i])
-			goto error;
-	}
-	return 0;
-error:
-	DRV_LOG(ERR, "Failed to allocate indirect mkey.");
-	mlx5_crypto_indirect_mkeys_release(qp, i);
-	return -1;
-}
-
-static int
-mlx5_crypto_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
-			     const struct rte_cryptodev_qp_conf *qp_conf,
-			     int socket_id)
-{
-	struct mlx5_crypto_priv *priv = dev->data->dev_private;
-	struct mlx5_devx_qp_attr attr = {0};
-	struct mlx5_crypto_qp *qp;
-	uint16_t log_nb_desc = rte_log2_u32(qp_conf->nb_descriptors);
-	uint32_t ret;
-	uint32_t alloc_size = sizeof(*qp);
-	uint32_t log_wqbb_n;
-	struct mlx5_devx_cq_attr cq_attr = {
-		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
-	};
-
-	if (dev->data->queue_pairs[qp_id] != NULL)
-		mlx5_crypto_queue_pair_release(dev, qp_id);
-	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
-	alloc_size += (sizeof(struct rte_crypto_op *) +
-		       sizeof(struct mlx5_devx_obj *)) *
-		       RTE_BIT32(log_nb_desc);
-	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
-				socket_id);
-	if (qp == NULL) {
-		DRV_LOG(ERR, "Failed to allocate QP memory.");
-		rte_errno = ENOMEM;
-		return -rte_errno;
-	}
-	if (mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj, log_nb_desc,
-				&cq_attr, socket_id) != 0) {
-		DRV_LOG(ERR, "Failed to create CQ.");
-		goto error;
-	}
-	log_wqbb_n = rte_log2_u32(RTE_BIT32(log_nb_desc) *
-				(priv->wqe_set_size / MLX5_SEND_WQE_BB));
-	attr.pd = priv->cdev->pdn;
-	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
-	attr.cqn = qp->cq_obj.cq->id;
-	attr.num_of_receive_wqes = 0;
-	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
-	attr.ts_format =
-		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
-	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
-					attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
-					&attr, socket_id);
-	if (ret) {
-		DRV_LOG(ERR, "Failed to create QP.");
-		goto error;
-	}
-	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
-			      priv->dev_config.socket_id) != 0) {
-		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
-			(uint32_t)qp_id);
-		rte_errno = ENOMEM;
-		goto error;
-	}
-	/*
-	 * In Order to configure self loopback, when calling devx qp2rts the
-	 * remote QP id that is used is the id of the same QP.
-	 */
-	if (mlx5_devx_qp2rts(&qp->qp_obj, qp->qp_obj.qp->id))
-		goto error;
-	qp->mkey = (struct mlx5_devx_obj **)RTE_ALIGN((uintptr_t)(qp + 1),
-							   RTE_CACHE_LINE_SIZE);
-	qp->ops = (struct rte_crypto_op **)(qp->mkey + RTE_BIT32(log_nb_desc));
-	qp->entries_n = 1 << log_nb_desc;
-	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp)) {
-		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
-		rte_errno = ENOMEM;
-		goto error;
-	}
-	mlx5_crypto_qp_init(priv, qp);
-	qp->priv = priv;
-	dev->data->queue_pairs[qp_id] = qp;
-	return 0;
-error:
-	mlx5_crypto_qp_release(qp);
-	return -1;
-}
-
 static void
 mlx5_crypto_stats_get(struct rte_cryptodev *dev,
 		      struct rte_cryptodev_stats *stats)
@@ -691,10 +209,7 @@ static struct rte_cryptodev_ops mlx5_crypto_ops = {
 	.dev_infos_get			= mlx5_crypto_dev_infos_get,
 	.stats_get			= mlx5_crypto_stats_get,
 	.stats_reset			= mlx5_crypto_stats_reset,
-	.queue_pair_setup		= mlx5_crypto_queue_pair_setup,
-	.queue_pair_release		= mlx5_crypto_queue_pair_release,
 	.sym_session_get_size		= mlx5_crypto_sym_session_get_size,
-	.sym_session_configure		= mlx5_crypto_sym_session_configure,
 	.sym_session_clear		= mlx5_crypto_sym_session_clear,
 	.sym_get_raw_dp_ctx_size	= NULL,
 	.sym_configure_raw_dp_ctx	= NULL,
@@ -796,81 +311,6 @@ mlx5_crypto_parse_devargs(struct mlx5_kvargs_ctrl *mkvlist,
 	return 0;
 }
 
-/*
- * Calculate UMR WQE size and RDMA Write WQE size with the
- * following limitations:
- *	- Each WQE size is multiple of 64.
- *	- The summarize of both UMR WQE and RDMA_W WQE is a power of 2.
- *	- The number of entries in the UMR WQE's KLM list is multiple of 4.
- */
-static void
-mlx5_crypto_get_wqe_sizes(uint32_t segs_num, uint32_t *umr_size,
-			uint32_t *rdmaw_size)
-{
-	uint32_t diff, wqe_set_size;
-
-	*umr_size = MLX5_CRYPTO_UMR_WQE_STATIC_SIZE +
-			RTE_ALIGN(segs_num, 4) *
-			sizeof(struct mlx5_wqe_dseg);
-	/* Make sure UMR WQE size is multiple of WQBB. */
-	*umr_size = RTE_ALIGN(*umr_size, MLX5_SEND_WQE_BB);
-	*rdmaw_size = sizeof(struct mlx5_rdma_write_wqe) +
-			sizeof(struct mlx5_wqe_dseg) *
-			(segs_num <= 2 ? 2 : 2 +
-			RTE_ALIGN(segs_num - 2, 4));
-	/* Make sure RDMA_WRITE WQE size is multiple of WQBB. */
-	*rdmaw_size = RTE_ALIGN(*rdmaw_size, MLX5_SEND_WQE_BB);
-	wqe_set_size = *rdmaw_size + *umr_size;
-	diff = rte_align32pow2(wqe_set_size) - wqe_set_size;
-	/* Make sure wqe_set size is power of 2. */
-	if (diff)
-		*umr_size += diff;
-}
-
-static uint8_t
-mlx5_crypto_max_segs_num(uint16_t max_wqe_size)
-{
-	int klms_sizes = max_wqe_size - MLX5_CRYPTO_UMR_WQE_STATIC_SIZE;
-	uint32_t max_segs_cap = RTE_ALIGN_FLOOR(klms_sizes, MLX5_SEND_WQE_BB) /
-			sizeof(struct mlx5_wqe_dseg);
-
-	MLX5_ASSERT(klms_sizes >= MLX5_SEND_WQE_BB);
-	while (max_segs_cap) {
-		uint32_t umr_wqe_size, rdmw_wqe_size;
-
-		mlx5_crypto_get_wqe_sizes(max_segs_cap, &umr_wqe_size,
-						&rdmw_wqe_size);
-		if (umr_wqe_size <= max_wqe_size &&
-				rdmw_wqe_size <= max_wqe_size)
-			break;
-		max_segs_cap -= 4;
-	}
-	return max_segs_cap;
-}
-
-static int
-mlx5_crypto_configure_wqe_size(struct mlx5_crypto_priv *priv,
-				uint16_t max_wqe_size, uint32_t max_segs_num)
-{
-	uint32_t rdmw_wqe_size, umr_wqe_size;
-
-	mlx5_crypto_get_wqe_sizes(max_segs_num, &umr_wqe_size,
-					&rdmw_wqe_size);
-	priv->wqe_set_size = rdmw_wqe_size + umr_wqe_size;
-	if (umr_wqe_size > max_wqe_size ||
-				rdmw_wqe_size > max_wqe_size) {
-		DRV_LOG(ERR, "Invalid max_segs_num: %u. should be %u or lower.",
-			max_segs_num,
-			mlx5_crypto_max_segs_num(max_wqe_size));
-		rte_errno = EINVAL;
-		return -EINVAL;
-	}
-	priv->umr_wqe_size = (uint16_t)umr_wqe_size;
-	priv->umr_wqe_stride = priv->umr_wqe_size / MLX5_SEND_WQE_BB;
-	priv->max_rdmar_ds = rdmw_wqe_size / sizeof(struct mlx5_wqe_dseg);
-	return 0;
-}
-
 static int
 mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		      struct mlx5_kvargs_ctrl *mkvlist)
@@ -916,14 +356,18 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 	DRV_LOG(INFO,
 		"Crypto device %s was created successfully.", ibdev_name);
 	crypto_dev->dev_ops = &mlx5_crypto_ops;
-	crypto_dev->dequeue_burst = mlx5_crypto_dequeue_burst;
-	crypto_dev->enqueue_burst = mlx5_crypto_enqueue_burst;
 	crypto_dev->feature_flags = MLX5_CRYPTO_FEATURE_FLAGS(wrapped_mode);
 	crypto_dev->driver_id = mlx5_crypto_driver_id;
 	priv = crypto_dev->data->dev_private;
 	priv->cdev = cdev;
 	priv->crypto_dev = crypto_dev;
 	priv->is_wrapped_mode = wrapped_mode;
+	priv->max_segs_num = devarg_prms.max_segs_num;
+	ret = mlx5_crypto_xts_init(priv);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
+		return -ENOTSUP;
+	}
 	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
 		return -1;
@@ -939,14 +383,6 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		}
 		priv->login_obj = login;
 	}
-	ret = mlx5_crypto_configure_wqe_size(priv,
-		cdev->config.hca_attr.max_wqe_sz_sq, devarg_prms.max_segs_num);
-	if (ret) {
-		claim_zero(mlx5_devx_cmd_destroy(priv->login_obj));
-		mlx5_devx_uar_release(&priv->uar);
-		rte_cryptodev_pmd_destroy(priv->crypto_dev);
-		return -1;
-	}
 	priv->keytag = rte_cpu_to_be_64(devarg_prms.keytag);
 	DRV_LOG(INFO, "Max number of segments: %u.",
 		(unsigned int)RTE_MIN(
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index a2771b3dab..05d8fe97fe 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -31,6 +31,7 @@ struct mlx5_crypto_priv {
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
 	struct mlx5_hlist *dek_hlist; /* Dek hash list. */
+	const struct rte_cryptodev_capabilities *caps;
 	struct rte_cryptodev_config dev_config;
 	struct mlx5_devx_obj *login_obj;
 	uint64_t keytag;
@@ -70,6 +71,35 @@ struct mlx5_crypto_devarg_params {
 	uint32_t max_segs_num;
 };
 
+struct mlx5_crypto_session {
+	uint32_t bs_bpt_eo_es;
+	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
+	 * saved in big endian format.
+	 */
+	uint32_t bsp_res;
+	/**< crypto_block_size_pointer and reserved 24 bits saved in big
+	 * endian format.
+	 */
+	uint32_t iv_offset:16;
+	/**< Starting point for Initialisation Vector. */
+	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
+	uint32_t dek_id; /**< DEK ID */
+} __rte_packed;
+
+typedef void *(*mlx5_crypto_mkey_update_t)(struct mlx5_crypto_priv *priv,
+					   struct mlx5_crypto_qp *qp,
+					   uint32_t idx);
+
+void
+mlx5_crypto_indirect_mkeys_release(struct mlx5_crypto_qp *qp,
+				   uint16_t n);
+
+int
+mlx5_crypto_indirect_mkeys_prepare(struct mlx5_crypto_priv *priv,
+				   struct mlx5_crypto_qp *qp,
+				   struct mlx5_devx_mkey_attr *attr,
+				   mlx5_crypto_mkey_update_t update_cb);
+
 int
 mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 			struct mlx5_crypto_dek *dek);
@@ -84,4 +114,7 @@ mlx5_crypto_dek_setup(struct mlx5_crypto_priv *priv);
 void
 mlx5_crypto_dek_unset(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_xts.c b/drivers/crypto/mlx5/mlx5_crypto_xts.c
new file mode 100644
index 0000000000..964d02e6ed
--- /dev/null
+++ b/drivers/crypto/mlx5/mlx5_crypto_xts.c
@@ -0,0 +1,594 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_malloc.h>
+#include <rte_mempool.h>
+#include <rte_eal_paging.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+#include <bus_pci_driver.h>
+#include <rte_memory.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_common_os.h>
+
+#include "mlx5_crypto_utils.h"
+#include "mlx5_crypto.h"
+
+const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
+	{		/* AES XTS */
+		.op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+		{.sym = {
+			.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+			{.cipher = {
+				.algo = RTE_CRYPTO_CIPHER_AES_XTS,
+				.block_size = 16,
+				.key_size = {
+					.min = 32,
+					.max = 64,
+					.increment = 32
+				},
+				.iv_size = {
+					.min = 16,
+					.max = 16,
+					.increment = 0
+				},
+				.dataunit_set =
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES |
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_4096_BYTES |
+				RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_1_MEGABYTES,
+			}, }
+		}, }
+	},
+};
+
+static int
+mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
+				      struct rte_crypto_sym_xform *xform,
+				      struct rte_cryptodev_sym_session *session)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_session *sess_private_data =
+		CRYPTODEV_GET_SYM_SESS_PRIV(session);
+	struct rte_crypto_cipher_xform *cipher;
+	uint8_t encryption_order;
+
+	if (unlikely(xform->next != NULL)) {
+		DRV_LOG(ERR, "Xform next is not supported.");
+		return -ENOTSUP;
+	}
+	if (unlikely((xform->type != RTE_CRYPTO_SYM_XFORM_CIPHER) ||
+		     (xform->cipher.algo != RTE_CRYPTO_CIPHER_AES_XTS))) {
+		DRV_LOG(ERR, "Only AES-XTS algorithm is supported.");
+		return -ENOTSUP;
+	}
+	cipher = &xform->cipher;
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
+	if (sess_private_data->dek == NULL) {
+		DRV_LOG(ERR, "Failed to prepare dek.");
+		return -ENOMEM;
+	}
+	if (cipher->op == RTE_CRYPTO_CIPHER_OP_ENCRYPT)
+		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_MEMORY;
+	else
+		encryption_order = MLX5_ENCRYPTION_ORDER_ENCRYPTED_RAW_WIRE;
+	sess_private_data->bs_bpt_eo_es = rte_cpu_to_be_32
+			(MLX5_BSF_SIZE_64B << MLX5_BSF_SIZE_OFFSET |
+			 MLX5_BSF_P_TYPE_CRYPTO << MLX5_BSF_P_TYPE_OFFSET |
+			 encryption_order << MLX5_ENCRYPTION_ORDER_OFFSET |
+			 MLX5_ENCRYPTION_STANDARD_AES_XTS);
+	switch (xform->cipher.dataunit_len) {
+	case 0:
+		sess_private_data->bsp_res = 0;
+		break;
+	case 512:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_512B <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	case 4096:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_4096B <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	case 1048576:
+		sess_private_data->bsp_res = rte_cpu_to_be_32
+					     ((uint32_t)MLX5_BLOCK_SIZE_1MB <<
+					     MLX5_BLOCK_SIZE_OFFSET);
+		break;
+	default:
+		DRV_LOG(ERR, "Cipher data unit length is not supported.");
+		return -ENOTSUP;
+	}
+	sess_private_data->iv_offset = cipher->iv.offset;
+	sess_private_data->dek_id =
+			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
+					 0xffffff);
+	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
+	return 0;
+}
+
+static void
+mlx5_crypto_xts_qp_release(struct mlx5_crypto_qp *qp)
+{
+	if (qp == NULL)
+		return;
+	mlx5_devx_qp_destroy(&qp->qp_obj);
+	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
+	mlx5_devx_cq_destroy(&qp->cq_obj);
+	rte_free(qp);
+}
+
+static int
+mlx5_crypto_xts_queue_pair_release(struct rte_cryptodev *dev, uint16_t qp_id)
+{
+	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
+
+	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
+	mlx5_crypto_xts_qp_release(qp);
+	dev->data->queue_pairs[qp_id] = NULL;
+	return 0;
+}
+
+static __rte_noinline uint32_t
+mlx5_crypto_xts_get_block_size(struct rte_crypto_op *op)
+{
+	uint32_t bl = op->sym->cipher.data.length;
+
+	switch (bl) {
+	case (1 << 20):
+		return RTE_BE32(MLX5_BLOCK_SIZE_1MB << MLX5_BLOCK_SIZE_OFFSET);
+	case (1 << 12):
+		return RTE_BE32(MLX5_BLOCK_SIZE_4096B <<
+				MLX5_BLOCK_SIZE_OFFSET);
+	case (1 << 9):
+		return RTE_BE32(MLX5_BLOCK_SIZE_512B << MLX5_BLOCK_SIZE_OFFSET);
+	default:
+		DRV_LOG(ERR, "Unknown block size: %u.", bl);
+		return UINT32_MAX;
+	}
+}
+
+static __rte_always_inline uint32_t
+mlx5_crypto_xts_klm_set(struct mlx5_crypto_qp *qp, struct rte_mbuf *mbuf,
+			struct mlx5_wqe_dseg *klm, uint32_t offset,
+			uint32_t *remain)
+{
+	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
+	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
+
+	if (data_len > *remain)
+		data_len = *remain;
+	*remain -= data_len;
+	klm->bcount = rte_cpu_to_be_32(data_len);
+	klm->pbuf = rte_cpu_to_be_64(addr);
+	klm->lkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
+	return klm->lkey;
+
+}
+
+static __rte_always_inline uint32_t
+mlx5_crypto_xts_klms_set(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op,
+			 struct rte_mbuf *mbuf, struct mlx5_wqe_dseg *klm)
+{
+	uint32_t remain_len = op->sym->cipher.data.length;
+	uint32_t nb_segs = mbuf->nb_segs;
+	uint32_t klm_n = 1u;
+
+	/* First mbuf needs to take the cipher offset. */
+	if (unlikely(mlx5_crypto_xts_klm_set(qp, mbuf, klm,
+		     op->sym->cipher.data.offset, &remain_len) == UINT32_MAX)) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		return 0;
+	}
+	while (remain_len) {
+		nb_segs--;
+		mbuf = mbuf->next;
+		if (unlikely(mbuf == NULL || nb_segs == 0)) {
+			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
+			return 0;
+		}
+		if (unlikely(mlx5_crypto_xts_klm_set(qp, mbuf, ++klm, 0,
+						&remain_len) == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+			return 0;
+		}
+		klm_n++;
+	}
+	return klm_n;
+}
+
+static __rte_always_inline int
+mlx5_crypto_xts_wqe_set(struct mlx5_crypto_priv *priv,
+			 struct mlx5_crypto_qp *qp,
+			 struct rte_crypto_op *op,
+			 struct mlx5_umr_wqe *umr)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_wqe_cseg *cseg = &umr->ctr;
+	struct mlx5_wqe_mkey_cseg *mkc = &umr->mkc;
+	struct mlx5_wqe_dseg *klms = &umr->kseg[0];
+	struct mlx5_wqe_umr_bsf_seg *bsf = ((struct mlx5_wqe_umr_bsf_seg *)
+				      RTE_PTR_ADD(umr, priv->umr_wqe_size)) - 1;
+	uint32_t ds;
+	bool ipl = op->sym->m_dst == NULL || op->sym->m_dst == op->sym->m_src;
+	/* Set UMR WQE. */
+	uint32_t klm_n = mlx5_crypto_xts_klms_set(qp, op,
+				   ipl ? op->sym->m_src : op->sym->m_dst, klms);
+
+	if (unlikely(klm_n == 0))
+		return 0;
+	bsf->bs_bpt_eo_es = sess->bs_bpt_eo_es;
+	if (unlikely(!sess->bsp_res)) {
+		bsf->bsp_res = mlx5_crypto_xts_get_block_size(op);
+		if (unlikely(bsf->bsp_res == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_INVALID_ARGS;
+			return 0;
+		}
+	} else {
+		bsf->bsp_res = sess->bsp_res;
+	}
+	bsf->raw_data_size = rte_cpu_to_be_32(op->sym->cipher.data.length);
+	memcpy(bsf->xts_initial_tweak,
+	       rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), 16);
+	bsf->res_dp = sess->dek_id;
+	mkc->len = rte_cpu_to_be_64(op->sym->cipher.data.length);
+	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) | MLX5_OPCODE_UMR);
+	qp->db_pi += priv->umr_wqe_stride;
+	/* Set RDMA_WRITE WQE. */
+	cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
+	klms = RTE_PTR_ADD(cseg, sizeof(struct mlx5_rdma_write_wqe));
+	if (!ipl) {
+		klm_n = mlx5_crypto_xts_klms_set(qp, op, op->sym->m_src, klms);
+		if (unlikely(klm_n == 0))
+			return 0;
+	} else {
+		memcpy(klms, &umr->kseg[0], sizeof(*klms) * klm_n);
+	}
+	ds = 2 + klm_n;
+	cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
+	cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
+							MLX5_OPCODE_RDMA_WRITE);
+	ds = RTE_ALIGN(ds, 4);
+	qp->db_pi += ds >> 2;
+	/* Set NOP WQE if needed. */
+	if (priv->max_rdmar_ds > ds) {
+		cseg += ds;
+		ds = priv->max_rdmar_ds - ds;
+		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | ds);
+		cseg->opcode = rte_cpu_to_be_32((qp->db_pi << 8) |
+							       MLX5_OPCODE_NOP);
+		qp->db_pi += ds >> 2; /* Here, DS is 4 aligned for sure. */
+	}
+	qp->wqe = (uint8_t *)cseg;
+	return 1;
+}
+
+static uint16_t
+mlx5_crypto_xts_enqueue_burst(void *queue_pair, struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_umr_wqe *umr;
+	struct rte_crypto_op *op;
+	uint16_t mask = qp->entries_n - 1;
+	uint16_t remain = qp->entries_n - (qp->pi - qp->ci);
+	uint32_t idx;
+
+	if (remain < nb_ops)
+		nb_ops = remain;
+	else
+		remain = nb_ops;
+	if (unlikely(remain == 0))
+		return 0;
+	do {
+		idx = qp->pi & mask;
+		op = *ops++;
+		umr = RTE_PTR_ADD(qp->qp_obj.umem_buf,
+			priv->wqe_set_size * idx);
+		if (unlikely(mlx5_crypto_xts_wqe_set(priv, qp, op, umr) == 0)) {
+			qp->stats.enqueue_err_count++;
+			if (remain != nb_ops) {
+				qp->stats.enqueued_count -= remain;
+				break;
+			}
+			return 0;
+		}
+		qp->ops[idx] = op;
+		qp->pi++;
+	} while (--remain);
+	qp->stats.enqueued_count += nb_ops;
+	mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
+			   qp->db_pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+			   !priv->uar.dbnc);
+	return nb_ops;
+}
+
+static __rte_noinline void
+mlx5_crypto_xts_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
+{
+	const uint32_t idx = qp->ci & (qp->entries_n - 1);
+	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
+							&qp->cq_obj.cqes[idx];
+
+	op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+	qp->stats.dequeue_err_count++;
+	DRV_LOG(ERR, "CQE ERR:%x.\n", rte_be_to_cpu_32(cqe->syndrome));
+}
+
+static uint16_t
+mlx5_crypto_xts_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
+			  uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	volatile struct mlx5_cqe *restrict cqe;
+	struct rte_crypto_op *restrict op;
+	const unsigned int cq_size = qp->entries_n;
+	const unsigned int mask = cq_size - 1;
+	uint32_t idx;
+	uint32_t next_idx = qp->ci & mask;
+	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - qp->ci), nb_ops);
+	uint16_t i = 0;
+	int ret;
+
+	if (unlikely(max == 0))
+		return 0;
+	do {
+		idx = next_idx;
+		next_idx = (qp->ci + 1) & mask;
+		op = qp->ops[idx];
+		cqe = &qp->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, qp->ci);
+		rte_io_rmb();
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
+				mlx5_crypto_xts_cqe_err_handle(qp, op);
+			break;
+		}
+		op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		ops[i++] = op;
+		qp->ci++;
+	} while (i < max);
+	if (likely(i != 0)) {
+		rte_io_wmb();
+		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->ci);
+		qp->stats.dequeued_count += i;
+	}
+	return i;
+}
+
+static void
+mlx5_crypto_xts_qp_init(struct mlx5_crypto_priv *priv, struct mlx5_crypto_qp *qp)
+{
+	uint32_t i;
+
+	for (i = 0 ; i < qp->entries_n; i++) {
+		struct mlx5_wqe_cseg *cseg = RTE_PTR_ADD(qp->qp_obj.umem_buf,
+			i * priv->wqe_set_size);
+		struct mlx5_wqe_umr_cseg *ucseg = (struct mlx5_wqe_umr_cseg *)
+								     (cseg + 1);
+		struct mlx5_wqe_umr_bsf_seg *bsf =
+			(struct mlx5_wqe_umr_bsf_seg *)(RTE_PTR_ADD(cseg,
+						       priv->umr_wqe_size)) - 1;
+		struct mlx5_wqe_rseg *rseg;
+
+		/* Init UMR WQE. */
+		cseg->sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) |
+					 (priv->umr_wqe_size / MLX5_WSEG_SIZE));
+		cseg->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR <<
+				       MLX5_COMP_MODE_OFFSET);
+		cseg->misc = rte_cpu_to_be_32(qp->mkey[i]->id);
+		ucseg->if_cf_toe_cq_res = RTE_BE32(1u << MLX5_UMRC_IF_OFFSET);
+		ucseg->mkey_mask = RTE_BE64(1u << 0); /* Mkey length bit. */
+		ucseg->ko_to_bs = rte_cpu_to_be_32
+			((MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size) <<
+			 MLX5_UMRC_KO_OFFSET) | (4 << MLX5_UMRC_TO_BS_OFFSET));
+		bsf->keytag = priv->keytag;
+		/* Init RDMA WRITE WQE. */
+		cseg = RTE_PTR_ADD(cseg, priv->umr_wqe_size);
+		cseg->flags = RTE_BE32((MLX5_COMP_ALWAYS <<
+				      MLX5_COMP_MODE_OFFSET) |
+				      MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
+		rseg = (struct mlx5_wqe_rseg *)(cseg + 1);
+		rseg->rkey = rte_cpu_to_be_32(qp->mkey[i]->id);
+	}
+}
+
+static void *
+mlx5_crypto_gcm_mkey_klm_update(struct mlx5_crypto_priv *priv,
+				struct mlx5_crypto_qp *qp,
+				uint32_t idx)
+{
+	return RTE_PTR_ADD(qp->qp_obj.umem_buf, priv->wqe_set_size * idx);
+}
+
+static int
+mlx5_crypto_xts_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
+				 const struct rte_cryptodev_qp_conf *qp_conf,
+				 int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_qp_attr attr = {0};
+	struct mlx5_crypto_qp *qp;
+	uint16_t log_nb_desc = rte_log2_u32(qp_conf->nb_descriptors);
+	uint32_t ret;
+	uint32_t alloc_size = sizeof(*qp);
+	uint32_t log_wqbb_n;
+	struct mlx5_devx_cq_attr cq_attr = {
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+	};
+	struct mlx5_devx_mkey_attr mkey_attr = {
+		.pd = priv->cdev->pdn,
+		.umr_en = 1,
+		.crypto_en = 1,
+		.set_remote_rw = 1,
+		.klm_num = MLX5_CRYPTO_KLM_SEGS_NUM(priv->umr_wqe_size),
+	};
+
+	if (dev->data->queue_pairs[qp_id] != NULL)
+		mlx5_crypto_xts_queue_pair_release(dev, qp_id);
+	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
+	alloc_size += (sizeof(struct rte_crypto_op *) +
+		       sizeof(struct mlx5_devx_obj *)) *
+		       RTE_BIT32(log_nb_desc);
+	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
+				socket_id);
+	if (qp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate QP memory.");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	if (mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj, log_nb_desc,
+				&cq_attr, socket_id) != 0) {
+		DRV_LOG(ERR, "Failed to create CQ.");
+		goto error;
+	}
+	log_wqbb_n = rte_log2_u32(RTE_BIT32(log_nb_desc) *
+				(priv->wqe_set_size / MLX5_SEND_WQE_BB));
+	attr.pd = priv->cdev->pdn;
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
+	attr.cqn = qp->cq_obj.cq->id;
+	attr.num_of_receive_wqes = 0;
+	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
+	attr.ts_format =
+		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
+					attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
+					&attr, socket_id);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create QP.");
+		goto error;
+	}
+	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
+			      priv->dev_config.socket_id) != 0) {
+		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
+			(uint32_t)qp_id);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/*
+	 * In order to configure self loopback, the remote QP id used when
+	 * calling devx qp2rts is the id of this same QP.
+	 */
+	if (mlx5_devx_qp2rts(&qp->qp_obj, qp->qp_obj.qp->id))
+		goto error;
+	qp->mkey = (struct mlx5_devx_obj **)RTE_ALIGN((uintptr_t)(qp + 1),
+							   RTE_CACHE_LINE_SIZE);
+	qp->ops = (struct rte_crypto_op **)(qp->mkey + RTE_BIT32(log_nb_desc));
+	qp->entries_n = 1 << log_nb_desc;
+	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp, &mkey_attr,
+					       mlx5_crypto_gcm_mkey_klm_update)) {
+		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	mlx5_crypto_xts_qp_init(priv, qp);
+	qp->priv = priv;
+	dev->data->queue_pairs[qp_id] = qp;
+	return 0;
+error:
+	mlx5_crypto_xts_qp_release(qp);
+	return -1;
+}
+
+/*
+ * Calculate UMR WQE size and RDMA Write WQE size with the
+ * following limitations:
+ *	- Each WQE size is a multiple of 64 bytes.
+ *	- The total size of the UMR WQE and the RDMA_W WQE is a power of 2.
+ *	- The number of entries in the UMR WQE's KLM list is a multiple of 4.
+ */
+static void
+mlx5_crypto_xts_get_wqe_sizes(uint32_t segs_num, uint32_t *umr_size,
+			      uint32_t *rdmaw_size)
+{
+	uint32_t diff, wqe_set_size;
+
+	*umr_size = MLX5_CRYPTO_UMR_WQE_STATIC_SIZE +
+			RTE_ALIGN(segs_num, 4) *
+			sizeof(struct mlx5_wqe_dseg);
+	/* Make sure UMR WQE size is multiple of WQBB. */
+	*umr_size = RTE_ALIGN(*umr_size, MLX5_SEND_WQE_BB);
+	*rdmaw_size = sizeof(struct mlx5_rdma_write_wqe) +
+			sizeof(struct mlx5_wqe_dseg) *
+			(segs_num <= 2 ? 2 : 2 +
+			RTE_ALIGN(segs_num - 2, 4));
+	/* Make sure RDMA_WRITE WQE size is multiple of WQBB. */
+	*rdmaw_size = RTE_ALIGN(*rdmaw_size, MLX5_SEND_WQE_BB);
+	wqe_set_size = *rdmaw_size + *umr_size;
+	diff = rte_align32pow2(wqe_set_size) - wqe_set_size;
+	/* Make sure wqe_set size is power of 2. */
+	if (diff)
+		*umr_size += diff;
+}
+
+static uint8_t
+mlx5_crypto_xts_max_segs_num(uint16_t max_wqe_size)
+{
+	int klms_sizes = max_wqe_size - MLX5_CRYPTO_UMR_WQE_STATIC_SIZE;
+	uint32_t max_segs_cap = RTE_ALIGN_FLOOR(klms_sizes, MLX5_SEND_WQE_BB) /
+			sizeof(struct mlx5_wqe_dseg);
+
+	MLX5_ASSERT(klms_sizes >= MLX5_SEND_WQE_BB);
+	while (max_segs_cap) {
+		uint32_t umr_wqe_size, rdmw_wqe_size;
+
+		mlx5_crypto_xts_get_wqe_sizes(max_segs_cap, &umr_wqe_size,
+						&rdmw_wqe_size);
+		if (umr_wqe_size <= max_wqe_size &&
+				rdmw_wqe_size <= max_wqe_size)
+			break;
+		max_segs_cap -= 4;
+	}
+	return max_segs_cap;
+}
+
+static int
+mlx5_crypto_xts_configure_wqe_size(struct mlx5_crypto_priv *priv,
+				   uint16_t max_wqe_size, uint32_t max_segs_num)
+{
+	uint32_t rdmw_wqe_size, umr_wqe_size;
+
+	mlx5_crypto_xts_get_wqe_sizes(max_segs_num, &umr_wqe_size,
+			&rdmw_wqe_size);
+	priv->wqe_set_size = rdmw_wqe_size + umr_wqe_size;
+	if (umr_wqe_size > max_wqe_size ||
+				rdmw_wqe_size > max_wqe_size) {
+		DRV_LOG(ERR, "Invalid max_segs_num: %u. should be %u or lower.",
+			max_segs_num,
+			mlx5_crypto_xts_max_segs_num(max_wqe_size));
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+	priv->umr_wqe_size = (uint16_t)umr_wqe_size;
+	priv->umr_wqe_stride = priv->umr_wqe_size / MLX5_SEND_WQE_BB;
+	priv->max_rdmar_ds = rdmw_wqe_size / sizeof(struct mlx5_wqe_dseg);
+	return 0;
+}
+
+int
+mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv)
+{
+	struct mlx5_common_device *cdev = priv->cdev;
+	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
+	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+	int ret;
+
+	ret = mlx5_crypto_xts_configure_wqe_size(priv,
+		cdev->config.hca_attr.max_wqe_sz_sq, priv->max_segs_num);
+	if (ret)
+		return -EINVAL;
+	/* Override AES-XTS specified ops. */
+	dev_ops->sym_session_configure = mlx5_crypto_xts_sym_session_configure;
+	dev_ops->queue_pair_setup = mlx5_crypto_xts_queue_pair_setup;
+	dev_ops->queue_pair_release = mlx5_crypto_xts_queue_pair_release;
+	crypto_dev->dequeue_burst = mlx5_crypto_xts_dequeue_burst;
+	crypto_dev->enqueue_burst = mlx5_crypto_xts_enqueue_burst;
+	priv->caps = mlx5_crypto_caps;
+	return 0;
+}
+
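
A rough worked example of the sizing rules in
mlx5_crypto_xts_get_wqe_sizes() above, with illustrative byte values
(a 192-byte UMR static part, 16-byte data segments, a 32-byte
RDMA_WRITE header and 64-byte WQBBs are assumed here, not taken from
the code):

	segs_num   = 6
	umr_size   = 192 + RTE_ALIGN(6, 4) * 16          = 320 (5 WQBBs)
	rdmaw_size = 32 + 16 * (2 + RTE_ALIGN(6 - 2, 4)) = 128 (2 WQBBs)
	wqe_set    = 320 + 128 = 448, next power of two  = 512
	umr_size  += 512 - 448                           -> 384 (6 WQBBs)

The difference is folded into the UMR WQE so that the whole
UMR + RDMA_WRITE set is a power of two, presumably so that the
per-entry offset priv->wqe_set_size * idx stays a simple
power-of-two multiplication.
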
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 3/9] crypto/mlx5: add AES-GCM query and initialization
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 1/9] common/mlx5: export memory region lookup by address Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 2/9] crypto/mlx5: split AES-XTS Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev

AES-GCM provides both authenticated encryption and the ability to check
the integrity and authentication of additional authenticated data (AAD)
that is sent in the clear.

This commit adds the AES-GCM attributes query and initialization function.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
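As a reference for how the new capability bits are intended to be
consumed, a minimal gating sketch follows; the helper name is made up
for illustration and it only uses the mlx5_hca_crypto_mmo_attr fields
added by this patch:

	/* Return true when the device can run the AES-GCM data path. */
	static bool
	mlx5_crypto_gcm_supported(const struct mlx5_hca_attr *attr)
	{
		const struct mlx5_hca_crypto_mmo_attr *mmo = &attr->crypto_mmo;

		return attr->crypto && mmo->crypto_mmo_qp &&
		       (mmo->gcm_128_encrypt || mmo->gcm_256_encrypt) &&
		       (mmo->gcm_128_decrypt || mmo->gcm_256_decrypt) &&
		       (mmo->gcm_auth_tag_128 || mmo->gcm_auth_tag_96);
	}
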
 drivers/common/mlx5/mlx5_devx_cmds.c  | 15 +++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h  | 13 ++++++++++
 drivers/common/mlx5/mlx5_prm.h        | 19 +++++++++++---
 drivers/crypto/mlx5/meson.build       |  1 +
 drivers/crypto/mlx5/mlx5_crypto.c     |  4 ++-
 drivers/crypto/mlx5/mlx5_crypto.h     |  3 +++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 36 +++++++++++++++++++++++++++
 7 files changed, 87 insertions(+), 4 deletions(-)
 create mode 100644 drivers/crypto/mlx5/mlx5_crypto_gcm.c

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 1e418a0353..4332081165 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1117,6 +1117,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 		attr->crypto_wrapped_import_method = !!(MLX5_GET(crypto_caps,
 						hcattr, wrapped_import_method)
 						& 1 << 2);
+		attr->crypto_mmo.crypto_mmo_qp = MLX5_GET(crypto_caps, hcattr, crypto_mmo_qp);
+		attr->crypto_mmo.gcm_256_encrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_256_encrypt);
+		attr->crypto_mmo.gcm_128_encrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_128_encrypt);
+		attr->crypto_mmo.gcm_256_decrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_256_decrypt);
+		attr->crypto_mmo.gcm_128_decrypt =
+			MLX5_GET(crypto_caps, hcattr, crypto_aes_gcm_128_decrypt);
+		attr->crypto_mmo.gcm_auth_tag_128 =
+			MLX5_GET(crypto_caps, hcattr, gcm_auth_tag_128);
+		attr->crypto_mmo.gcm_auth_tag_96 =
+			MLX5_GET(crypto_caps, hcattr, gcm_auth_tag_96);
+		attr->crypto_mmo.log_crypto_mmo_max_size =
+			MLX5_GET(crypto_caps, hcattr, log_crypto_mmo_max_size);
 	}
 	if (hca_cap_2_sup) {
 		hcattr = mlx5_devx_get_hca_cap(ctx, in, out, &rc,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index dc3359268d..cb3f3a211b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -125,6 +125,18 @@ struct mlx5_hca_flex_attr {
 	uint8_t  header_length_mask_width;
 };
 
+__extension__
+struct mlx5_hca_crypto_mmo_attr {
+	uint32_t crypto_mmo_qp:1;
+	uint32_t gcm_256_encrypt:1;
+	uint32_t gcm_128_encrypt:1;
+	uint32_t gcm_256_decrypt:1;
+	uint32_t gcm_128_decrypt:1;
+	uint32_t gcm_auth_tag_128:1;
+	uint32_t gcm_auth_tag_96:1;
+	uint32_t log_crypto_mmo_max_size:6;
+};
+
 /* ISO C restricts enumerator values to range of 'int' */
 __extension__
 enum {
@@ -250,6 +262,7 @@ struct mlx5_hca_attr {
 	struct mlx5_hca_vdpa_attr vdpa;
 	struct mlx5_hca_flow_attr flow;
 	struct mlx5_hca_flex_attr flex;
+	struct mlx5_hca_crypto_mmo_attr crypto_mmo;
 	int log_max_qp_sz;
 	int log_max_cq_sz;
 	int log_max_qp;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 3d5b6b9ba5..e41b5b3528 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -4582,7 +4582,9 @@ struct mlx5_ifc_crypto_caps_bits {
 	u8 synchronize_dek[0x1];
 	u8 int_kek_manual[0x1];
 	u8 int_kek_auto[0x1];
-	u8 reserved_at_6[0x12];
+	u8 reserved_at_6[0xd];
+	u8 sw_wrapped_dek_key_purpose[0x1];
+	u8 reserved_at_14[0x4];
 	u8 wrapped_import_method[0x8];
 	u8 reserved_at_20[0x3];
 	u8 log_dek_max_alloc[0x5];
@@ -4599,8 +4601,19 @@ struct mlx5_ifc_crypto_caps_bits {
 	u8 log_dek_granularity[0x5];
 	u8 reserved_at_68[0x3];
 	u8 log_max_num_int_kek[0x5];
-	u8 reserved_at_70[0x10];
-	u8 reserved_at_80[0x780];
+	u8 sw_wrapped_dek_new[0x10];
+	u8 reserved_at_80[0x80];
+	u8 crypto_mmo_qp[0x1];
+	u8 crypto_aes_gcm_256_encrypt[0x1];
+	u8 crypto_aes_gcm_128_encrypt[0x1];
+	u8 crypto_aes_gcm_256_decrypt[0x1];
+	u8 crypto_aes_gcm_128_decrypt[0x1];
+	u8 gcm_auth_tag_128[0x1];
+	u8 gcm_auth_tag_96[0x1];
+	u8 reserved_at_107[0x3];
+	u8 log_crypto_mmo_max_size[0x6];
+	u8 reserved_at_110[0x10];
+	u8 reserved_at_120[0x6e0];
 };
 
 struct mlx5_ifc_crypto_commissioning_register_bits {
diff --git a/drivers/crypto/mlx5/meson.build b/drivers/crypto/mlx5/meson.build
index 045e8ce81d..17ffce89f0 100644
--- a/drivers/crypto/mlx5/meson.build
+++ b/drivers/crypto/mlx5/meson.build
@@ -16,6 +16,7 @@ endif
 sources = files(
         'mlx5_crypto.c',
 	'mlx5_crypto_xts.c',
+	'mlx5_crypto_gcm.c',
         'mlx5_crypto_dek.c',
 )
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 2e6bcc6ddc..ff632cd69a 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -335,7 +335,9 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 		rte_errno = ENOTSUP;
 		return -rte_errno;
 	}
-	if (!cdev->config.hca_attr.crypto || !cdev->config.hca_attr.aes_xts) {
+	if (!cdev->config.hca_attr.crypto ||
+	   (!cdev->config.hca_attr.aes_xts &&
+	    !cdev->config.hca_attr.crypto_mmo.crypto_mmo_qp)) {
 		DRV_LOG(ERR, "Not enough capabilities to support crypto "
 			"operations, maybe old FW/OFED version?");
 		rte_errno = ENOTSUP;
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 05d8fe97fe..76f368ee91 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -117,4 +117,7 @@ mlx5_crypto_dek_unset(struct mlx5_crypto_priv *priv);
 int
 mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
new file mode 100644
index 0000000000..bd78c6d66b
--- /dev/null
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_malloc.h>
+#include <rte_mempool.h>
+#include <rte_eal_paging.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+#include <bus_pci_driver.h>
+#include <rte_memory.h>
+
+#include <mlx5_glue.h>
+#include <mlx5_common.h>
+#include <mlx5_devx_cmds.h>
+#include <mlx5_common_os.h>
+
+#include "mlx5_crypto_utils.h"
+#include "mlx5_crypto.h"
+
+static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
+	{
+		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+	},
+	{
+		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+	}
+};
+
+int
+mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
+{
+	priv->caps = mlx5_crypto_gcm_caps;
+	return 0;
+}
+
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 4/9] crypto/mlx5: add AES-GCM encryption key
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
                     ` (2 preceding siblings ...)
  2023-06-20 14:11   ` [PATCH v4 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad; +Cc: rasland, dev

The crypto device requires a DEK (data encryption key) object for
data encryption/decryption operations.

This commit adds the AES-GCM DEK object management support.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
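For context, the DEK created here is derived from the key carried in
the application's AEAD transform. A minimal sketch of the relevant
fields (application side, standard cryptodev API; the key bytes are
placeholders):

	uint8_t key[16] = { 0 /* application-supplied AES-GCM key */ };
	struct rte_crypto_sym_xform xform = {
		.type = RTE_CRYPTO_SYM_XFORM_AEAD,
		.aead = {
			.op = RTE_CRYPTO_AEAD_OP_ENCRYPT,
			.algo = RTE_CRYPTO_AEAD_AES_GCM,
			.key = { .data = key, .length = sizeof(key) },
		},
	};

mlx5_crypto_dek_prepare() hashes the key from such an xform and
creates or reuses a matching DEK object, now with
MLX5_CRYPTO_KEY_PURPOSE_GCM for the AEAD case.
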
 drivers/crypto/mlx5/mlx5_crypto.h     |  17 ++++-
 drivers/crypto/mlx5/mlx5_crypto_dek.c | 102 +++++++++++++-------------
 drivers/crypto/mlx5/mlx5_crypto_gcm.c |  33 +++++++++
 drivers/crypto/mlx5/mlx5_crypto_xts.c |  53 ++++++++++++-
 4 files changed, 150 insertions(+), 55 deletions(-)

diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 76f368ee91..bb5a557a38 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -86,6 +86,11 @@ struct mlx5_crypto_session {
 	uint32_t dek_id; /**< DEK ID */
 } __rte_packed;
 
+struct mlx5_crypto_dek_ctx {
+	struct rte_crypto_sym_xform *xform;
+	struct mlx5_crypto_priv *priv;
+};
+
 typedef void *(*mlx5_crypto_mkey_update_t)(struct mlx5_crypto_priv *priv,
 					   struct mlx5_crypto_qp *qp,
 					   uint32_t idx);
@@ -106,7 +111,7 @@ mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 
 struct mlx5_crypto_dek *
 mlx5_crypto_dek_prepare(struct mlx5_crypto_priv *priv,
-			struct rte_crypto_cipher_xform *cipher);
+			struct rte_crypto_sym_xform *xform);
 
 int
 mlx5_crypto_dek_setup(struct mlx5_crypto_priv *priv);
@@ -120,4 +125,14 @@ mlx5_crypto_xts_init(struct mlx5_crypto_priv *priv);
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv);
 
+int
+mlx5_crypto_dek_fill_xts_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx);
+
+int
+mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx);
+
 #endif /* MLX5_CRYPTO_H_ */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_dek.c b/drivers/crypto/mlx5/mlx5_crypto_dek.c
index 7339ef2bd9..716bcc0545 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_dek.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_dek.c
@@ -13,10 +13,24 @@
 #include "mlx5_crypto_utils.h"
 #include "mlx5_crypto.h"
 
-struct mlx5_crypto_dek_ctx {
-	struct rte_crypto_cipher_xform *cipher;
-	struct mlx5_crypto_priv *priv;
-};
+static int
+mlx5_crypto_dek_get_key(struct rte_crypto_sym_xform *xform,
+			const uint8_t **key,
+			uint16_t *key_len)
+{
+	if (xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+		*key = xform->cipher.key.data;
+		*key_len = xform->cipher.key.length;
+	} else if (xform->type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+		*key = xform->aead.key.data;
+		*key_len = xform->aead.key.length;
+	} else {
+		DRV_LOG(ERR, "Xform dek type not supported.");
+		rte_errno = -EINVAL;
+		return -1;
+	}
+	return 0;
+}
 
 int
 mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
@@ -27,19 +41,22 @@ mlx5_crypto_dek_destroy(struct mlx5_crypto_priv *priv,
 
 struct mlx5_crypto_dek *
 mlx5_crypto_dek_prepare(struct mlx5_crypto_priv *priv,
-			struct rte_crypto_cipher_xform *cipher)
+			struct rte_crypto_sym_xform *xform)
 {
+	const uint8_t *key;
+	uint16_t key_len;
 	struct mlx5_hlist *dek_hlist = priv->dek_hlist;
 	struct mlx5_crypto_dek_ctx dek_ctx = {
-		.cipher = cipher,
+		.xform = xform,
 		.priv = priv,
 	};
-	struct rte_crypto_cipher_xform *cipher_ctx = cipher;
-	uint64_t key64 = __rte_raw_cksum(cipher_ctx->key.data,
-					 cipher_ctx->key.length, 0);
-	struct mlx5_list_entry *entry = mlx5_hlist_register(dek_hlist,
-							     key64, &dek_ctx);
+	uint64_t key64;
+	struct mlx5_list_entry *entry;
 
+	if (mlx5_crypto_dek_get_key(xform, &key, &key_len))
+		return NULL;
+	key64 = __rte_raw_cksum(key, key_len, 0);
+	entry = mlx5_hlist_register(dek_hlist, key64, &dek_ctx);
 	return entry == NULL ? NULL :
 			     container_of(entry, struct mlx5_crypto_dek, entry);
 }
@@ -76,76 +93,55 @@ mlx5_crypto_dek_match_cb(void *tool_ctx __rte_unused,
 			 struct mlx5_list_entry *entry, void *cb_ctx)
 {
 	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
-	struct rte_crypto_cipher_xform *cipher_ctx = ctx->cipher;
+	struct rte_crypto_sym_xform *xform = ctx->xform;
 	struct mlx5_crypto_dek *dek =
 			container_of(entry, typeof(*dek), entry);
 	uint32_t key_len = dek->size;
+	uint16_t xkey_len;
+	const uint8_t *key;
 
-	if (key_len != cipher_ctx->key.length)
+	if (mlx5_crypto_dek_get_key(xform, &key, &xkey_len))
+		return -1;
+	if (key_len != xkey_len)
 		return -1;
-	return memcmp(cipher_ctx->key.data, dek->data, cipher_ctx->key.length);
+	return memcmp(key, dek->data, xkey_len);
 }
 
 static struct mlx5_list_entry *
 mlx5_crypto_dek_create_cb(void *tool_ctx __rte_unused, void *cb_ctx)
 {
 	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
-	struct rte_crypto_cipher_xform *cipher_ctx = ctx->cipher;
+	struct rte_crypto_sym_xform *xform = ctx->xform;
 	struct mlx5_crypto_dek *dek = rte_zmalloc(__func__, sizeof(*dek),
 						  RTE_CACHE_LINE_SIZE);
 	struct mlx5_devx_dek_attr dek_attr = {
 		.pd = ctx->priv->cdev->pdn,
-		.key_purpose = MLX5_CRYPTO_KEY_PURPOSE_AES_XTS,
-		.has_keytag = 1,
 	};
-	bool is_wrapped = ctx->priv->is_wrapped_mode;
+	int ret = -1;
 
 	if (dek == NULL) {
 		DRV_LOG(ERR, "Failed to allocate dek memory.");
 		return NULL;
 	}
-	if (is_wrapped) {
-		switch (cipher_ctx->key.length) {
-		case 48:
-			dek->size = 48;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_128b;
-			break;
-		case 80:
-			dek->size = 80;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_256b;
-			break;
-		default:
-			DRV_LOG(ERR, "Wrapped key size not supported.");
-			return NULL;
-		}
-	} else {
-		switch (cipher_ctx->key.length) {
-		case 32:
-			dek->size = 40;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_128b;
-			break;
-		case 64:
-			dek->size = 72;
-			dek_attr.key_size = MLX5_CRYPTO_KEY_SIZE_256b;
-			break;
-		default:
-			DRV_LOG(ERR, "Key size not supported.");
-			return NULL;
-		}
-		memcpy(&dek_attr.key[cipher_ctx->key.length],
-						&ctx->priv->keytag, 8);
-	}
-	memcpy(&dek_attr.key, cipher_ctx->key.data, cipher_ctx->key.length);
+	if (xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER)
+		ret = mlx5_crypto_dek_fill_xts_attr(dek, &dek_attr, cb_ctx);
+	else if (xform->type == RTE_CRYPTO_SYM_XFORM_AEAD)
+		ret = mlx5_crypto_dek_fill_gcm_attr(dek, &dek_attr, cb_ctx);
+	if (ret)
+		goto fail;
 	dek->obj = mlx5_devx_cmd_create_dek_obj(ctx->priv->cdev->ctx,
 						&dek_attr);
 	if (dek->obj == NULL) {
-		rte_free(dek);
-		return NULL;
+		DRV_LOG(ERR, "Failed to create dek obj.");
+		goto fail;
 	}
-	memcpy(&dek->data, cipher_ctx->key.data, cipher_ctx->key.length);
 	return &dek->entry;
+fail:
+	rte_free(dek);
+	return NULL;
 }
 
+
 static void
 mlx5_crypto_dek_remove_cb(void *tool_ctx __rte_unused,
 			  struct mlx5_list_entry *entry)
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index bd78c6d66b..5b315ef42c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -27,6 +27,39 @@ static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	}
 };
 
+int
+mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx)
+{
+	uint32_t offset = 0;
+	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
+	struct rte_crypto_aead_xform *aead_ctx = &ctx->xform->aead;
+
+	if (aead_ctx->algo != RTE_CRYPTO_AEAD_AES_GCM) {
+		DRV_LOG(ERR, "Only AES-GCM algo supported.");
+		return -EINVAL;
+	}
+	dek_attr->key_purpose = MLX5_CRYPTO_KEY_PURPOSE_GCM;
+	switch (aead_ctx->key.length) {
+	case 16:
+		offset = 16;
+		dek->size = 16;
+		dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+		break;
+	case 32:
+		dek->size = 32;
+		dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+		break;
+	default:
+		DRV_LOG(ERR, "Key size not supported.");
+		return -EINVAL;
+	}
+	memcpy(&dek_attr->key[offset], aead_ctx->key.data, aead_ctx->key.length);
+	memcpy(&dek->data, aead_ctx->key.data, aead_ctx->key.length);
+	return 0;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_xts.c b/drivers/crypto/mlx5/mlx5_crypto_xts.c
index 964d02e6ed..661da5f589 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_xts.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_xts.c
@@ -45,6 +45,57 @@ const struct rte_cryptodev_capabilities mlx5_crypto_caps[] = {
 	},
 };
 
+int
+mlx5_crypto_dek_fill_xts_attr(struct mlx5_crypto_dek *dek,
+			      struct mlx5_devx_dek_attr *dek_attr,
+			      void *cb_ctx)
+{
+	struct mlx5_crypto_dek_ctx *ctx = cb_ctx;
+	struct rte_crypto_cipher_xform *cipher_ctx = &ctx->xform->cipher;
+	bool is_wrapped = ctx->priv->is_wrapped_mode;
+
+	if (cipher_ctx->algo != RTE_CRYPTO_CIPHER_AES_XTS) {
+		DRV_LOG(ERR, "Only AES-XTS algo supported.");
+		return -EINVAL;
+	}
+	dek_attr->key_purpose = MLX5_CRYPTO_KEY_PURPOSE_AES_XTS;
+	dek_attr->has_keytag = 1;
+	if (is_wrapped) {
+		switch (cipher_ctx->key.length) {
+		case 48:
+			dek->size = 48;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+			break;
+		case 80:
+			dek->size = 80;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+			break;
+		default:
+			DRV_LOG(ERR, "Wrapped key size not supported.");
+			return -EINVAL;
+		}
+	} else {
+		switch (cipher_ctx->key.length) {
+		case 32:
+			dek->size = 40;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_128b;
+			break;
+		case 64:
+			dek->size = 72;
+			dek_attr->key_size = MLX5_CRYPTO_KEY_SIZE_256b;
+			break;
+		default:
+			DRV_LOG(ERR, "Key size not supported.");
+			return -EINVAL;
+		}
+		memcpy(&dek_attr->key[cipher_ctx->key.length],
+						&ctx->priv->keytag, 8);
+	}
+	memcpy(&dek_attr->key, cipher_ctx->key.data, cipher_ctx->key.length);
+	memcpy(&dek->data, cipher_ctx->key.data, cipher_ctx->key.length);
+	return 0;
+}
+
 static int
 mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
 				      struct rte_crypto_sym_xform *xform,
@@ -66,7 +117,7 @@ mlx5_crypto_xts_sym_session_configure(struct rte_cryptodev *dev,
 		return -ENOTSUP;
 	}
 	cipher = &xform->cipher;
-	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, cipher);
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, xform);
 	if (sess_private_data->dek == NULL) {
 		DRV_LOG(ERR, "Failed to prepare dek.");
 		return -ENOMEM;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 5/9] crypto/mlx5: add AES-GCM session configure
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
                     ` (3 preceding siblings ...)
  2023-06-20 14:11   ` [PATCH v4 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev

Sessions are used in symmetric transformations in order to prepare
objects and data for the packet processing stage.

The AES-GCM session includes the IV, AAD, digest (tag), DEK and
operation mode information.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
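A rough sketch of creating such a session through the generic
cryptodev API (assuming the DPDK 22.11+ session API; dev_id,
session_pool, the key buffer and the IV/AAD/digest sizes below are
only illustrative):

	struct rte_crypto_sym_xform xform = {
		.type = RTE_CRYPTO_SYM_XFORM_AEAD,
		.aead = {
			.op = RTE_CRYPTO_AEAD_OP_ENCRYPT,
			.algo = RTE_CRYPTO_AEAD_AES_GCM,
			.key = { .data = key, .length = 16 },
			.iv = { .offset = sizeof(struct rte_crypto_op) +
					  sizeof(struct rte_crypto_sym_op),
				.length = 12 },
			.aad_length = 16,
			.digest_length = 16,
		},
	};
	void *sess = rte_cryptodev_sym_session_create(dev_id, &xform,
						      session_pool);

The session configure callback added here stores the op type, the
AAD/tag/IV lengths and the DEK id into struct mlx5_crypto_session for
the data path to use.
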
 drivers/common/mlx5/mlx5_prm.h        | 12 +++++++
 drivers/crypto/mlx5/mlx5_crypto.h     | 40 ++++++++++++++++++-----
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 47 +++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index e41b5b3528..96ce342d29 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -523,11 +523,23 @@ enum {
 	MLX5_BLOCK_SIZE_4048B	= 0x6,
 };
 
+enum {
+	MLX5_ENCRYPTION_TYPE_AES_GCM = 0x3,
+};
+
+enum {
+	MLX5_CRYPTO_OP_TYPE_ENCRYPTION = 0x0,
+	MLX5_CRYPTO_OP_TYPE_DECRYPTION = 0x1,
+};
+
 #define MLX5_BSF_SIZE_OFFSET		30
 #define MLX5_BSF_P_TYPE_OFFSET		24
 #define MLX5_ENCRYPTION_ORDER_OFFSET	16
 #define MLX5_BLOCK_SIZE_OFFSET		24
 
+#define MLX5_CRYPTO_MMO_TYPE_OFFSET 24
+#define MLX5_CRYPTO_MMO_OP_OFFSET 20
+
 struct mlx5_wqe_umr_bsf_seg {
 	/*
 	 * bs_bpt_eo_es contains:
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index bb5a557a38..6cb4d4ddec 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -72,16 +72,40 @@ struct mlx5_crypto_devarg_params {
 };
 
 struct mlx5_crypto_session {
-	uint32_t bs_bpt_eo_es;
-	/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
-	 * saved in big endian format.
-	 */
-	uint32_t bsp_res;
-	/**< crypto_block_size_pointer and reserved 24 bits saved in big
-	 * endian format.
-	 */
+	union {
+		/**< AES-XTS configuration. */
+		struct {
+			uint32_t bs_bpt_eo_es;
+			/**< bsf_size, bsf_p_type, encryption_order and encryption standard,
+			 * saved in big endian format.
+			 */
+			uint32_t bsp_res;
+			/**< crypto_block_size_pointer and reserved 24 bits saved in big
+			 * endian format.
+			 */
+		};
+		/**< AES-GCM configuration. */
+		struct {
+			uint32_t mmo_ctrl;
+			/**< Crypto control fields with algo type and op type in big
+			 * endian format.
+			 */
+			uint32_t wqe_aad_len;
+			/**< Crypto AAD length field in big endian format. */
+			uint32_t wqe_tag_len;
+			/**< Crypto tag length field in big endian format. */
+			uint16_t tag_len;
+			/**< AES-GCM crypto digest size in bytes. */
+			uint16_t aad_len;
+			/**< The length of the additional authenticated data (AAD) in bytes. */
+			uint32_t op_type;
+			/**< Operation type. */
+		};
+	};
 	uint32_t iv_offset:16;
 	/**< Starting point for Initialisation Vector. */
+	uint32_t iv_len;
+	/**< Initialisation Vector length. */
 	struct mlx5_crypto_dek *dek; /**< Pointer to dek struct. */
 	uint32_t dek_id; /**< DEK ID */
 } __rte_packed;
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 5b315ef42c..5f55314382 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -60,9 +60,56 @@ mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
 	return 0;
 }
 
+static int
+mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
+				  struct rte_crypto_sym_xform *xform,
+				  struct rte_cryptodev_sym_session *session)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_session *sess_private_data = CRYPTODEV_GET_SYM_SESS_PRIV(session);
+	struct rte_crypto_aead_xform *aead = &xform->aead;
+	uint32_t op_type;
+
+	if (unlikely(xform->next != NULL)) {
+		DRV_LOG(ERR, "Xform next is not supported.");
+		return -ENOTSUP;
+	}
+	if (aead->algo != RTE_CRYPTO_AEAD_AES_GCM) {
+		DRV_LOG(ERR, "Only AES-GCM algorithm is supported.");
+		return -ENOTSUP;
+	}
+	if (aead->op == RTE_CRYPTO_AEAD_OP_ENCRYPT)
+		op_type = MLX5_CRYPTO_OP_TYPE_ENCRYPTION;
+	else
+		op_type = MLX5_CRYPTO_OP_TYPE_DECRYPTION;
+	sess_private_data->op_type = op_type;
+	sess_private_data->mmo_ctrl = rte_cpu_to_be_32
+			(op_type << MLX5_CRYPTO_MMO_OP_OFFSET |
+			 MLX5_ENCRYPTION_TYPE_AES_GCM << MLX5_CRYPTO_MMO_TYPE_OFFSET);
+	sess_private_data->aad_len = aead->aad_length;
+	sess_private_data->tag_len = aead->digest_length;
+	sess_private_data->iv_offset = aead->iv.offset;
+	sess_private_data->iv_len = aead->iv.length;
+	sess_private_data->dek = mlx5_crypto_dek_prepare(priv, xform);
+	if (sess_private_data->dek == NULL) {
+		DRV_LOG(ERR, "Failed to prepare dek.");
+		return -ENOMEM;
+	}
+	sess_private_data->dek_id =
+			rte_cpu_to_be_32(sess_private_data->dek->obj->id &
+					 0xffffff);
+	DRV_LOG(DEBUG, "Session %p was configured.", sess_private_data);
+	return 0;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
+	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
+	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+
+	/* Override AES-GCM specified ops. */
+	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 6/9] common/mlx5: add WQE-based QP synchronous basics
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
                     ` (4 preceding siblings ...)
  2023-06-20 14:11   ` [PATCH v4 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev

NVIDIA HW provides a synchronization mechanism between QPs. When
creating the QPs, the user can set one as primary and another as
follower. The follower QP's WQE execution can then be controlled
by the primary QP via SEND_EN WQEs.

This commit introduces the SEND_EN WQE to synchronize WQE
execution between the primary and follower QPs.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
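The new attributes are consumed at QP creation time. A minimal
pairing sketch (error handling and the unrelated attributes are
omitted; the values mirror how the GCM queue pair setup later in
this series uses them):

	struct mlx5_devx_qp_attr leader_attr = { /* ... */ };
	struct mlx5_devx_qp_attr follower_attr = { /* ... */ };

	leader_attr.cd_master = 1;       /* UMR QP drives the other QP. */
	follower_attr.cd_slave_send = 1; /* Crypto QP sends only when enabled. */

WQEs posted to the follower QP are then executed only after the
leader QP posts a SEND_EN WQE (struct mlx5_wqe_send_en_wqe) naming
the follower QP and the producer index up to which it may proceed.
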
 drivers/common/mlx5/mlx5_devx_cmds.c |  6 ++++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  3 +++
 drivers/common/mlx5/mlx5_prm.h       | 11 +++++++++++
 3 files changed, 20 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 4332081165..ef87862a6d 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2475,6 +2475,12 @@ mlx5_devx_cmd_create_qp(void *ctx,
 				 attr->dbr_umem_valid);
 			MLX5_SET(qpc, qpc, dbr_umem_id, attr->dbr_umem_id);
 		}
+		if (attr->cd_master)
+			MLX5_SET(qpc, qpc, cd_master, attr->cd_master);
+		if (attr->cd_slave_send)
+			MLX5_SET(qpc, qpc, cd_slave_send, attr->cd_slave_send);
+		if (attr->cd_slave_recv)
+			MLX5_SET(qpc, qpc, cd_slave_receive, attr->cd_slave_recv);
 		MLX5_SET64(qpc, qpc, dbr_addr, attr->dbr_address);
 		MLX5_SET64(create_qp_in, in, wq_umem_offset,
 			   attr->wq_umem_offset);
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index cb3f3a211b..e071cd841f 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -559,6 +559,9 @@ struct mlx5_devx_qp_attr {
 	uint64_t wq_umem_offset;
 	uint32_t user_index:24;
 	uint32_t mmo:1;
+	uint32_t cd_master:1;
+	uint32_t cd_slave_send:1;
+	uint32_t cd_slave_recv:1;
 };
 
 struct mlx5_devx_virtio_q_couners_attr {
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 96ce342d29..4990bcaacd 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -589,6 +589,17 @@ struct mlx5_rdma_write_wqe {
 	struct mlx5_wqe_dseg dseg[];
 } __rte_packed;
 
+struct mlx5_wqe_send_en_seg {
+	uint32_t reserve[2];
+	uint32_t sqnpc;
+	uint32_t qpn;
+} __rte_packed;
+
+struct mlx5_wqe_send_en_wqe {
+	struct mlx5_wqe_cseg ctr;
+	struct mlx5_wqe_send_en_seg sseg;
+} __rte_packed;
+
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 7/9] crypto/mlx5: add queue pair setup for GCM
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
                     ` (5 preceding siblings ...)
  2023-06-20 14:11   ` [PATCH v4 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev

The crypto queue pair handles the encryption/decryption operations.

The AES-GCM AEAD API provides the AAD, mbuf and digest separately,
while the low-level FW accepts the data only in a single contiguous
memory region. Therefore two internal QPs are created for each
AES-GCM queue pair: one (the UMR QP) organizes the memory to be
contiguous when it is not, the other one is the crypto QP.

If the buffers are detected to be implicitly contiguous, the buffer
is sent to the crypto QP directly for encryption/decryption. If not,
the buffers are first handled by the UMR QP, which maps them into a
single contiguous address space. The well organized "new" buffer can
then be handled by the crypto QP.

The crypto QP is initialized as follower and the UMR QP as leader.
When a crypto operation's input buffer requires address space
conversion by the UMR QP, the crypto QP processing is triggered by
the UMR QP. Otherwise, the crypto QP doorbell is rung directly.

The existing max_segs_num devarg defines how many segments a chained
mbuf may contain, the same as for AES-XTS before.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
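A simplified sketch of the "implicitly contiguous" check described
above, i.e. the in-place layout that lets an operation skip the UMR
QP entirely (the helper name is illustrative and the real checks,
including the headroom/tailroom copy fallbacks, land in the enqueue
patch):

	/* AAD right before the payload, digest right after it, single seg. */
	static bool
	gcm_op_is_contiguous(const struct rte_crypto_op *op, uint16_t aad_len)
	{
		const struct rte_crypto_sym_op *sym = op->sym;
		uint8_t *payload = rte_pktmbuf_mtod_offset(sym->m_src, uint8_t *,
							   sym->aead.data.offset);

		return sym->m_src->nb_segs == 1 &&
		       (sym->m_dst == NULL || sym->m_dst == sym->m_src) &&
		       sym->aead.aad.data == payload - aad_len &&
		       sym->aead.digest.data == payload + sym->aead.data.length;
	}
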
 drivers/common/mlx5/mlx5_common_mr.h  |   1 +
 drivers/common/mlx5/mlx5_prm.h        |  22 +++
 drivers/common/mlx5/version.map       |   2 +
 drivers/crypto/mlx5/mlx5_crypto.h     |  15 ++
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 230 ++++++++++++++++++++++++++
 5 files changed, 270 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index 66623868a2..8789d403b1 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -254,6 +254,7 @@ __rte_internal
 void
 mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
 
+__rte_internal
 void
 mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4990bcaacd..4f6925733a 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -470,6 +470,15 @@ struct mlx5_wqe_rseg {
 #define MLX5_UMRC_KO_OFFSET 16u
 #define MLX5_UMRC_TO_BS_OFFSET 0u
 
+/*
+ * As PRM describes, the address of the UMR pointer must be
+ * aligned to 2KB.
+ */
+#define MLX5_UMR_KLM_PTR_ALIGN (1 << 11)
+
+#define MLX5_UMR_KLM_NUM_ALIGN \
+	(MLX5_UMR_KLM_PTR_ALIGN / sizeof(struct mlx5_klm))
+
 struct mlx5_wqe_umr_cseg {
 	uint32_t if_cf_toe_cq_res;
 	uint32_t ko_to_bs;
@@ -674,6 +683,19 @@ union mlx5_gga_compress_opaque {
 	uint32_t data[64];
 };
 
+union mlx5_gga_crypto_opaque {
+	struct {
+		uint32_t syndrome;
+		uint32_t reserved0[2];
+		struct {
+			uint32_t iv[3];
+			uint32_t tag_size;
+			uint32_t aad_size;
+		} cp __rte_packed;
+	} __rte_packed;
+	uint8_t data[64];
+};
+
 struct mlx5_ifc_regexp_mmo_control_bits {
 	uint8_t reserved_at_31[0x2];
 	uint8_t le[0x1];
diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
index f860b069de..0758ba76de 100644
--- a/drivers/common/mlx5/version.map
+++ b/drivers/common/mlx5/version.map
@@ -159,5 +159,7 @@ INTERNAL {
 
 	mlx5_os_interrupt_handler_create; # WINDOWS_NO_EXPORT
 	mlx5_os_interrupt_handler_destroy; # WINDOWS_NO_EXPORT
+
+	mlx5_os_set_reg_mr_cb;
 	local: *;
 };
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 6cb4d4ddec..88a09a6b1c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -28,8 +28,11 @@ struct mlx5_crypto_priv {
 	TAILQ_ENTRY(mlx5_crypto_priv) next;
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	struct rte_cryptodev *crypto_dev;
+	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
+	uint32_t max_klm_num; /* Maximum supported klm. */
 	struct mlx5_hlist *dek_hlist; /* Dek hash list. */
 	const struct rte_cryptodev_capabilities *caps;
 	struct rte_cryptodev_config dev_config;
@@ -46,15 +49,27 @@ struct mlx5_crypto_qp {
 	struct mlx5_crypto_priv *priv;
 	struct mlx5_devx_cq cq_obj;
 	struct mlx5_devx_qp qp_obj;
+	struct mlx5_devx_qp umr_qp_obj;
 	struct rte_cryptodev_stats stats;
 	struct rte_crypto_op **ops;
 	struct mlx5_devx_obj **mkey; /* WQE's indirect mekys. */
+	struct mlx5_klm *klm_array;
+	union mlx5_gga_crypto_opaque *opaque_addr;
 	struct mlx5_mr_ctrl mr_ctrl;
+	struct mlx5_pmd_mr mr;
+	/* Crypto QP. */
 	uint8_t *wqe;
 	uint16_t entries_n;
+	uint16_t cq_entries_n;
 	uint16_t pi;
 	uint16_t ci;
 	uint16_t db_pi;
+	/* UMR QP. */
+	uint8_t *umr_wqe;
+	uint16_t umr_wqbbs;
+	uint16_t umr_pi;
+	uint16_t umr_ci;
+	uint32_t umr_errors;
 };
 
 struct mlx5_crypto_dek {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 5f55314382..c3859547ee 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -18,6 +18,20 @@
 #include "mlx5_crypto_utils.h"
 #include "mlx5_crypto.h"
 
+/*
+ * AES-GCM uses indirect KLM mode. The UMR WQE consists of WQE control +
+ * UMR control + mkey context + indirect KLM. The WQE size is aligned to
+ * be 3 WQEBBS.
+ */
+#define MLX5_UMR_GCM_WQE_SIZE \
+	(RTE_ALIGN(sizeof(struct mlx5_umr_wqe) + sizeof(struct mlx5_wqe_dseg), \
+			MLX5_SEND_WQE_BB))
+
+#define MLX5_UMR_GCM_WQE_SET_SIZE \
+	(MLX5_UMR_GCM_WQE_SIZE + \
+	 RTE_ALIGN(sizeof(struct mlx5_wqe_send_en_wqe), \
+	 MLX5_SEND_WQE_BB))
+
 static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	{
 		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
@@ -86,6 +100,8 @@ mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 	sess_private_data->mmo_ctrl = rte_cpu_to_be_32
 			(op_type << MLX5_CRYPTO_MMO_OP_OFFSET |
 			 MLX5_ENCRYPTION_TYPE_AES_GCM << MLX5_CRYPTO_MMO_TYPE_OFFSET);
+	sess_private_data->wqe_aad_len = rte_cpu_to_be_32((uint32_t)aead->aad_length);
+	sess_private_data->wqe_tag_len = rte_cpu_to_be_32((uint32_t)aead->digest_length);
 	sess_private_data->aad_len = aead->aad_length;
 	sess_private_data->tag_len = aead->digest_length;
 	sess_private_data->iv_offset = aead->iv.offset;
@@ -102,6 +118,216 @@ mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 	return 0;
 }
 
+static void *
+mlx5_crypto_gcm_mkey_klm_update(struct mlx5_crypto_priv *priv,
+				struct mlx5_crypto_qp *qp __rte_unused,
+				uint32_t idx)
+{
+	return &qp->klm_array[idx * priv->max_klm_num];
+}
+
+static int
+mlx5_crypto_gcm_qp_release(struct rte_cryptodev *dev, uint16_t qp_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_crypto_qp *qp = dev->data->queue_pairs[qp_id];
+
+	if (qp->umr_qp_obj.qp != NULL)
+		mlx5_devx_qp_destroy(&qp->umr_qp_obj);
+	if (qp->qp_obj.qp != NULL)
+		mlx5_devx_qp_destroy(&qp->qp_obj);
+	if (qp->cq_obj.cq != NULL)
+		mlx5_devx_cq_destroy(&qp->cq_obj);
+	if (qp->mr.obj != NULL) {
+		void *opaq = qp->mr.addr;
+
+		priv->dereg_mr_cb(&qp->mr);
+		rte_free(opaq);
+	}
+	mlx5_crypto_indirect_mkeys_release(qp, qp->entries_n);
+	mlx5_mr_btree_free(&qp->mr_ctrl.cache_bh);
+	rte_free(qp);
+	dev->data->queue_pairs[qp_id] = NULL;
+	return 0;
+}
+
+static void
+mlx5_crypto_gcm_init_qp(struct mlx5_crypto_qp *qp)
+{
+	volatile struct mlx5_gga_wqe *restrict wqe =
+				    (volatile struct mlx5_gga_wqe *)qp->qp_obj.wqes;
+	volatile union mlx5_gga_crypto_opaque *opaq = qp->opaque_addr;
+	const uint32_t sq_ds = rte_cpu_to_be_32((qp->qp_obj.qp->id << 8) | 4u);
+	const uint32_t flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+					MLX5_COMP_MODE_OFFSET);
+	const uint32_t opaq_lkey = rte_cpu_to_be_32(qp->mr.lkey);
+	int i;
+
+	/* All the following WQE fields stay constant after this init. */
+	for (i = 0; i < qp->entries_n; ++i, ++wqe) {
+		wqe->sq_ds = sq_ds;
+		wqe->flags = flags;
+		wqe->opaque_lkey = opaq_lkey;
+		wqe->opaque_vaddr = rte_cpu_to_be_64((uint64_t)(uintptr_t)&opaq[i]);
+	}
+}
+
+static inline int
+mlx5_crypto_gcm_umr_qp_setup(struct rte_cryptodev *dev, struct mlx5_crypto_qp *qp,
+			     int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_devx_qp_attr attr = {0};
+	uint32_t ret;
+	uint32_t log_wqbb_n;
+
+	/* Size for UMR + SEND_EN WQEs per descriptor, same depth as the crypto QP. */
+	log_wqbb_n = rte_log2_u32(qp->entries_n *
+			(MLX5_UMR_GCM_WQE_SET_SIZE / MLX5_SEND_WQE_BB));
+	attr.pd = priv->cdev->pdn;
+	attr.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj);
+	attr.cqn = qp->cq_obj.cq->id;
+	attr.num_of_receive_wqes = 0;
+	attr.num_of_send_wqbbs = RTE_BIT32(log_wqbb_n);
+	attr.ts_format =
+		mlx5_ts_format_conv(priv->cdev->config.hca_attr.qp_ts_format);
+	attr.cd_master = 1;
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->umr_qp_obj,
+				  attr.num_of_send_wqbbs * MLX5_SEND_WQE_BB,
+				  &attr, socket_id);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to create UMR QP.");
+		return -1;
+	}
+	if (mlx5_devx_qp2rts(&qp->umr_qp_obj, qp->umr_qp_obj.qp->id)) {
+		DRV_LOG(ERR, "Failed to change UMR QP state to RTS.");
+		return -1;
+	}
+	/* Save the UMR WQEBBS for checking the WQE boundary. */
+	qp->umr_wqbbs = attr.num_of_send_wqbbs;
+	return 0;
+}
+
+static int
+mlx5_crypto_gcm_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
+			 const struct rte_cryptodev_qp_conf *qp_conf,
+			 int socket_id)
+{
+	struct mlx5_crypto_priv *priv = dev->data->dev_private;
+	struct mlx5_hca_attr *attr = &priv->cdev->config.hca_attr;
+	struct mlx5_crypto_qp *qp;
+	struct mlx5_devx_cq_attr cq_attr = {
+		.uar_page_id = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+	};
+	struct mlx5_devx_qp_attr qp_attr = {
+		.pd = priv->cdev->pdn,
+		.uar_index = mlx5_os_get_devx_uar_page_id(priv->uar.obj),
+		.user_index = qp_id,
+	};
+	struct mlx5_devx_mkey_attr mkey_attr = {
+		.pd = priv->cdev->pdn,
+		.umr_en = 1,
+		.klm_num = priv->max_klm_num,
+	};
+	uint32_t log_ops_n = rte_log2_u32(qp_conf->nb_descriptors);
+	uint32_t entries = RTE_BIT32(log_ops_n);
+	uint32_t alloc_size = sizeof(*qp);
+	size_t mr_size, opaq_size;
+	void *mr_buf;
+	int ret;
+
+	alloc_size = RTE_ALIGN(alloc_size, RTE_CACHE_LINE_SIZE);
+	alloc_size += (sizeof(struct rte_crypto_op *) +
+		       sizeof(struct mlx5_devx_obj *)) * entries;
+	qp = rte_zmalloc_socket(__func__, alloc_size, RTE_CACHE_LINE_SIZE,
+				socket_id);
+	if (qp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate qp memory.");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	qp->priv = priv;
+	qp->entries_n = entries;
+	if (mlx5_mr_ctrl_init(&qp->mr_ctrl, &priv->cdev->mr_scache.dev_gen,
+				  priv->dev_config.socket_id)) {
+		DRV_LOG(ERR, "Cannot allocate MR Btree for qp %u.",
+			(uint32_t)qp_id);
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	/*
+	 * The following KLM pointer must be aligned with
+	 * MLX5_UMR_KLM_PTR_ALIGN. opaq_size is aligned here
+	 * so that the KLM pointer at this offset is aligned as well.
+	 */
+	opaq_size = RTE_ALIGN(sizeof(union mlx5_gga_crypto_opaque) * entries,
+			      MLX5_UMR_KLM_PTR_ALIGN);
+	mr_size = (priv->max_klm_num * sizeof(struct mlx5_klm) * entries) + opaq_size;
+	mr_buf = rte_calloc(__func__, (size_t)1, mr_size, MLX5_UMR_KLM_PTR_ALIGN);
+	if (mr_buf == NULL) {
+		DRV_LOG(ERR, "Failed to allocate mr memory.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	if (priv->reg_mr_cb(priv->cdev->pd, mr_buf, mr_size, &qp->mr) != 0) {
+		rte_free(mr_buf);
+		DRV_LOG(ERR, "Failed to register opaque MR.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	qp->opaque_addr = qp->mr.addr;
+	qp->klm_array = RTE_PTR_ADD(qp->opaque_addr, opaq_size);
+	/*
+	 * Triple the CQ size as UMR QP which contains UMR and SEND_EN WQE
+	 * will share this CQ .
+	 */
+	qp->cq_entries_n = rte_align32pow2(entries * 3);
+	ret = mlx5_devx_cq_create(priv->cdev->ctx, &qp->cq_obj,
+				  rte_log2_u32(qp->cq_entries_n),
+				  &cq_attr, socket_id);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create CQ.");
+		goto err;
+	}
+	qp_attr.cqn = qp->cq_obj.cq->id;
+	qp_attr.ts_format = mlx5_ts_format_conv(attr->qp_ts_format);
+	qp_attr.num_of_receive_wqes = 0;
+	qp_attr.num_of_send_wqbbs = entries;
+	qp_attr.mmo = attr->crypto_mmo.crypto_mmo_qp;
+	/* Set MMO QP as follower as the input data may depend on UMR. */
+	qp_attr.cd_slave_send = 1;
+	ret = mlx5_devx_qp_create(priv->cdev->ctx, &qp->qp_obj,
+				  qp_attr.num_of_send_wqbbs * MLX5_WQE_SIZE,
+				  &qp_attr, socket_id);
+	if (ret != 0) {
+		DRV_LOG(ERR, "Failed to create QP.");
+		goto err;
+	}
+	mlx5_crypto_gcm_init_qp(qp);
+	ret = mlx5_devx_qp2rts(&qp->qp_obj, 0);
+	if (ret)
+		goto err;
+	qp->ops = (struct rte_crypto_op **)(qp + 1);
+	qp->mkey = (struct mlx5_devx_obj **)(qp->ops + entries);
+	if (mlx5_crypto_gcm_umr_qp_setup(dev, qp, socket_id)) {
+		DRV_LOG(ERR, "Failed to setup UMR QP.");
+		goto err;
+	}
+	DRV_LOG(INFO, "QP %u: SQN=0x%X CQN=0x%X entries num = %u",
+		(uint32_t)qp_id, qp->qp_obj.qp->id, qp->cq_obj.cq->id, entries);
+	if (mlx5_crypto_indirect_mkeys_prepare(priv, qp, &mkey_attr,
+					       mlx5_crypto_gcm_mkey_klm_update)) {
+		DRV_LOG(ERR, "Cannot allocate indirect memory regions.");
+		rte_errno = ENOMEM;
+		goto err;
+	}
+	dev->data->queue_pairs[qp_id] = qp;
+	return 0;
+err:
+	mlx5_crypto_gcm_qp_release(dev, qp_id);
+	return -1;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
@@ -110,6 +336,10 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
+	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
+	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
+	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
+	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
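
For reference, the single registered buffer set up in
mlx5_crypto_gcm_qp_setup() holds the per-descriptor opaque structures
followed by the per-descriptor KLM arrays, and the 2KB alignment
requirement is met by construction. With illustrative numbers
(256 descriptors and max_segs_num = 8; neither value comes from this
patch):

	max_klm_num = RTE_ALIGN((8 + 1) * 2 + 1, 128) = 128 KLM entries
	opaq_size   = RTE_ALIGN(64 * 256, 2048)       = 16384 bytes
	KLM block   = 128 * 16 bytes                  = 2048 bytes per descriptor
	mr_size     = 16384 + 256 * 2048              = 540672 bytes

Since opaq_size and every per-descriptor KLM block are multiples of
MLX5_UMR_KLM_PTR_ALIGN (2KB), each pointer returned by
mlx5_crypto_gcm_mkey_klm_update() stays properly aligned as long as
the buffer itself is allocated with that alignment.
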
-- 
2.25.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 8/9] crypto/mlx5: add enqueue and dequeue operations
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
                     ` (6 preceding siblings ...)
  2023-06-20 14:11   ` [PATCH v4 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 14:11   ` [PATCH v4 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
  2023-06-20 18:49   ` [EXT] [PATCH v4 0/9] crypto/mlx5: support AES-GCM Akhil Goyal
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad, Viacheslav Ovsiienko, Ori Kam; +Cc: rasland, dev

The crypto operations are performed with crypto WQEs. If the input
buffers (AAD, mbuf, digest) are not contiguous and there is not
enough headroom/tailroom for copying the AAD/digest, an UMR WQE is
needed, as required by the FW, to present a contiguous address space
to the crypto WQE. The UMR WQEs and crypto WQEs are handled in two
different QPs.

A crypto operation with non-contiguous buffers gets its own UMR WQE,
while an operation with contiguous buffers does not need one. Once
all the operations' WQEs of the enqueue burst have been built, and if
any UMR WQEs were built, an additional SEND_EN WQE is added as the
final WQE of the burst in the UMR QP. The purpose of that SEND_EN WQE
is to trigger the crypto QP processing once the UMR-prepared input
address space buffers are ready.

The QP for crypto operations contains only crypto WQEs, and its WQEs
are prebuilt as fixed entries during QP setup. The QP processing is
triggered by a doorbell ring or by the SEND_EN WQE from the UMR QP.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
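From the application's point of view nothing changes: the standard
burst API drives both internal QPs. A minimal polling-loop sketch
(dev_id, qp_id and the burst size are illustrative, and partially
enqueued bursts are not handled):

	uint8_t dev_id = 0;
	uint16_t qp_id = 0, n_enq, n_deq;
	struct rte_crypto_op *burst[32], *done[32];

	/* ... ops in 'burst' carry an AES-GCM session, mbufs and AAD/digest ... */
	n_enq = rte_cryptodev_enqueue_burst(dev_id, qp_id, burst, 32);
	do {
		n_deq = rte_cryptodev_dequeue_burst(dev_id, qp_id, done, 32);
	} while (n_deq == 0);
	/* done[i]->status is RTE_CRYPTO_OP_STATUS_SUCCESS on success. */
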
 drivers/common/mlx5/mlx5_prm.h        |   1 +
 drivers/crypto/mlx5/mlx5_crypto.c     |   9 +-
 drivers/crypto/mlx5/mlx5_crypto.h     |   8 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c | 588 ++++++++++++++++++++++++++
 4 files changed, 604 insertions(+), 2 deletions(-)

diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index 4f6925733a..d33d05238c 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -617,6 +617,7 @@ struct mlx5_wqe_send_en_wqe {
 /* MMO metadata segment */
 
 #define	MLX5_OPCODE_MMO	0x2fu
+#define	MLX5_OPC_MOD_MMO_CRYPTO 0x6u
 #define	MLX5_OPC_MOD_MMO_REGEX 0x4u
 #define	MLX5_OPC_MOD_MMO_COMP 0x2u
 #define	MLX5_OPC_MOD_MMO_DECOMP 0x3u
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index ff632cd69a..4d7d3ef2a3 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -62,8 +62,13 @@ mlx5_crypto_dev_infos_get(struct rte_cryptodev *dev,
 			MLX5_CRYPTO_FEATURE_FLAGS(priv->is_wrapped_mode);
 		dev_info->capabilities = priv->caps;
 		dev_info->max_nb_queue_pairs = MLX5_CRYPTO_MAX_QPS;
-		dev_info->min_mbuf_headroom_req = 0;
-		dev_info->min_mbuf_tailroom_req = 0;
+		if (priv->caps->sym.xform_type == RTE_CRYPTO_SYM_XFORM_AEAD) {
+			dev_info->min_mbuf_headroom_req = MLX5_CRYPTO_GCM_MAX_AAD;
+			dev_info->min_mbuf_tailroom_req = MLX5_CRYPTO_GCM_MAX_DIGEST;
+		} else {
+			dev_info->min_mbuf_headroom_req = 0;
+			dev_info->min_mbuf_tailroom_req = 0;
+		}
 		dev_info->sym.max_nb_sessions = 0;
 		/*
 		 * If 0, the device does not have any limitation in number of
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 88a09a6b1c..6dcb41b27c 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -23,6 +23,8 @@
 #define MLX5_CRYPTO_KLM_SEGS_NUM(umr_wqe_sz) ((umr_wqe_sz -\
 					MLX5_CRYPTO_UMR_WQE_STATIC_SIZE) /\
 					MLX5_WSEG_SIZE)
+#define MLX5_CRYPTO_GCM_MAX_AAD 64
+#define MLX5_CRYPTO_GCM_MAX_DIGEST 16
 
 struct mlx5_crypto_priv {
 	TAILQ_ENTRY(mlx5_crypto_priv) next;
@@ -61,6 +63,9 @@ struct mlx5_crypto_qp {
 	uint8_t *wqe;
 	uint16_t entries_n;
 	uint16_t cq_entries_n;
+	uint16_t reported_ci;
+	uint16_t qp_ci;
+	uint16_t cq_ci;
 	uint16_t pi;
 	uint16_t ci;
 	uint16_t db_pi;
@@ -70,6 +75,9 @@ struct mlx5_crypto_qp {
 	uint16_t umr_pi;
 	uint16_t umr_ci;
 	uint32_t umr_errors;
+	uint16_t last_gga_pi;
+	bool has_umr;
+	uint16_t cpy_tag_op;
 };
 
 struct mlx5_crypto_dek {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index c3859547ee..8389c03c91 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -9,6 +9,7 @@
 #include <rte_log.h>
 #include <bus_pci_driver.h>
 #include <rte_memory.h>
+#include <rte_io.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_common.h>
@@ -32,6 +33,40 @@
 	 RTE_ALIGN(sizeof(struct mlx5_wqe_send_en_wqe), \
 	 MLX5_SEND_WQE_BB))
 
+#define MLX5_UMR_GCM_WQE_STRIDE \
+	(MLX5_UMR_GCM_WQE_SIZE / MLX5_SEND_WQE_BB)
+
+#define MLX5_MMO_CRYPTO_OPC (MLX5_OPCODE_MMO | \
+	(MLX5_OPC_MOD_MMO_CRYPTO << WQE_CSEG_OPC_MOD_OFFSET))
+
+/*
+ * The default op status value is RTE_CRYPTO_OP_STATUS_SUCCESS.
+ * Ops that need a digest (tag) copy are marked with a different status.
+ */
+#define MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY (RTE_CRYPTO_OP_STATUS_SUCCESS + 1)
+
+struct mlx5_crypto_gcm_op_info {
+	bool need_umr;
+	bool is_oop;
+	bool is_enc;
+	void *digest;
+	void *src_addr;
+};
+
+struct mlx5_crypto_gcm_data {
+	void *src_addr;
+	uint32_t src_bytes;
+	void *dst_addr;
+	uint32_t dst_bytes;
+	uint32_t src_mkey;
+	uint32_t dst_mkey;
+};
+
+struct mlx5_crypto_gcm_tag_cpy_info {
+	void *digest;
+	uint8_t tag_len;
+} __rte_packed;
+
 static struct rte_cryptodev_capabilities mlx5_crypto_gcm_caps[] = {
 	{
 		.op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
@@ -328,6 +363,557 @@ mlx5_crypto_gcm_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	return -1;
 }
 
+static __rte_always_inline void
+mlx5_crypto_gcm_get_op_info(struct mlx5_crypto_qp *qp,
+			    struct rte_crypto_op *op,
+			    struct mlx5_crypto_gcm_op_info *op_info)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct rte_mbuf *m_src = op->sym->m_src;
+	void *aad_addr = op->sym->aead.aad.data;
+	void *tag_addr = op->sym->aead.digest.data;
+	void *src_addr = rte_pktmbuf_mtod_offset(m_src, void *, op->sym->aead.data.offset);
+	struct rte_mbuf *m_dst = m_src;
+	void *dst_addr = src_addr;
+	void *expected_aad = NULL;
+	void *expected_tag = NULL;
+	bool is_enc = sess->op_type == MLX5_CRYPTO_OP_TYPE_ENCRYPTION;
+	bool cp_aad = false;
+	bool cp_tag = false;
+
+	op_info->is_oop = false;
+	op_info->need_umr = false;
+	op_info->is_enc = is_enc;
+	op_info->digest = NULL;
+	op_info->src_addr = aad_addr;
+	if (op->sym->m_dst && op->sym->m_dst != m_src) {
+		op_info->is_oop = true;
+		m_dst = op->sym->m_dst;
+		dst_addr = rte_pktmbuf_mtod_offset(m_dst, void *, op->sym->aead.data.offset);
+		if (m_dst->nb_segs > 1) {
+			op_info->need_umr = true;
+			return;
+		}
+		/*
+		 * If the op's mbuf has extra data offset, don't copy AAD to
+		 * this area.
+		 */
+		if (rte_pktmbuf_headroom(m_dst) < sess->aad_len ||
+		    op->sym->aead.data.offset) {
+			op_info->need_umr = true;
+			return;
+		}
+	}
+	if (m_src->nb_segs > 1) {
+		op_info->need_umr = true;
+		return;
+	}
+	expected_aad = RTE_PTR_SUB(src_addr, sess->aad_len);
+	if (expected_aad != aad_addr) {
+		/*
+		 * If the op's mbuf has extra data offset, don't copy AAD to
+		 * this area.
+		 */
+		if (sess->aad_len > MLX5_CRYPTO_GCM_MAX_AAD ||
+		    sess->aad_len > rte_pktmbuf_headroom(m_src) ||
+		    op->sym->aead.data.offset) {
+			op_info->need_umr = true;
+			return;
+		}
+		cp_aad = true;
+		op_info->src_addr = expected_aad;
+	}
+	expected_tag = RTE_PTR_ADD(is_enc ? dst_addr : src_addr, op->sym->aead.data.length);
+	if (expected_tag != tag_addr) {
+		struct rte_mbuf *mbuf = is_enc ? m_dst : m_src;
+
+		/*
+		 * If op's mbuf is not fully set as payload, don't copy digest to
+		 * the left area.
+		 */
+		if (rte_pktmbuf_tailroom(mbuf) < sess->tag_len ||
+		    rte_pktmbuf_data_len(mbuf) != op->sym->aead.data.length) {
+			op_info->need_umr = true;
+			return;
+		}
+		if (is_enc) {
+			op_info->digest = expected_tag;
+			qp->cpy_tag_op++;
+		} else {
+			cp_tag = true;
+		}
+	}
+	if (cp_aad)
+		memcpy(expected_aad, aad_addr, sess->aad_len);
+	if (cp_tag)
+		memcpy(expected_tag, tag_addr, sess->tag_len);
+}
+
+static __rte_always_inline uint32_t
+_mlx5_crypto_gcm_umr_build_mbuf_klm(struct mlx5_crypto_qp *qp,
+				    struct rte_mbuf *mbuf,
+				    struct mlx5_klm *klm,
+				    uint32_t offset,
+				    uint32_t *remain)
+{
+	uint32_t data_len = (rte_pktmbuf_data_len(mbuf) - offset);
+	uintptr_t addr = rte_pktmbuf_mtod_offset(mbuf, uintptr_t, offset);
+
+	if (data_len > *remain)
+		data_len = *remain;
+	*remain -= data_len;
+	klm->byte_count = rte_cpu_to_be_32(data_len);
+	klm->address = rte_cpu_to_be_64(addr);
+	klm->mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, mbuf);
+	return klm->mkey;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_mbuf_chain_klms(struct mlx5_crypto_qp *qp,
+				      struct rte_crypto_op *op,
+				      struct rte_mbuf *mbuf,
+				      struct mlx5_klm *klm)
+{
+	uint32_t remain_len = op->sym->aead.data.length;
+	__rte_unused uint32_t nb_segs = mbuf->nb_segs;
+	uint32_t klm_n = 0;
+
+	/* The mbuf segment number must not exceed max_segs_num. */
+	MLX5_ASSERT(nb_segs <= qp->priv->max_segs_num);
+	/* First mbuf needs to take the data offset. */
+	if (unlikely(_mlx5_crypto_gcm_umr_build_mbuf_klm(qp, mbuf, klm,
+		     op->sym->aead.data.offset, &remain_len) == UINT32_MAX)) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		return 0;
+	}
+	klm++;
+	klm_n++;
+	while (remain_len) {
+		nb_segs--;
+		mbuf = mbuf->next;
+		MLX5_ASSERT(mbuf && nb_segs);
+		if (unlikely(_mlx5_crypto_gcm_umr_build_mbuf_klm(qp, mbuf, klm,
+						0, &remain_len) == UINT32_MAX)) {
+			op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+			return 0;
+		}
+		klm++;
+		klm_n++;
+	}
+	return klm_n;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_klm_by_addr(struct mlx5_crypto_qp *qp,
+				  struct mlx5_klm *klm,
+				  void *addr,
+				  uint32_t len)
+{
+	klm->byte_count = rte_cpu_to_be_32(len);
+	klm->address = rte_cpu_to_be_64((uintptr_t)addr);
+	klm->mkey = mlx5_mr_addr2mr_bh(&qp->mr_ctrl, (uintptr_t)addr);
+	if (klm->mkey == UINT32_MAX)
+		return 0;
+	return 1;
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_op_klm(struct mlx5_crypto_qp *qp,
+			     struct rte_crypto_op *op,
+			     struct mlx5_crypto_gcm_op_info *op_info,
+			     struct mlx5_klm *klm,
+			     uint32_t *len)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_klm *digest = NULL, *aad = NULL;
+	uint32_t total_len = op->sym->aead.data.length + sess->aad_len + sess->tag_len;
+	uint32_t klm_n = 0, klm_src = 0, klm_dst = 0;
+
+	/* Build AAD KLM. */
+	aad = klm;
+	if (!mlx5_crypto_gcm_build_klm_by_addr(qp, aad, op->sym->aead.aad.data, sess->aad_len))
+		return 0;
+	klm_n++;
+	/* Build src mbuf KLM. */
+	klm_src = mlx5_crypto_gcm_build_mbuf_chain_klms(qp, op, op->sym->m_src, &klm[klm_n]);
+	if (!klm_src)
+		return 0;
+	klm_n += klm_src;
+	/* Reserve digest KLM if needed. */
+	if (!op_info->is_oop ||
+	    sess->op_type == MLX5_CRYPTO_OP_TYPE_DECRYPTION) {
+		digest = &klm[klm_n];
+		klm_n++;
+	}
+	/* Build dst mbuf KLM. */
+	if (op_info->is_oop) {
+		klm[klm_n] = *aad;
+		klm_n++;
+		klm_dst = mlx5_crypto_gcm_build_mbuf_chain_klms(qp, op, op->sym->m_dst,
+								&klm[klm_n]);
+		if (!klm_dst)
+			return 0;
+		klm_n += klm_dst;
+		total_len += (op->sym->aead.data.length + sess->aad_len);
+	}
+	/* Update digest at the end if it is not set. */
+	if (!digest) {
+		digest = &klm[klm_n];
+		klm_n++;
+	}
+	/* Build digest KLM. */
+	if (!mlx5_crypto_gcm_build_klm_by_addr(qp, digest, op->sym->aead.digest.data,
+					       sess->tag_len))
+		return 0;
+	*len = total_len;
+	return klm_n;
+}
+
+static __rte_always_inline struct mlx5_wqe_cseg *
+mlx5_crypto_gcm_get_umr_wqe(struct mlx5_crypto_qp *qp)
+{
+	uint32_t wqe_offset = qp->umr_pi & (qp->umr_wqbbs - 1);
+	uint32_t left_wqbbs = qp->umr_wqbbs - wqe_offset;
+	struct mlx5_wqe_cseg *wqe;
+
+	/* If UMR WQE is near the boundary. */
+	if (left_wqbbs < MLX5_UMR_GCM_WQE_STRIDE) {
+		/* Append NOP WQE as the left WQEBBS is not enough for UMR. */
+		wqe = RTE_PTR_ADD(qp->umr_qp_obj.umem_buf, wqe_offset * MLX5_SEND_WQE_BB);
+		wqe->opcode = rte_cpu_to_be_32(MLX5_OPCODE_NOP | ((uint32_t)qp->umr_pi << 8));
+		wqe->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | (left_wqbbs << 2));
+		wqe->flags = RTE_BE32(0);
+		wqe->misc = RTE_BE32(0);
+		qp->umr_pi += left_wqbbs;
+		wqe_offset = qp->umr_pi & (qp->umr_wqbbs - 1);
+	}
+	wqe_offset *= MLX5_SEND_WQE_BB;
+	return RTE_PTR_ADD(qp->umr_qp_obj.umem_buf, wqe_offset);
+}
+
+static __rte_always_inline int
+mlx5_crypto_gcm_build_umr(struct mlx5_crypto_qp *qp,
+			  struct rte_crypto_op *op,
+			  uint32_t idx,
+			  struct mlx5_crypto_gcm_op_info *op_info,
+			  struct mlx5_crypto_gcm_data *data)
+{
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_wqe_cseg *wqe;
+	struct mlx5_wqe_umr_cseg *ucseg;
+	struct mlx5_wqe_mkey_cseg *mkc;
+	struct mlx5_klm *iklm;
+	struct mlx5_klm *klm = &qp->klm_array[idx * priv->max_klm_num];
+	uint16_t klm_size, klm_align;
+	uint32_t total_len;
+
+	/* Build KLM base on the op. */
+	klm_size = mlx5_crypto_gcm_build_op_klm(qp, op, op_info, klm, &total_len);
+	if (!klm_size)
+		return -EINVAL;
+	klm_align = RTE_ALIGN(klm_size, 4);
+	/* Get UMR WQE memory. */
+	wqe = mlx5_crypto_gcm_get_umr_wqe(qp);
+	memset(wqe, 0, MLX5_UMR_GCM_WQE_SIZE);
+	/* Set WQE control seg. Non-inline KLM UMR WQE size must be 9 WQE_DS. */
+	wqe->opcode = rte_cpu_to_be_32(MLX5_OPCODE_UMR | ((uint32_t)qp->umr_pi << 8));
+	wqe->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | 9);
+	wqe->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET);
+	wqe->misc = rte_cpu_to_be_32(qp->mkey[idx]->id);
+	/* Set UMR WQE control seg. */
+	ucseg = (struct mlx5_wqe_umr_cseg *)(wqe + 1);
+	ucseg->mkey_mask |= RTE_BE64(1u << 0);
+	ucseg->ko_to_bs = rte_cpu_to_be_32(klm_align << MLX5_UMRC_KO_OFFSET);
+	/* Set mkey context seg. */
+	mkc = (struct mlx5_wqe_mkey_cseg *)(ucseg + 1);
+	mkc->len = rte_cpu_to_be_64(total_len);
+	mkc->qpn_mkey = rte_cpu_to_be_32(0xffffff00 | (qp->mkey[idx]->id & 0xff));
+	/* Set UMR pointer to data seg. */
+	iklm = (struct mlx5_klm *)(mkc + 1);
+	iklm->address = rte_cpu_to_be_64((uintptr_t)((char *)klm));
+	iklm->mkey = rte_cpu_to_be_32(qp->mr.lkey);
+	data->src_mkey = rte_cpu_to_be_32(qp->mkey[idx]->id);
+	data->dst_mkey = data->src_mkey;
+	data->src_addr = 0;
+	data->src_bytes = sess->aad_len + op->sym->aead.data.length;
+	data->dst_bytes = data->src_bytes;
+	if (op_info->is_enc)
+		data->dst_bytes += sess->tag_len;
+	else
+		data->src_bytes += sess->tag_len;
+	if (op_info->is_oop)
+		data->dst_addr = (void *)(uintptr_t)(data->src_bytes);
+	else
+		data->dst_addr = 0;
+	/* Clear the padding memory. */
+	memset(&klm[klm_size], 0, sizeof(struct mlx5_klm) * (klm_align - klm_size));
+	/* Update PI and WQE */
+	qp->umr_pi += MLX5_UMR_GCM_WQE_STRIDE;
+	qp->umr_wqe = (uint8_t *)wqe;
+	return 0;
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_build_send_en(struct mlx5_crypto_qp *qp)
+{
+	uint32_t wqe_offset = (qp->umr_pi & (qp->umr_wqbbs - 1)) * MLX5_SEND_WQE_BB;
+	struct mlx5_wqe_cseg *cs = RTE_PTR_ADD(qp->umr_qp_obj.wqes, wqe_offset);
+	struct mlx5_wqe_qseg *qs = RTE_PTR_ADD(cs, sizeof(struct mlx5_wqe_cseg));
+
+	cs->opcode = rte_cpu_to_be_32(MLX5_OPCODE_SEND_EN | ((uint32_t)qp->umr_pi << 8));
+	cs->sq_ds = rte_cpu_to_be_32((qp->umr_qp_obj.qp->id << 8) | 2);
+	/*
+	 * No need to generate the SEND_EN CQE as we want only GGA CQE
+	 * in the CQ normally. We can compare qp->last_gga_pi with
+	 * qp->qp_ci to know if all SEND_EN WQEs have been consumed.
+	 */
+	cs->flags = RTE_BE32((MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET) |
+			MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE);
+	cs->misc = RTE_BE32(0);
+	qs->max_index = rte_cpu_to_be_32(qp->pi);
+	qs->qpn_cqn = rte_cpu_to_be_32(qp->qp_obj.qp->id);
+	qp->umr_wqe = (uint8_t *)cs;
+	qp->umr_pi += 1;
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_wqe_set(struct mlx5_crypto_qp *qp,
+			struct rte_crypto_op *op,
+			uint32_t idx,
+			struct mlx5_crypto_gcm_data *data)
+{
+	struct mlx5_crypto_session *sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+	struct mlx5_gga_wqe *wqe = &((struct mlx5_gga_wqe *)qp->qp_obj.wqes)[idx];
+	union mlx5_gga_crypto_opaque *opaq = qp->opaque_addr;
+
+	memcpy(opaq[idx].cp.iv,
+		rte_crypto_op_ctod_offset(op, uint8_t *, sess->iv_offset), sess->iv_len);
+	opaq[idx].cp.tag_size = sess->wqe_tag_len;
+	opaq[idx].cp.aad_size = sess->wqe_aad_len;
+	/* Update control seg. */
+	wqe->opcode = rte_cpu_to_be_32(MLX5_MMO_CRYPTO_OPC + (qp->pi << 8));
+	wqe->gga_ctrl1 = sess->mmo_ctrl;
+	wqe->gga_ctrl2 = sess->dek_id;
+	wqe->flags = RTE_BE32(MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET);
+	/* Update op_info seg. */
+	wqe->gather.bcount = rte_cpu_to_be_32(data->src_bytes);
+	wqe->gather.lkey = data->src_mkey;
+	wqe->gather.pbuf = rte_cpu_to_be_64((uintptr_t)data->src_addr);
+	/* Update output seg. */
+	wqe->scatter.bcount = rte_cpu_to_be_32(data->dst_bytes);
+	wqe->scatter.lkey = data->dst_mkey;
+	wqe->scatter.pbuf = rte_cpu_to_be_64((uintptr_t)data->dst_addr);
+	qp->wqe = (uint8_t *)wqe;
+}
+
+static uint16_t
+mlx5_crypto_gcm_enqueue_burst(void *queue_pair,
+			      struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	struct mlx5_crypto_session *sess;
+	struct mlx5_crypto_priv *priv = qp->priv;
+	struct mlx5_crypto_gcm_tag_cpy_info *tag;
+	struct mlx5_crypto_gcm_data gcm_data;
+	struct rte_crypto_op *op;
+	struct mlx5_crypto_gcm_op_info op_info;
+	uint16_t mask = qp->entries_n - 1;
+	uint16_t remain = qp->entries_n - (qp->pi - qp->qp_ci);
+	uint32_t idx;
+	uint16_t umr_cnt = 0;
+
+	if (remain < nb_ops)
+		nb_ops = remain;
+	else
+		remain = nb_ops;
+	if (unlikely(remain == 0))
+		return 0;
+	do {
+		op = *ops++;
+		sess = CRYPTODEV_GET_SYM_SESS_PRIV(op->sym->session);
+		idx = qp->pi & mask;
+		mlx5_crypto_gcm_get_op_info(qp, op, &op_info);
+		if (!op_info.need_umr) {
+			gcm_data.src_addr = op_info.src_addr;
+			gcm_data.src_bytes = op->sym->aead.data.length + sess->aad_len;
+			gcm_data.src_mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, op->sym->m_src);
+			if (op_info.is_oop) {
+				gcm_data.dst_addr = RTE_PTR_SUB
+					(rte_pktmbuf_mtod_offset(op->sym->m_dst,
+					 void *, op->sym->aead.data.offset), sess->aad_len);
+				gcm_data.dst_mkey = mlx5_mr_mb2mr(&qp->mr_ctrl, op->sym->m_dst);
+			} else {
+				gcm_data.dst_addr = gcm_data.src_addr;
+				gcm_data.dst_mkey = gcm_data.src_mkey;
+			}
+			gcm_data.dst_bytes = gcm_data.src_bytes;
+			if (op_info.is_enc)
+				gcm_data.dst_bytes += sess->tag_len;
+			else
+				gcm_data.src_bytes += sess->tag_len;
+		} else {
+			if (unlikely(mlx5_crypto_gcm_build_umr(qp, op, idx,
+							&op_info, &gcm_data))) {
+				qp->stats.enqueue_err_count++;
+				if (remain != nb_ops) {
+					qp->stats.enqueued_count -= remain;
+					break;
+				}
+				return 0;
+			}
+			umr_cnt++;
+		}
+		mlx5_crypto_gcm_wqe_set(qp, op, idx, &gcm_data);
+		if (op_info.digest) {
+			tag = (struct mlx5_crypto_gcm_tag_cpy_info *)op->sym->aead.digest.data;
+			tag->digest = op_info.digest;
+			tag->tag_len = sess->tag_len;
+			op->status = MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY;
+		} else {
+			op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		}
+		qp->ops[idx] = op;
+		qp->pi++;
+	} while (--remain);
+	qp->stats.enqueued_count += nb_ops;
+	/* Update the last GGA cseg with COMP. */
+	((struct mlx5_wqe_cseg *)qp->wqe)->flags =
+		RTE_BE32(MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET);
+	/* Only when there are no pending SEND_EN WQEs in background. */
+	if (!umr_cnt && !qp->has_umr) {
+		mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->wqe,
+				   qp->pi, &qp->qp_obj.db_rec[MLX5_SND_DBR],
+				   !priv->uar.dbnc);
+	} else {
+		mlx5_crypto_gcm_build_send_en(qp);
+		mlx5_doorbell_ring(&priv->uar.bf_db, *(volatile uint64_t *)qp->umr_wqe,
+				   qp->umr_pi, &qp->umr_qp_obj.db_rec[MLX5_SND_DBR],
+				   !priv->uar.dbnc);
+		qp->last_gga_pi = qp->pi;
+		qp->has_umr = true;
+	}
+	return nb_ops;
+}
+
+static __rte_noinline void
+mlx5_crypto_gcm_cqe_err_handle(struct mlx5_crypto_qp *qp, struct rte_crypto_op *op)
+{
+	uint8_t op_code;
+	const uint32_t idx = qp->cq_ci & (qp->entries_n - 1);
+	volatile struct mlx5_err_cqe *cqe = (volatile struct mlx5_err_cqe *)
+							&qp->cq_obj.cqes[idx];
+
+	op_code = rte_be_to_cpu_32(cqe->s_wqe_opcode_qpn) >> MLX5_CQ_INDEX_WIDTH;
+	DRV_LOG(ERR, "CQE ERR:0x%x, Vendor_ERR:0x%x, OP:0x%x, QPN:0x%x, WQE_CNT:0x%x",
+		cqe->syndrome, cqe->vendor_err_synd, op_code,
+		(rte_be_to_cpu_32(cqe->s_wqe_opcode_qpn) & 0xffffff),
+		rte_be_to_cpu_16(cqe->wqe_counter));
+	if (op && op_code == MLX5_OPCODE_MMO) {
+		op->status = RTE_CRYPTO_OP_STATUS_ERROR;
+		qp->stats.dequeue_err_count++;
+	}
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_fill_op(struct mlx5_crypto_qp *qp,
+			struct rte_crypto_op **ops,
+			uint16_t orci,
+			uint16_t rci,
+			uint16_t op_mask)
+{
+	uint16_t n;
+
+	orci &= op_mask;
+	rci &= op_mask;
+	if (unlikely(orci > rci)) {
+		n = op_mask - orci + 1;
+		memcpy(ops, &qp->ops[orci], n * sizeof(*ops));
+		orci = 0;
+	} else {
+		n = 0;
+	}
+	/* rci can be 0 here, memcpy will skip that. */
+	memcpy(&ops[n], &qp->ops[orci], (rci - orci) * sizeof(*ops));
+}
+
+static __rte_always_inline void
+mlx5_crypto_gcm_cpy_tag(struct mlx5_crypto_qp *qp,
+			uint16_t orci,
+			uint16_t rci,
+			uint16_t op_mask)
+{
+	struct rte_crypto_op *op;
+	struct mlx5_crypto_gcm_tag_cpy_info *tag;
+
+	while (qp->cpy_tag_op && orci != rci) {
+		op = qp->ops[orci & op_mask];
+		if (op->status == MLX5_CRYPTO_OP_STATUS_GCM_TAG_COPY) {
+			tag = (struct mlx5_crypto_gcm_tag_cpy_info *)op->sym->aead.digest.data;
+			memcpy(op->sym->aead.digest.data, tag->digest, tag->tag_len);
+			op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+			qp->cpy_tag_op--;
+		}
+		orci++;
+	}
+}
+
+static uint16_t
+mlx5_crypto_gcm_dequeue_burst(void *queue_pair,
+			      struct rte_crypto_op **ops,
+			      uint16_t nb_ops)
+{
+	struct mlx5_crypto_qp *qp = queue_pair;
+	volatile struct mlx5_cqe *restrict cqe;
+	const unsigned int cq_size = qp->cq_entries_n;
+	const unsigned int mask = cq_size - 1;
+	const unsigned int op_mask = qp->entries_n - 1;
+	uint32_t idx;
+	uint32_t next_idx = qp->cq_ci & mask;
+	uint16_t reported_ci = qp->reported_ci;
+	uint16_t qp_ci = qp->qp_ci;
+	const uint16_t max = RTE_MIN((uint16_t)(qp->pi - reported_ci), nb_ops);
+	uint16_t op_num = 0;
+	int ret;
+
+	if (unlikely(max == 0))
+		return 0;
+	while (qp_ci - reported_ci < max) {
+		idx = next_idx;
+		next_idx = (qp->cq_ci + 1) & mask;
+		cqe = &qp->cq_obj.cqes[idx];
+		ret = check_cqe(cqe, cq_size, qp->cq_ci);
+		if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+			if (unlikely(ret != MLX5_CQE_STATUS_HW_OWN))
+				mlx5_crypto_gcm_cqe_err_handle(qp,
+						qp->ops[reported_ci & op_mask]);
+			break;
+		}
+		qp_ci = rte_be_to_cpu_16(cqe->wqe_counter) + 1;
+		if (qp->has_umr &&
+		    (qp->last_gga_pi + 1) == qp_ci)
+			qp->has_umr = false;
+		qp->cq_ci++;
+	}
+	/* If wqe_counter changed, CQEs were handled. */
+	if (likely(qp->qp_ci != qp_ci)) {
+		qp->qp_ci = qp_ci;
+		rte_io_wmb();
+		qp->cq_obj.db_rec[0] = rte_cpu_to_be_32(qp->cq_ci);
+	}
+	/* If reported_ci differs from qp_ci, ops have been retrieved. */
+	if (qp_ci != reported_ci) {
+		op_num = RTE_MIN((uint16_t)(qp_ci - reported_ci), max);
+		reported_ci += op_num;
+		mlx5_crypto_gcm_cpy_tag(qp, qp->reported_ci, reported_ci, op_mask);
+		mlx5_crypto_gcm_fill_op(qp, ops, qp->reported_ci, reported_ci, op_mask);
+		qp->stats.dequeued_count += op_num;
+		qp->reported_ci = reported_ci;
+	}
+	return op_num;
+}
+
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
@@ -339,6 +925,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
 	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
 	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
+	crypto_dev->dequeue_burst = mlx5_crypto_gcm_dequeue_burst;
+	crypto_dev->enqueue_burst = mlx5_crypto_gcm_enqueue_burst;
 	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
-- 
2.25.1
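
For context, a minimal application-side sketch of how the enqueue/dequeue ops added
above are driven through the standard rte_cryptodev burst API. The device, queue pair,
op mempool, AEAD session, mbufs and AAD/digest buffers are assumed to be set up
elsewhere; the function and variable names are placeholders, and GCM_IV_OFFSET must
match the aead.iv.offset given when the session was created.

#include <string.h>
#include <rte_common.h>
#include <rte_memory.h>
#include <rte_cryptodev.h>

/* The IV is written into the op private area at the same offset used
 * as aead.iv.offset in the session xform. */
#define GCM_IV_OFFSET (sizeof(struct rte_crypto_op) + \
		       sizeof(struct rte_crypto_sym_op))

static uint16_t
gcm_burst(uint8_t dev_id, uint16_t qp_id, struct rte_mempool *op_pool,
	  void *sess, struct rte_mbuf **mbufs, const uint8_t iv[12],
	  uint8_t *aad, uint8_t *tag, uint32_t plain_len, uint16_t n)
{
	struct rte_crypto_op *ops[32];
	uint16_t i, deq = 0;

	if (n > RTE_DIM(ops) || rte_crypto_op_bulk_alloc(op_pool,
			RTE_CRYPTO_OP_TYPE_SYMMETRIC, ops, n) != n)
		return 0;
	for (i = 0; i < n; i++) {
		struct rte_crypto_sym_op *sym = ops[i]->sym;

		rte_crypto_op_attach_sym_session(ops[i], sess);
		memcpy(rte_crypto_op_ctod_offset(ops[i], uint8_t *,
						 GCM_IV_OFFSET), iv, 12);
		sym->m_src = mbufs[i];
		sym->aead.data.offset = 0;
		sym->aead.data.length = plain_len;
		/* For brevity every op reuses the same AAD/digest buffers;
		 * a real application gives each op its own. */
		sym->aead.aad.data = aad;
		sym->aead.aad.phys_addr = rte_mem_virt2iova(aad);
		sym->aead.digest.data = tag;
		sym->aead.digest.phys_addr = rte_mem_virt2iova(tag);
	}
	n = rte_cryptodev_enqueue_burst(dev_id, qp_id, ops, n);
	while (deq < n)
		deq += rte_cryptodev_dequeue_burst(dev_id, qp_id,
						   &ops[deq], n - deq);
	return deq; /* caller must still check each ops[i]->status */
}

Note that ops whose digest had to be relocated are fixed up by the PMD at dequeue
time (the tag-copy path above), so the application only needs to check each op
status after dequeue.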


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v4 9/9] crypto/mlx5: enable AES-GCM capability
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
                     ` (7 preceding siblings ...)
  2023-06-20 14:11   ` [PATCH v4 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
@ 2023-06-20 14:11   ` Suanming Mou
  2023-06-20 18:49   ` [EXT] [PATCH v4 0/9] crypto/mlx5: support AES-GCM Akhil Goyal
  9 siblings, 0 replies; 54+ messages in thread
From: Suanming Mou @ 2023-06-20 14:11 UTC (permalink / raw)
  To: gakhil, Matan Azrad; +Cc: rasland, dev

This commit generates AES-GCM capability based on the NIC
attributes and enables the AES-GCM algorithm.

A new devarg "algo" is added to identify whether the crypto PMD will
be initialized as AES-GCM (algo=1) or AES-XTS (algo=0, default).

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 doc/guides/cryptodevs/features/mlx5.ini |  2 +
 doc/guides/cryptodevs/mlx5.rst          | 48 ++++++++++++++++++-
 doc/guides/rel_notes/release_23_07.rst  |  1 +
 drivers/crypto/mlx5/mlx5_crypto.c       | 26 ++++++++--
 drivers/crypto/mlx5/mlx5_crypto.h       |  1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c   | 63 +++++++++++++++++++++++++
 6 files changed, 136 insertions(+), 5 deletions(-)

diff --git a/doc/guides/cryptodevs/features/mlx5.ini b/doc/guides/cryptodevs/features/mlx5.ini
index 0d210b2114..9bf1defac8 100644
--- a/doc/guides/cryptodevs/features/mlx5.ini
+++ b/doc/guides/cryptodevs/features/mlx5.ini
@@ -30,6 +30,8 @@ AES XTS (256)  = Y
 ; Supported AEAD algorithms of a mlx5 crypto driver.
 ;
 [AEAD]
+AES GCM (128)  = Y
+AES GCM (256)  = Y
 
 ;
 ; Supported Asymmetric algorithms of a mlx5 crypto driver.
diff --git a/doc/guides/cryptodevs/mlx5.rst b/doc/guides/cryptodevs/mlx5.rst
index b35ac5f5f2..9a0ae8b0d2 100644
--- a/doc/guides/cryptodevs/mlx5.rst
+++ b/doc/guides/cryptodevs/mlx5.rst
@@ -21,6 +21,11 @@ and **NVIDIA BlueField-3** family adapters.
 Overview
 --------
 
+The NVIDIA MLX5 crypto driver supports AES-XTS and AES-GCM encryption.
+
+AES-XTS
+-------
+
 The device can provide disk encryption services,
 allowing data encryption and decryption towards a disk.
 Having all encryption/decryption operations done in a single device
@@ -38,13 +43,19 @@ The encryption does not require text to be aligned to the AES block size (128b).
 
 See :doc:`../../platform/mlx5` guide for more design details.
 
+AES-GCM
+-------
+The encryption and decryption process the traffic as the standard RTE crypto
+API defines. The supported AAD/digest/key sizes can be read from dev_info.
+
+
 Configuration
 -------------
 
 See the :ref:`mlx5 common configuration <mlx5_common_env>`.
 
 A device comes out of NVIDIA factory with pre-defined import methods.
-There are two possible import methods: wrapped or plaintext.
+There are two possible import methods: wrapped or plaintext (valid for AES-XTS only).
 
 In case the device is in wrapped mode, it needs to be moved to crypto operational mode.
 In order to move the device to crypto operational mode, credential and KEK
@@ -120,24 +131,36 @@ Driver options
 Please refer to :ref:`mlx5 common options <mlx5_common_driver_options>`
 for an additional list of options shared with other mlx5 drivers.
 
+- ``algo`` parameter [int]
+
+  - 0. AES-XTS crypto.
+
+  - 1. AES-GCM crypto.
+
+  Set to zero (AES-XTS) by default.
+
 - ``wcs_file`` parameter [string] - mandatory in wrapped mode
 
   File path including only the wrapped credential in string format of hexadecimal
   numbers, represent 48 bytes (8 bytes IV added by the AES key wrap algorithm).
+  This option is valid only for AES-XTS.
 
 - ``import_kek_id`` parameter [int]
 
   The identifier of the KEK, default value is 0 represents the operational
   register import_kek..
+  This option is valid only for AES-XTS.
 
 - ``credential_id`` parameter [int]
 
   The identifier of the credential, default value is 0 represents the operational
   register credential.
+  This option is valid only for AES-XTS.
 
 - ``keytag`` parameter [int]
 
   The plaintext of the keytag appended to the AES-XTS keys, default value is 0.
+  This option is valid only for AES-XTS.
 
 - ``max_segs_num`` parameter [int]
 
@@ -161,6 +184,8 @@ Limitations
 - The supported data-unit lengths are 512B and 4KB and 1MB. In case the `dataunit_len`
   is not provided in the cipher xform, the OP length is limited to the above
   values.
+- AES-GCM is only supported on BlueField-3.
+- AES-GCM supports only the plaintext key import mode.
 
 
 Prerequisites
@@ -172,6 +197,7 @@ FW Prerequisites
 - xx.31.0328 for ConnectX-6.
 - xx.32.0108 for ConnectX-6 Dx and BlueField-2.
 - xx.36.xxxx for ConnectX-7 and BlueField-3.
+- xx.37.3010 for BlueField-3 and newer for AES-GCM.
 
 Linux Prerequisites
 ~~~~~~~~~~~~~~~~~~~
@@ -186,3 +212,23 @@ Windows Prerequisites
 
 - NVIDIA WINOF-2 version: **2.60** or higher.
   See :ref:`mlx5 common prerequisites <mlx5_windows_prerequisites>` for more details.
+
+
+Notes for rte_crypto AES-GCM
+----------------------------
+
+In AES-GCM mode, the HW requires contiguous input and output of Additional
+Authenticated Data (AAD), payload, and digest (if needed). However, the RTE
+API only provides a single AAD input, which means that in the out-of-place
+mode the same AAD buffer is used for both input and output. This reuse of
+the AAD in out-of-place mode breaks the contiguous output, which degrades
+performance and introduces an extra UMR WQE. A digest that is not contiguous
+with the end of the payload also requires an extra UMR WQE.
+
+To address this issue, the RTE API provides min_mbuf_headroom_req and
+min_mbuf_tailroom_req in rte_cryptodev_info as a hint from the PMD. They
+indicate that the PMD may use the buffer space before and after the mbuf
+payload as AAD and digest space. With this hint, the PMD copies the AAD and
+digest into that space and uses it directly. The application must ensure
+that enough headroom and tailroom are reserved in the mbuf; otherwise, for
+non-contiguous operations, an extra UMR WQE will be used.
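
As an illustration of the hint above (an application-side sketch, independent of this
patch), this lays out AAD and digest contiguously around the mbuf payload so that no
UMR WQE is needed. It assumes the whole mbuf data area is the AEAD payload and that
the pool was created with enough headroom/tailroom; the helper name and parameters
are placeholders.

#include <rte_cryptodev.h>
#include <rte_mbuf.h>

/*
 * Place AAD right before the payload (in the headroom) and the digest
 * right after it (in the tailroom). Assumes data_len(m) == plain_len,
 * and that aad_len/tag_len match the AEAD session.
 */
static int
gcm_contig_layout(struct rte_crypto_op *op, struct rte_mbuf *m,
		  uint32_t plain_len, uint16_t aad_len, uint16_t tag_len)
{
	if (rte_pktmbuf_headroom(m) < aad_len ||
	    rte_pktmbuf_tailroom(m) < tag_len)
		return -1; /* not enough reserved room, PMD falls back to UMR */
	op->sym->aead.data.offset = 0;
	op->sym->aead.data.length = plain_len;
	op->sym->aead.aad.data = rte_pktmbuf_mtod(m, uint8_t *) - aad_len;
	op->sym->aead.aad.phys_addr = rte_pktmbuf_iova(m) - aad_len;
	op->sym->aead.digest.data =
		rte_pktmbuf_mtod_offset(m, uint8_t *, plain_len);
	op->sym->aead.digest.phys_addr =
		rte_pktmbuf_iova_offset(m, plain_len);
	return 0;
}
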
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index 027ae7bd2d..bbb8eddbca 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -131,6 +131,7 @@ New Features
   * Added support for CQE compression on Windows.
   * Added support for enhanced multi-packet write on Windows.
   * Added support for quota flow action and item.
+  * Added support for AES-GCM crypto.
 
 * **Added vmxnet3 version 7 support.**
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.c b/drivers/crypto/mlx5/mlx5_crypto.c
index 4d7d3ef2a3..081e96ad4d 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.c
+++ b/drivers/crypto/mlx5/mlx5_crypto.c
@@ -269,6 +269,14 @@ mlx5_crypto_args_check_handler(const char *key, const char *val, void *opaque)
 		attr->credential_pointer = (uint32_t)tmp;
 	} else if (strcmp(key, "keytag") == 0) {
 		devarg_prms->keytag = tmp;
+	} else if (strcmp(key, "algo") == 0) {
+		if (tmp == 1) {
+			devarg_prms->is_aes_gcm = 1;
+		} else if (tmp > 1) {
+			DRV_LOG(ERR, "Invalid algo.");
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
 	}
 	return 0;
 }
@@ -285,6 +293,7 @@ mlx5_crypto_parse_devargs(struct mlx5_kvargs_ctrl *mkvlist,
 		"keytag",
 		"max_segs_num",
 		"wcs_file",
+		"algo",
 		NULL,
 	};
 
@@ -370,10 +379,19 @@ mlx5_crypto_dev_probe(struct mlx5_common_device *cdev,
 	priv->crypto_dev = crypto_dev;
 	priv->is_wrapped_mode = wrapped_mode;
 	priv->max_segs_num = devarg_prms.max_segs_num;
-	ret = mlx5_crypto_xts_init(priv);
-	if (ret) {
-		DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
-		return -ENOTSUP;
+	/* Init and override AES-GCM configuration. */
+	if (devarg_prms.is_aes_gcm) {
+		ret = mlx5_crypto_gcm_init(priv);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to init AES-GCM crypto.");
+			return -ENOTSUP;
+		}
+	} else {
+		ret = mlx5_crypto_xts_init(priv);
+		if (ret) {
+			DRV_LOG(ERR, "Failed to init AES-XTS crypto.");
+			return -ENOTSUP;
+		}
 	}
 	if (mlx5_devx_uar_prepare(cdev, &priv->uar) != 0) {
 		rte_cryptodev_pmd_destroy(priv->crypto_dev);
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index 6dcb41b27c..36dacdcda4 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -92,6 +92,7 @@ struct mlx5_crypto_devarg_params {
 	struct mlx5_devx_crypto_login_attr login_attr;
 	uint64_t keytag;
 	uint32_t max_segs_num;
+	uint32_t is_aes_gcm:1;
 };
 
 struct mlx5_crypto_session {
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 8389c03c91..e26a338365 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -109,6 +109,60 @@ mlx5_crypto_dek_fill_gcm_attr(struct mlx5_crypto_dek *dek,
 	return 0;
 }
 
+static int
+mlx5_crypto_generate_gcm_cap(struct mlx5_hca_crypto_mmo_attr *mmo_attr,
+			     struct rte_cryptodev_capabilities *cap)
+{
+	/* Init key size. */
+	if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt &&
+		mmo_attr->gcm_256_encrypt && mmo_attr->gcm_256_decrypt) {
+		cap->sym.aead.key_size.min = 16;
+		cap->sym.aead.key_size.max = 32;
+		cap->sym.aead.key_size.increment = 16;
+	} else if (mmo_attr->gcm_256_encrypt && mmo_attr->gcm_256_decrypt) {
+		cap->sym.aead.key_size.min = 32;
+		cap->sym.aead.key_size.max = 32;
+		cap->sym.aead.key_size.increment = 0;
+	} else if (mmo_attr->gcm_128_encrypt && mmo_attr->gcm_128_decrypt) {
+		cap->sym.aead.key_size.min = 16;
+		cap->sym.aead.key_size.max = 16;
+		cap->sym.aead.key_size.increment = 0;
+	} else {
+		DRV_LOG(ERR, "No available AES-GCM encryption/decryption supported.");
+		return -1;
+	}
+	/* Init tag size. */
+	if (mmo_attr->gcm_auth_tag_128 && mmo_attr->gcm_auth_tag_96) {
+		cap->sym.aead.digest_size.min = 12;
+		cap->sym.aead.digest_size.max = 16;
+		cap->sym.aead.digest_size.increment = 4;
+	} else if (mmo_attr->gcm_auth_tag_96) {
+		cap->sym.aead.digest_size.min = 12;
+		cap->sym.aead.digest_size.max = 12;
+		cap->sym.aead.digest_size.increment = 0;
+	} else if (mmo_attr->gcm_auth_tag_128) {
+		cap->sym.aead.digest_size.min = 16;
+		cap->sym.aead.digest_size.max = 16;
+		cap->sym.aead.digest_size.increment = 0;
+	} else {
+		DRV_LOG(ERR, "No available AES-GCM tag size supported.");
+		return -1;
+	}
+	/* Init AAD size. */
+	cap->sym.aead.aad_size.min = 0;
+	cap->sym.aead.aad_size.max = UINT16_MAX;
+	cap->sym.aead.aad_size.increment = 1;
+	/* Init IV size. */
+	cap->sym.aead.iv_size.min = 12;
+	cap->sym.aead.iv_size.max = 12;
+	cap->sym.aead.iv_size.increment = 0;
+	/* Init left items. */
+	cap->op = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
+	cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_AEAD;
+	cap->sym.aead.algo = RTE_CRYPTO_AEAD_AES_GCM;
+	return 0;
+}
+
 static int
 mlx5_crypto_sym_gcm_session_configure(struct rte_cryptodev *dev,
 				  struct rte_crypto_sym_xform *xform,
@@ -917,8 +971,10 @@ mlx5_crypto_gcm_dequeue_burst(void *queue_pair,
 int
 mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 {
+	struct mlx5_common_device *cdev = priv->cdev;
 	struct rte_cryptodev *crypto_dev = priv->crypto_dev;
 	struct rte_cryptodev_ops *dev_ops = crypto_dev->dev_ops;
+	int ret;
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
@@ -928,6 +984,13 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 	crypto_dev->dequeue_burst = mlx5_crypto_gcm_dequeue_burst;
 	crypto_dev->enqueue_burst = mlx5_crypto_gcm_enqueue_burst;
 	priv->max_klm_num = RTE_ALIGN((priv->max_segs_num + 1) * 2 + 1, MLX5_UMR_KLM_NUM_ALIGN);
+	/* Generate GCM capability. */
+	ret = mlx5_crypto_generate_gcm_cap(&cdev->config.hca_attr.crypto_mmo,
+					   mlx5_crypto_gcm_caps);
+	if (ret) {
+		DRV_LOG(ERR, "Not enough AES-GCM capability.");
+		return -1;
+	}
 	priv->caps = mlx5_crypto_gcm_caps;
 	return 0;
 }
-- 
2.25.1
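
To see the generated capability in practice, a minimal sketch that probes the PMD in
AES-GCM mode and prints the advertised AEAD limits. The PCI address is a placeholder,
algo=1 is the devarg added in this series, and class=crypto is assumed from the common
mlx5 driver options.

#include <stdio.h>
#include <rte_common.h>
#include <rte_eal.h>
#include <rte_cryptodev.h>

int
main(void)
{
	/* PCI address is a placeholder; algo=1 selects AES-GCM. */
	char *eal_args[] = {
		"app", "-a", "0000:03:00.0,class=crypto,algo=1",
	};
	struct rte_cryptodev_info info;
	const struct rte_cryptodev_capabilities *cap;

	if (rte_eal_init(RTE_DIM(eal_args), eal_args) < 0 ||
	    rte_cryptodev_count() == 0)
		return -1;
	rte_cryptodev_info_get(0, &info);
	for (cap = info.capabilities;
	     cap->op != RTE_CRYPTO_OP_TYPE_UNDEFINED; cap++) {
		if (cap->sym.xform_type != RTE_CRYPTO_SYM_XFORM_AEAD ||
		    cap->sym.aead.algo != RTE_CRYPTO_AEAD_AES_GCM)
			continue;
		printf("AES-GCM: key %u-%u, digest %u-%u, AAD max %u, IV %u\n",
		       cap->sym.aead.key_size.min, cap->sym.aead.key_size.max,
		       cap->sym.aead.digest_size.min,
		       cap->sym.aead.digest_size.max,
		       cap->sym.aead.aad_size.max, cap->sym.aead.iv_size.min);
	}
	return 0;
}

The printed key/digest/AAD/IV ranges correspond to what mlx5_crypto_generate_gcm_cap()
fills in from the crypto MMO attributes.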


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [EXT] [PATCH v4 0/9] crypto/mlx5: support AES-GCM
  2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
                     ` (8 preceding siblings ...)
  2023-06-20 14:11   ` [PATCH v4 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
@ 2023-06-20 18:49   ` Akhil Goyal
  2023-06-23  9:31     ` Thomas Monjalon
  9 siblings, 1 reply; 54+ messages in thread
From: Akhil Goyal @ 2023-06-20 18:49 UTC (permalink / raw)
  To: Suanming Mou; +Cc: rasland, dev

> AES-GCM provides both authenticated encryption and the ability to check
> the integrity and authentication of additional authenticated data (AAD)
> that is sent in the clear.
> 
> The crypto operations are performed with crypto WQE. If the input
> buffers(AAD, mbuf, digest) are not contiguous and there is no enough
> headroom or tailroom for AAD or digest, as the requirement from FW, an
> UMR WQE is needed to generate contiguous address space for crypto WQE.
> The UMR WQE and crypto WQE are handled in two different QPs.
> 
> The QP for UMR operation contains two types of WQE, UMR and SEND_EN
> WQE. The WQEs are built dynamically according to the crypto operation
> buffer address. Crypto operation with non-contiguous buffers will
> have its own UMR WQE, while the operation with contiguous buffers
> doesn't need the UMR WQE. Once the all the operations WQE in the
> enqueue burst built finishes, if any UMR WQEs are built, additional
> SEND_EN WQE will be as the final WQE of the burst in the UMR QP.
> The purpose of that SEND_EN WQE is to trigger the crypto QP processing
> with the UMR ready input memory address space buffers.
> 
> The QP for crypto operations contains only the crypto WQE and the QP
> WQEs are built as fixed in QP setup. The QP processing is triggered
> by doorbell ring or the SEND_EN WQE from UMR QP.
> 
> v2:
>   - split XTS and GCM code to different file.
>   - add headroom and tailroom optimize.
> 
> v3:
>  - fix AES-GCM 128b key creation.
> 
> v4:
>  - add missing feature cap in mlx5.ini
Series applied to dpdk-next-crypto
Thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [EXT] [PATCH v4 0/9] crypto/mlx5: support AES-GCM
  2023-06-20 18:49   ` [EXT] [PATCH v4 0/9] crypto/mlx5: support AES-GCM Akhil Goyal
@ 2023-06-23  9:31     ` Thomas Monjalon
  0 siblings, 0 replies; 54+ messages in thread
From: Thomas Monjalon @ 2023-06-23  9:31 UTC (permalink / raw)
  To: Suanming Mou, Akhil Goyal; +Cc: dev, rasland

20/06/2023 20:49, Akhil Goyal:
> > AES-GCM provides both authenticated encryption and the ability to check
> > the integrity and authentication of additional authenticated data (AAD)
> > that is sent in the clear.
[...]
> Series applied to dpdk-next-crypto

This has to be added to avoid a compilation failure with MinGW:

@@ -25,6 +25,8 @@ mlx5_crypto_dek_get_key(struct rte_crypto_sym_xform *xform,
                *key = xform->aead.key.data;
                *key_len = xform->aead.key.length;
        } else {
+               *key = NULL;
+               *key_len = 0;
                DRV_LOG(ERR, "Xform dek type not supported.");
                rte_errno = -EINVAL;
                return -1;

I will squash it where appropriate.



^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2023-06-23  9:32 UTC | newest]

Thread overview: 54+ messages
2023-04-18  9:23 [RFC PATCH 0/5] crypto/mlx5: support AES-GCM Suanming Mou
2023-04-18  9:23 ` [RFC PATCH 1/5] crypto/mlx5: add AES-GCM capability Suanming Mou
2023-05-17  7:37   ` [EXT] " Akhil Goyal
2023-05-17  7:42     ` Suanming Mou
2023-05-17  7:47       ` Akhil Goyal
2023-05-17  7:51         ` Suanming Mou
2023-05-17  8:02           ` Akhil Goyal
2023-05-17  8:06             ` Suanming Mou
2023-04-18  9:23 ` [RFC PATCH 2/5] crypto/mlx5: add AES-GCM encryption key Suanming Mou
2023-04-18  9:23 ` [RFC PATCH 3/5] crypto/mlx5: add AES-GCM session configure Suanming Mou
2023-04-18  9:23 ` [RFC PATCH 4/5] crypto/mlx5: add queue pair setup Suanming Mou
2023-04-18  9:23 ` [RFC PATCH 5/5] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
2023-05-26  3:14 ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
2023-05-26  3:14   ` [PATCH v2 1/9] common/mlx5: export memory region lookup by address Suanming Mou
2023-05-26  3:14   ` [PATCH v2 2/9] crypto/mlx5: split AES-XTS Suanming Mou
2023-05-26  3:14   ` [PATCH v2 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
2023-05-26  3:14   ` [PATCH v2 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
2023-05-26  3:14   ` [PATCH v2 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
2023-05-26  3:14   ` [PATCH v2 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
2023-05-26  3:14   ` [PATCH v2 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
2023-05-26  3:14   ` [PATCH v2 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
2023-05-26  3:14   ` [PATCH v2 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
2023-06-14 18:11   ` [EXT] [PATCH v2 0/9] crypto/mlx5: support AES-GCM Akhil Goyal
2023-06-20  1:22     ` Suanming Mou
2023-06-20  1:23 ` Suanming Mou
2023-06-20  1:23   ` [PATCH v3 1/9] common/mlx5: export memory region lookup by address Suanming Mou
2023-06-20  1:23   ` [PATCH v3 2/9] crypto/mlx5: split AES-XTS Suanming Mou
2023-06-20  1:23   ` [PATCH v3 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
2023-06-20  1:23   ` [PATCH v3 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
2023-06-20  1:23   ` [PATCH v3 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
2023-06-20  1:23   ` [PATCH v3 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
2023-06-20  1:23   ` [PATCH v3 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
2023-06-20  1:23   ` [PATCH v3 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
2023-06-20  1:23   ` [PATCH v3 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
2023-06-20  9:25     ` [EXT] " Akhil Goyal
2023-06-20  9:42       ` Suanming Mou
2023-06-20  9:48         ` Akhil Goyal
2023-06-20  9:56           ` Suanming Mou
2023-06-20  9:55   ` [PATCH v2 0/9] crypto/mlx5: support AES-GCM Suanming Mou
2023-06-20  9:58     ` Akhil Goyal
2023-06-20 10:03       ` Suanming Mou
2023-06-20 13:52         ` Matan Azrad
2023-06-20 14:11 ` [PATCH v4 " Suanming Mou
2023-06-20 14:11   ` [PATCH v4 1/9] common/mlx5: export memory region lookup by address Suanming Mou
2023-06-20 14:11   ` [PATCH v4 2/9] crypto/mlx5: split AES-XTS Suanming Mou
2023-06-20 14:11   ` [PATCH v4 3/9] crypto/mlx5: add AES-GCM query and initialization Suanming Mou
2023-06-20 14:11   ` [PATCH v4 4/9] crypto/mlx5: add AES-GCM encryption key Suanming Mou
2023-06-20 14:11   ` [PATCH v4 5/9] crypto/mlx5: add AES-GCM session configure Suanming Mou
2023-06-20 14:11   ` [PATCH v4 6/9] common/mlx5: add WQE-based QP synchronous basics Suanming Mou
2023-06-20 14:11   ` [PATCH v4 7/9] crypto/mlx5: add queue pair setup for GCM Suanming Mou
2023-06-20 14:11   ` [PATCH v4 8/9] crypto/mlx5: add enqueue and dequeue operations Suanming Mou
2023-06-20 14:11   ` [PATCH v4 9/9] crypto/mlx5: enable AES-GCM capability Suanming Mou
2023-06-20 18:49   ` [EXT] [PATCH v4 0/9] crypto/mlx5: support AES-GCM Akhil Goyal
2023-06-23  9:31     ` Thomas Monjalon
