DPDK patches and discussions
* [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process
@ 2019-09-03 15:40 Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API Fan Zhang
                   ` (9 more replies)
  0 siblings, 10 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

This RFC patch series adds a way for rte_security to process symmetric
crypto workloads synchronously and in bulk on SW crypto devices.

Currently, both SW and HW crypto PMDs work under rte_cryptodev and process
crypto workloads asynchronously. This provides uniformity across both PMD
types, but it also imposes an unnecessary performance penalty on SW PMDs:
extra SW ring enqueue/dequeue steps to "simulate" asynchronous operation,
and unnecessary HW address computation.

We introduce a new method for SW crypto devices that performs crypto
operations synchronously, taking only the fields required for the
computation as input. As a proof of concept, the AESNI-GCM and AESNI-MB SW
PMDs are updated to support this new method. To demonstrate the
performance gain, two simple performance evaluation apps are added under
the unit tests: "app/test: security_aesni_gcm_perftest/security_aesni_mb_perftest".
Users can compare their results against those of the crypto perf
application.
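
To make the calling convention concrete, here is a minimal, self-contained
sketch of what a synchronous bulk call might look like from the caller's
side. The struct `sec_vec` mirrors `struct rte_security_vec` from patch 1/9,
and `toy_process_bulk` is a hypothetical stand-in for a PMD handler: it
XORs the data in place instead of doing real crypto, so the example builds
without DPDK. The caller passes only buffers, IV, AAD and digest pointers,
and the results, including a per-element status, are available as soon as
the call returns. No enqueue/dequeue round trip is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Mirrors struct rte_security_vec (patch 1/9) so no DPDK header is needed. */
struct sec_vec {
	struct iovec *vec; /* one iovec per buffer segment */
	uint32_t num;      /* number of segments */
};

/*
 * Hypothetical PMD handler, NOT real crypto: XORs every byte in place with
 * iv[i][0] and writes a 1-byte "digest" (XOR of all output bytes).
 * It only illustrates the synchronous, in-place, per-status convention.
 */
static void
toy_process_bulk(struct sec_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num)
{
	uint32_t i, j;
	size_t k;

	(void)aad; /* a real AEAD handler would authenticate the AAD */
	for (i = 0; i < num; i++) {
		uint8_t key = *(const uint8_t *)iv[i];
		uint8_t acc = 0;

		/* walk every segment of element i, processing in place */
		for (j = 0; j < buf[i].num; j++) {
			uint8_t *p = buf[i].vec[j].iov_base;

			for (k = 0; k < buf[i].vec[j].iov_len; k++) {
				p[k] ^= key;
				acc ^= p[k];
			}
		}
		*(uint8_t *)digest[i] = acc;
		status[i] = 0; /* result is final when the call returns */
	}
}
```

A caller would fill one `sec_vec` per packet (one `iovec` per segment),
call the handler once for the whole burst, and then check `status[i]` for
each element.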

Fan Zhang (9):
  security: introduce CPU Crypto action type and API
  crypto/aesni_gcm: add rte_security handler
  app/test: add security cpu crypto autotest
  app/test: add security cpu crypto perftest
  crypto/aesni_mb: add rte_security handler
  app/test: add aesni_mb security cpu crypto autotest
  app/test: add aesni_mb security cpu crypto perftest
  ipsec: add rte_security cpu_crypto action support
  examples/ipsec-secgw: add security cpu_crypto action support

 app/test/Makefile                                  |    1 +
 app/test/meson.build                               |    1 +
 app/test/test_security_cpu_crypto.c                | 1326 ++++++++++++++++++++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c           |   91 +-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c       |   95 ++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h   |   23 +
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         |  291 ++++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |   91 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |   21 +-
 examples/ipsec-secgw/ipsec.c                       |   22 +
 examples/ipsec-secgw/ipsec_process.c               |    4 +-
 examples/ipsec-secgw/sa.c                          |   13 +-
 examples/ipsec-secgw/test/run_test.sh              |   10 +
 .../test/trs_3descbc_sha1_cpu_crypto_defs.sh       |    5 +
 .../test/trs_aescbc_sha1_cpu_crypto_defs.sh        |    5 +
 .../test/trs_aesctr_sha1_cpu_crypto_defs.sh        |    5 +
 .../ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh |    5 +
 .../test/trs_aesgcm_mb_cpu_crypto_defs.sh          |    7 +
 .../test/tun_3descbc_sha1_cpu_crypto_defs.sh       |    5 +
 .../test/tun_aescbc_sha1_cpu_crypto_defs.sh        |    5 +
 .../test/tun_aesctr_sha1_cpu_crypto_defs.sh        |    5 +
 .../ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh |    5 +
 .../test/tun_aesgcm_mb_cpu_crypto_defs.sh          |    7 +
 lib/librte_ipsec/esp_inb.c                         |  174 ++-
 lib/librte_ipsec/esp_outb.c                        |  290 ++++-
 lib/librte_ipsec/sa.c                              |   53 +-
 lib/librte_ipsec/sa.h                              |   29 +
 lib/librte_ipsec/ses.c                             |    4 +-
 lib/librte_security/rte_security.c                 |   16 +
 lib/librte_security/rte_security.h                 |   51 +-
 lib/librte_security/rte_security_driver.h          |   19 +
 lib/librte_security/rte_security_version.map       |    1 +
 32 files changed, 2658 insertions(+), 22 deletions(-)
 create mode 100644 app/test/test_security_cpu_crypto.c
 create mode 100644 examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh

-- 
2.14.5



* [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-04 10:32   ` Akhil Goyal
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 2/9] crypto/aesni_gcm: add rte_security handler Fan Zhang
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type
to the security library. The type represents crypto operations performed
synchronously with CPU cycles. The patch also adds a new API to process
crypto operations in bulk, and the corresponding PMD function pointer.
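
The generic wrapper below is a simplified, hypothetical model of
rte_security_process_cpu_crypto_bulk(). The buf/iv/aad/digest arguments are
dropped for brevity, and the `toy_*` names are illustrative only. It shows
the intended behaviour. Every status entry is preset to -1, so if the PMD
does not implement process_cpu_crypto_bulk, the caller still sees a
failure for each element instead of stale values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified driver callback; the real one also takes buf/iv/aad/digest. */
typedef void (*toy_bulk_fn_t)(void *sess, int status[], uint32_t num);

/* Hypothetical stand-ins for struct rte_security_ops / rte_security_ctx. */
struct toy_sec_ops {
	toy_bulk_fn_t process_cpu_crypto_bulk;
};

struct toy_sec_ctx {
	const struct toy_sec_ops *ops;
};

/* Library-side wrapper: preset all statuses to failure, then dispatch. */
static void
toy_process_cpu_crypto_bulk(struct toy_sec_ctx *ctx, void *sess,
		int status[], uint32_t num)
{
	uint32_t i;

	for (i = 0; i < num; i++)
		status[i] = -1;

	/* Same effect as RTE_FUNC_PTR_OR_RET(): return if not implemented. */
	if (ctx->ops->process_cpu_crypto_bulk == NULL)
		return;

	ctx->ops->process_cpu_crypto_bulk(sess, status, num);
}

/* Example PMD handler that reports success for every element. */
static void
toy_pmd_bulk(void *sess, int status[], uint32_t num)
{
	uint32_t i;

	(void)sess;
	for (i = 0; i < num; i++)
		status[i] = 0;
}
```

Because the API returns void, the status array is the only error channel,
which is why presetting it matters.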

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 lib/librte_security/rte_security.c           | 16 +++++++++
 lib/librte_security/rte_security.h           | 51 +++++++++++++++++++++++++++-
 lib/librte_security/rte_security_driver.h    | 19 +++++++++++
 lib/librte_security/rte_security_version.map |  1 +
 4 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
index bc81ce15d..0f85c1b59 100644
--- a/lib/librte_security/rte_security.c
+++ b/lib/librte_security/rte_security.c
@@ -141,3 +141,19 @@ rte_security_capability_get(struct rte_security_ctx *instance,
 
 	return NULL;
 }
+
+void
+rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	uint32_t i;
+
+	for (i = 0; i < num; i++)
+		status[i] = -1;
+
+	RTE_FUNC_PTR_OR_RET(*instance->ops->process_cpu_crypto_bulk);
+	instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
+			aad, digest, status, num);
+}
diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
index 96806e3a2..5a0f8901b 100644
--- a/lib/librte_security/rte_security.h
+++ b/lib/librte_security/rte_security.h
@@ -18,6 +18,7 @@ extern "C" {
 #endif
 
 #include <sys/types.h>
+#include <sys/uio.h>
 
 #include <netinet/in.h>
 #include <netinet/ip.h>
@@ -272,6 +273,20 @@ struct rte_security_pdcp_xform {
 	uint32_t hfn_threshold;
 };
 
+struct rte_security_cpu_crypto_xform {
+	/** For cipher/authentication crypto operations, the authentication may
+	 * cover more content than the cipher. E.g., for IPsec ESP encryption
+	 * with AES-CBC and SHA1-HMAC, the encryption happens after the ESP
+	 * header but the whole packet (apart from the MAC header) is
+	 * authenticated. The cipher_offset field is used to derive the cipher
+	 * data pointer from the buffer to be processed.
+	 *
+	 * NOTE: this parameter shall be ignored by AEAD algorithms, since they
+	 * use the same offset for cipher and authentication.
+	 */
+	int32_t cipher_offset;
+};
+
 /**
  * Security session action type.
  */
@@ -286,10 +301,14 @@ enum rte_security_session_action_type {
 	/**< All security protocol processing is performed inline during
 	 * transmission
 	 */
-	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
+	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
 	/**< All security protocol processing including crypto is performed
 	 * on a lookaside accelerator
 	 */
+	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
+	/**< Crypto processing for the security protocol is performed
+	 * synchronously by the CPU
+	 */
 };
 
 /** Security session protocol definition */
@@ -315,6 +334,7 @@ struct rte_security_session_conf {
 		struct rte_security_ipsec_xform ipsec;
 		struct rte_security_macsec_xform macsec;
 		struct rte_security_pdcp_xform pdcp;
+		struct rte_security_cpu_crypto_xform cpucrypto;
 	};
 	/**< Configuration parameters for security session */
 	struct rte_crypto_sym_xform *crypto_xform;
@@ -639,6 +659,35 @@ const struct rte_security_capability *
 rte_security_capability_get(struct rte_security_ctx *instance,
 			    struct rte_security_capability_idx *idx);
 
+/**
+ * Security vector structure, containing a pointer to the vector array and
+ * the length of the array
+ */
+struct rte_security_vec {
+	struct iovec *vec;
+	uint32_t num;
+};
+
+/**
+ * Process a bulk crypto workload with the CPU
+ *
+ * @param	instance	security instance.
+ * @param	sess		security session
+ * @param	buf		array of buffer SGL vectors
+ * @param	iv		array of IV pointers
+ * @param	aad		array of AAD pointers
+ * @param	digest		array of digest pointers
+ * @param	status		array of status values returned by the function
+ * @param	num		number of elements in each array
+ *
+ */
+__rte_experimental
+void
+rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
index 1b561f852..70fcb0c26 100644
--- a/lib/librte_security/rte_security_driver.h
+++ b/lib/librte_security/rte_security_driver.h
@@ -132,6 +132,23 @@ typedef int (*security_get_userdata_t)(void *device,
 typedef const struct rte_security_capability *(*security_capabilities_get_t)(
 		void *device);
 
+/**
+ * Process security operations in bulk using a CPU-accelerated method.
+ *
+ * @param	sess		Security session structure.
+ * @param	buf		Buffer to the vectors to be processed.
+ * @param	iv		IV pointers.
+ * @param	aad		AAD pointers.
+ * @param	digest		Digest pointers.
+ * @param	status		Array of status values.
+ * @param	num		Number of elements in each array.
+ */
+
+typedef void (*security_process_cpu_crypto_bulk_t)(
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 /** Security operations function pointer table */
 struct rte_security_ops {
 	security_session_create_t session_create;
@@ -150,6 +167,8 @@ struct rte_security_ops {
 	/**< Get userdata associated with session which processed the packet. */
 	security_capabilities_get_t capabilities_get;
 	/**< Get security capabilities. */
+	security_process_cpu_crypto_bulk_t process_cpu_crypto_bulk;
+	/**< Process data in bulk. */
 };
 
 #ifdef __cplusplus
diff --git a/lib/librte_security/rte_security_version.map b/lib/librte_security/rte_security_version.map
index 53267bf3c..2132e7a00 100644
--- a/lib/librte_security/rte_security_version.map
+++ b/lib/librte_security/rte_security_version.map
@@ -18,4 +18,5 @@ EXPERIMENTAL {
 	rte_security_get_userdata;
 	rte_security_session_stats_get;
 	rte_security_session_update;
+	rte_security_process_cpu_crypto_bulk;
 };
-- 
2.14.5
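
The cipher_offset field in struct rte_security_cpu_crypto_xform means that
a handler has to locate the start of the cipher region inside a possibly
segmented buffer. The sketch below shows one way a PMD could walk the
iovec array to do this. sgl_seek is a hypothetical helper that is not part
of the patch. It takes an unsigned offset because a negative cipher_offset
has no obvious meaning in this context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: locate byte `cipher_offset` inside a
 * scatter-gather list. Returns the index of the segment that contains it
 * and stores the offset within that segment in *seg_off, or returns -1 if
 * the offset is at or beyond the end of the buffer.
 */
static int
sgl_seek(const struct iovec *vec, uint32_t num, uint32_t cipher_offset,
		size_t *seg_off)
{
	size_t off = cipher_offset;
	uint32_t i;

	for (i = 0; i < num; i++) {
		if (off < vec[i].iov_len) {
			*seg_off = off;
			return (int)i;
		}
		off -= vec[i].iov_len; /* skip this whole segment */
	}
	return -1;
}
```

If the cipher region runs to the end of the buffer, its length would be
the total buffer length minus cipher_offset. Authentication would still
start at the beginning of the buffer.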



* [dpdk-dev] [RFC PATCH 2/9] crypto/aesni_gcm: add rte_security handler
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 3/9] app/test: add security cpu crypto autotest Fan Zhang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

This patch adds rte_security support to the AESNI-GCM PMD. The PMD now
initializes a security context instance, creates/deletes PMD-specific
security sessions, and processes crypto workloads in synchronous mode,
with scatter-gather list buffers supported.
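
SGL support here depends on the streaming init/update/finalize interface.
init is called once per buffer, update once per segment, and finalize
produces the tag. The tag is truncated through temp_digest when the
requested digest length is shorter than the generated one. The sketch
below models only that control flow. It uses 64-bit FNV-1a as a
hypothetical stand-in for GCM, so it neither encrypts nor provides any
security, and all helper names are illustrative. Unlike the patch, it
always goes through the temporary tag buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define TOY_TAG_LEN 8 /* plays the role of gen_digest_length */

struct toy_ctx {
	uint64_t h;
};

/* 64-bit FNV-1a stands in for the GCM init/update/finalize calls. */
static void
toy_init(struct toy_ctx *c)
{
	c->h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */
}

static void
toy_update(struct toy_ctx *c, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		c->h ^= p[i];
		c->h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
	}
}

static void
toy_finalize(const struct toy_ctx *c, uint8_t tag[TOY_TAG_LEN])
{
	int i;

	for (i = 0; i < TOY_TAG_LEN; i++)
		tag[i] = (uint8_t)(c->h >> (8 * i));
}

/* init once, update once per segment, finalize once. */
static void
toy_sgl_tag(const struct iovec *vec, uint32_t num, uint8_t tag[TOY_TAG_LEN])
{
	struct toy_ctx c;
	uint32_t i;

	toy_init(&c);
	for (i = 0; i < num; i++)
		toy_update(&c, vec[i].iov_base, vec[i].iov_len);
	toy_finalize(&c, tag);
}

/* "Encrypt" side: write only the first req_len bytes of the tag. */
static int
toy_sgl_generate(const struct iovec *vec, uint32_t num, uint8_t *digest,
		size_t req_len)
{
	uint8_t tmp[TOY_TAG_LEN]; /* like temp_digest in the patch */

	if (req_len > TOY_TAG_LEN)
		return -1;
	toy_sgl_tag(vec, num, tmp);
	memcpy(digest, tmp, req_len);
	return 0;
}

/* "Decrypt" side: recompute the tag and compare the first req_len bytes. */
static int
toy_sgl_verify(const struct iovec *vec, uint32_t num, const uint8_t *digest,
		size_t req_len)
{
	uint8_t tmp[TOY_TAG_LEN];

	if (req_len > TOY_TAG_LEN)
		return -1;
	toy_sgl_tag(vec, num, tmp);
	return memcmp(tmp, digest, req_len) == 0 ? 0 : -1;
}
```

Splitting the same bytes across different segments produces the same tag,
which is the property the SGL path depends on. Like the patch, the sketch
compares tags with memcmp(), which is not constant-time.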

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c         | 91 ++++++++++++++++++++++-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c     | 95 ++++++++++++++++++++++++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h | 23 ++++++
 3 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index 1006a5c4d..0a346eddd 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
@@ -6,6 +6,7 @@
 #include <rte_hexdump.h>
 #include <rte_cryptodev.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 #include <rte_bus_vdev.h>
 #include <rte_malloc.h>
 #include <rte_cpuflags.h>
@@ -174,6 +175,56 @@ aesni_gcm_get_session(struct aesni_gcm_qp *qp, struct rte_crypto_op *op)
 	return sess;
 }
 
+static __rte_always_inline int
+process_gcm_security_sgl_buf(struct aesni_gcm_security_session *sess,
+		struct rte_security_vec *buf, uint8_t *iv,
+		uint8_t *aad, uint8_t *digest)
+{
+	struct aesni_gcm_session *session = &sess->sess;
+	uint8_t *tag;
+	uint32_t i;
+
+	sess->init(&session->gdata_key, &sess->gdata_ctx, iv, aad,
+			(uint64_t)session->aad_length);
+
+	for (i = 0; i < buf->num; i++) {
+		struct iovec *vec = &buf->vec[i];
+
+		sess->update(&session->gdata_key, &sess->gdata_ctx,
+				vec->iov_base, vec->iov_base, vec->iov_len);
+	}
+
+	switch (session->op) {
+	case AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION:
+		if (session->req_digest_length != session->gen_digest_length)
+			tag = sess->temp_digest;
+		else
+			tag = digest;
+
+		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
+				session->gen_digest_length);
+
+		if (session->req_digest_length != session->gen_digest_length)
+			memcpy(digest, sess->temp_digest,
+					session->req_digest_length);
+		break;
+
+	case AESNI_GCM_OP_AUTHENTICATED_DECRYPTION:
+		tag = sess->temp_digest;
+
+		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
+				session->gen_digest_length);
+
+		if (memcmp(tag, digest, session->req_digest_length) != 0)
+			return -1;
+		break;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
 /**
  * Process a crypto operation, calling
  * the GCM API from the multi buffer library.
@@ -488,8 +539,10 @@ aesni_gcm_create(const char *name,
 {
 	struct rte_cryptodev *dev;
 	struct aesni_gcm_private *internals;
+	struct rte_security_ctx *sec_ctx;
 	enum aesni_gcm_vector_mode vector_mode;
 	MB_MGR *mb_mgr;
+	char sec_name[RTE_DEV_NAME_MAX_LEN];
 
 	/* Check CPU for support for AES instruction set */
 	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
@@ -524,7 +577,8 @@ aesni_gcm_create(const char *name,
 			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
 			RTE_CRYPTODEV_FF_CPU_AESNI |
 			RTE_CRYPTODEV_FF_OOP_SGL_IN_LB_OUT |
-			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
+			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
+			RTE_CRYPTODEV_FF_SECURITY;
 
 	mb_mgr = alloc_mb_mgr(0);
 	if (mb_mgr == NULL)
@@ -587,6 +641,21 @@ aesni_gcm_create(const char *name,
 
 	internals->max_nb_queue_pairs = init_params->max_nb_queue_pairs;
 
+	/* setup security operations */
+	snprintf(sec_name, sizeof(sec_name) - 1, "aes_gcm_sec_%u",
+			dev->driver_id);
+	sec_ctx = rte_zmalloc_socket(sec_name,
+			sizeof(struct rte_security_ctx),
+			RTE_CACHE_LINE_SIZE, init_params->socket_id);
+	if (sec_ctx == NULL) {
+		AESNI_GCM_LOG(ERR, "memory allocation failed\n");
+		goto error_exit;
+	}
+
+	sec_ctx->device = (void *)dev;
+	sec_ctx->ops = rte_aesni_gcm_pmd_security_ops;
+	dev->security_ctx = sec_ctx;
+
 #if IMB_VERSION_NUM >= IMB_VERSION(0, 50, 0)
 	AESNI_GCM_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
 			imb_get_version_str());
@@ -641,6 +710,8 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
 	if (cryptodev == NULL)
 		return -ENODEV;
 
+	rte_free(cryptodev->security_ctx);
+
 	internals = cryptodev->data->dev_private;
 
 	free_mb_mgr(internals->mb_mgr);
@@ -648,6 +719,24 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
 	return rte_cryptodev_pmd_destroy(cryptodev);
 }
 
+void
+aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	struct aesni_gcm_security_session *session =
+			get_sec_session_private_data(sess);
+	uint32_t i;
+
+	if (unlikely(!session))
+		return;
+
+	for (i = 0; i < num; i++)
+		status[i] = process_gcm_security_sgl_buf(session, &buf[i],
+				(uint8_t *)iv[i], (uint8_t *)aad[i],
+				(uint8_t *)digest[i]);
+}
+
 static struct rte_vdev_driver aesni_gcm_pmd_drv = {
 	.probe = aesni_gcm_probe,
 	.remove = aesni_gcm_remove
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
index 2f66c7c58..cc71dbd60 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
@@ -7,6 +7,7 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 
 #include "aesni_gcm_pmd_private.h"
 
@@ -316,6 +317,85 @@ aesni_gcm_pmd_sym_session_clear(struct rte_cryptodev *dev,
 	}
 }
 
+static int
+aesni_gcm_security_session_create(void *dev,
+		struct rte_security_session_conf *conf,
+		struct rte_security_session *sess,
+		struct rte_mempool *mempool)
+{
+	struct rte_cryptodev *cdev = dev;
+	struct aesni_gcm_private *internals = cdev->data->dev_private;
+	struct aesni_gcm_security_session *sess_priv;
+	int ret;
+
+	if (!conf->crypto_xform) {
+		AESNI_GCM_LOG(ERR, "Invalid security session conf");
+		return -EINVAL;
+	}
+
+	if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+		AESNI_GCM_LOG(ERR, "GMAC is not supported in security session");
+		return -EINVAL;
+	}
+
+
+	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
+		AESNI_GCM_LOG(ERR,
+				"Couldn't get object from session mempool");
+		return -ENOMEM;
+	}
+
+	ret = aesni_gcm_set_session_parameters(internals->ops,
+				&sess_priv->sess, conf->crypto_xform);
+	if (ret != 0) {
+		AESNI_GCM_LOG(ERR, "Failed to configure session parameters");
+
+		/* Return session to mempool */
+		rte_mempool_put(mempool, (void *)sess_priv);
+		return ret;
+	}
+
+	sess_priv->pre = internals->ops[sess_priv->sess.key].pre;
+	sess_priv->init = internals->ops[sess_priv->sess.key].init;
+	if (sess_priv->sess.op == AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION) {
+		sess_priv->update =
+			internals->ops[sess_priv->sess.key].update_enc;
+		sess_priv->finalize =
+			internals->ops[sess_priv->sess.key].finalize_enc;
+	} else {
+		sess_priv->update =
+			internals->ops[sess_priv->sess.key].update_dec;
+		sess_priv->finalize =
+			internals->ops[sess_priv->sess.key].finalize_dec;
+	}
+
+	sess->sess_private_data = sess_priv;
+
+	return 0;
+}
+
+static int
+aesni_gcm_security_session_destroy(void *dev __rte_unused,
+		struct rte_security_session *sess)
+{
+	void *sess_priv = get_sec_session_private_data(sess);
+
+	if (sess_priv) {
+		struct rte_mempool *sess_mp = rte_mempool_from_obj(sess_priv);
+
+		memset(sess_priv, 0, sizeof(struct aesni_gcm_security_session));
+		set_sec_session_private_data(sess, NULL);
+		rte_mempool_put(sess_mp, sess_priv);
+	}
+	return 0;
+}
+
+static unsigned int
+aesni_gcm_sec_session_get_size(__rte_unused void *device)
+{
+	return sizeof(struct aesni_gcm_security_session);
+}
+
 struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
 		.dev_configure		= aesni_gcm_pmd_config,
 		.dev_start		= aesni_gcm_pmd_start,
@@ -336,4 +416,19 @@ struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
 		.sym_session_clear	= aesni_gcm_pmd_sym_session_clear
 };
 
+static struct rte_security_ops aesni_gcm_security_ops = {
+		.session_create = aesni_gcm_security_session_create,
+		.session_get_size = aesni_gcm_sec_session_get_size,
+		.session_update = NULL,
+		.session_stats_get = NULL,
+		.session_destroy = aesni_gcm_security_session_destroy,
+		.set_pkt_metadata = NULL,
+		.capabilities_get = NULL,
+		.process_cpu_crypto_bulk =
+				aesni_gcm_sec_crypto_process_bulk,
+};
+
 struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops = &aesni_gcm_pmd_ops;
+
+struct rte_security_ops *rte_aesni_gcm_pmd_security_ops =
+		&aesni_gcm_security_ops;
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
index 56b29e013..8e490b6ce 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
@@ -114,5 +114,28 @@ aesni_gcm_set_session_parameters(const struct aesni_gcm_ops *ops,
  * Device specific operations function pointer structure */
 extern struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops;
 
+/**
+ * Security session structure.
+ */
+struct aesni_gcm_security_session {
+	/** Temp digest for decryption */
+	uint8_t temp_digest[DIGEST_LENGTH_MAX];
+	/** GCM operations */
+	aesni_gcm_pre_t pre;
+	aesni_gcm_init_t init;
+	aesni_gcm_update_t update;
+	aesni_gcm_finalize_t finalize;
+	/** AESNI-GCM session */
+	struct aesni_gcm_session sess;
+	/** AESNI-GCM context */
+	struct gcm_context_data gdata_ctx;
+};
+
+extern void
+aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
+extern struct rte_security_ops *rte_aesni_gcm_pmd_security_ops;
 
 #endif /* _RTE_AESNI_GCM_PMD_PRIVATE_H_ */
-- 
2.14.5
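
aesni_gcm_security_session_create() resolves the update/finalize pointers
once, based on the session direction. As a result, the per-buffer path
never branches on the operation type. The following minimal model of that
pattern uses hypothetical toy functions in place of the multi-buffer
library calls. The "cipher" just adds or subtracts 1 from each byte.

```c
#include <assert.h>
#include <stdint.h>

enum toy_op { TOY_OP_ENCRYPT, TOY_OP_DECRYPT };

typedef void (*toy_update_t)(uint8_t *buf, uint32_t len);

/* Toy "cipher": add 1 to every byte; decryption subtracts it again. */
static void
toy_update_enc(uint8_t *buf, uint32_t len)
{
	uint32_t i;

	for (i = 0; i < len; i++)
		buf[i] = (uint8_t)(buf[i] + 1);
}

static void
toy_update_dec(uint8_t *buf, uint32_t len)
{
	uint32_t i;

	for (i = 0; i < len; i++)
		buf[i] = (uint8_t)(buf[i] - 1);
}

struct toy_sec_session {
	enum toy_op op;
	toy_update_t update; /* chosen once, at session creation */
};

/* Bind the direction-specific handler up front, as the PMD does. */
static void
toy_session_create(struct toy_sec_session *s, enum toy_op op)
{
	s->op = op;
	s->update = (op == TOY_OP_ENCRYPT) ? toy_update_enc : toy_update_dec;
}
```

The data path then calls s->update() directly, so the cost of choosing
the handler is paid once per session rather than once per buffer.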



* [dpdk-dev] [RFC PATCH 3/9] app/test: add security cpu crypto autotest
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 2/9] crypto/aesni_gcm: add rte_security handler Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 4/9] app/test: add security cpu crypto perftest Fan Zhang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

This patch adds a CPU crypto unit test for the AESNI-GCM PMD.
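
In assemble_aead_buf(), the SGL_MAX_SEG option splits the input into
chunks of src_len / MAX_NB_SIGMENTS + 1 bytes. That chunk size is always
strictly greater than src_len / 4, so the copy loop never needs more than
four segments. The stand-alone sketch below reproduces only the length
arithmetic. The name split_lengths is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SEGS 4 /* mirrors MAX_NB_SIGMENTS in the test */

/* Fill lens[] with the segment lengths used for a src_len-byte input and
 * return the number of segments. */
static uint32_t
split_lengths(uint32_t src_len, uint32_t lens[MAX_SEGS])
{
	uint32_t per_seg = src_len / MAX_SEGS + 1;
	uint32_t left = src_len;
	uint32_t n = 0;

	while (left) {
		uint32_t cp = left < per_seg ? left : per_seg;

		assert(n < MAX_SEGS); /* guaranteed by the +1 above */
		lens[n++] = cp;
		left -= cp;
	}
	return n;
}
```

For a 60-byte input, this yields segments of 16, 16, 16 and 12 bytes.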

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/Makefile                   |   1 +
 app/test/meson.build                |   1 +
 app/test/test_security_cpu_crypto.c | 564 ++++++++++++++++++++++++++++++++++++
 3 files changed, 566 insertions(+)
 create mode 100644 app/test/test_security_cpu_crypto.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 26ba6fe2b..090c55746 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -196,6 +196,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) += test_pmd_ring_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_blockcipher.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_asym.c
+SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_security_cpu_crypto.c
 
 SRCS-$(CONFIG_RTE_LIBRTE_METRICS) += test_metrics.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index ec40943bd..b7834ff21 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -103,6 +103,7 @@ test_sources = files('commands.c',
 	'test_ring_perf.c',
 	'test_rwlock.c',
 	'test_sched.c',
+	'test_security_cpu_crypto.c',
 	'test_service_cores.c',
 	'test_spinlock.c',
 	'test_stack.c',
diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
new file mode 100644
index 000000000..d345922b2
--- /dev/null
+++ b/app/test/test_security_cpu_crypto.c
@@ -0,0 +1,564 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#include <rte_common.h>
+#include <rte_hexdump.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_pause.h>
+#include <rte_bus_vdev.h>
+#include <rte_random.h>
+
+#include <rte_security.h>
+
+#include <rte_crypto.h>
+#include <rte_cryptodev.h>
+#include <rte_cryptodev_pmd.h>
+
+#include "test.h"
+#include "test_cryptodev.h"
+#include "test_cryptodev_aead_test_vectors.h"
+
+#define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
+#define MAX_NB_SIGMENTS			4
+
+enum buffer_assemble_option {
+	SGL_MAX_SEG,
+	SGL_ONE_SEG,
+};
+
+struct cpu_crypto_test_case {
+	struct {
+		uint8_t seg[MBUF_DATAPAYLOAD_SIZE];
+		uint32_t seg_len;
+	} seg_buf[MAX_NB_SIGMENTS];
+	uint8_t iv[MAXIMUM_IV_LENGTH];
+	uint8_t aad[CPU_CRYPTO_TEST_MAX_AAD_LENGTH];
+	uint8_t digest[DIGEST_BYTE_LENGTH_SHA512];
+} __rte_cache_aligned;
+
+struct cpu_crypto_test_obj {
+	struct iovec vec[MAX_NUM_OPS_INFLIGHT][MAX_NB_SIGMENTS];
+	struct rte_security_vec sec_buf[MAX_NUM_OPS_INFLIGHT];
+	void *iv[MAX_NUM_OPS_INFLIGHT];
+	void *digest[MAX_NUM_OPS_INFLIGHT];
+	void *aad[MAX_NUM_OPS_INFLIGHT];
+	int status[MAX_NUM_OPS_INFLIGHT];
+};
+
+struct cpu_crypto_testsuite_params {
+	struct rte_mempool *buf_pool;
+	struct rte_mempool *session_priv_mpool;
+	struct rte_security_ctx *ctx;
+};
+
+struct cpu_crypto_unittest_params {
+	struct rte_security_session *sess;
+	void *test_datas[MAX_NUM_OPS_INFLIGHT];
+	struct cpu_crypto_test_obj test_obj;
+	uint32_t nb_bufs;
+};
+
+static struct cpu_crypto_testsuite_params testsuite_params = { NULL };
+static struct cpu_crypto_unittest_params unittest_params;
+
+static int gbl_driver_id;
+
+static int
+testsuite_setup(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct rte_cryptodev_info info;
+	uint32_t i;
+	uint32_t nb_devs;
+	uint32_t sess_sz;
+	int ret;
+
+	memset(ts_params, 0, sizeof(*ts_params));
+
+	ts_params->buf_pool = rte_mempool_lookup("CPU_CRYPTO_MBUFPOOL");
+	if (ts_params->buf_pool == NULL) {
+		/* Not already created so create */
+		ts_params->buf_pool = rte_pktmbuf_pool_create(
+				"CPU_CRYPTO_MBUFPOOL",
+				NUM_MBUFS, MBUF_CACHE_SIZE, 0,
+				sizeof(struct cpu_crypto_test_case),
+				rte_socket_id());
+		if (ts_params->buf_pool == NULL) {
+			RTE_LOG(ERR, USER1, "Can't create CPU_CRYPTO_MBUFPOOL\n");
+			return TEST_FAILED;
+		}
+	}
+
+	/* Create an AESNI MB device if required */
+	if (gbl_driver_id == rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD))) {
+		nb_devs = rte_cryptodev_device_count_by_driver(
+				rte_cryptodev_driver_id_get(
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD)));
+		if (nb_devs < 1) {
+			ret = rte_vdev_init(
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD), NULL);
+
+			TEST_ASSERT(ret == 0,
+				"Failed to create instance of"
+				" pmd : %s",
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+		}
+	}
+
+	/* Create an AESNI GCM device if required */
+	if (gbl_driver_id == rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD))) {
+		nb_devs = rte_cryptodev_device_count_by_driver(
+				rte_cryptodev_driver_id_get(
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD)));
+		if (nb_devs < 1) {
+			TEST_ASSERT_SUCCESS(rte_vdev_init(
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD), NULL),
+				"Failed to create instance of"
+				" pmd : %s",
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+		}
+	}
+
+	nb_devs = rte_cryptodev_count();
+	if (nb_devs < 1) {
+		RTE_LOG(ERR, USER1, "No crypto devices found?\n");
+		return TEST_FAILED;
+	}
+
+	/* Get security context */
+	for (i = 0; i < nb_devs; i++) {
+		rte_cryptodev_info_get(i, &info);
+		if (info.driver_id != gbl_driver_id)
+			continue;
+
+		ts_params->ctx = rte_cryptodev_get_sec_ctx(i);
+		if (!ts_params->ctx) {
+			RTE_LOG(ERR, USER1, "rte_security is not supported\n");
+			return TEST_FAILED;
+		}
+	}
+
+	sess_sz = rte_security_session_get_size(ts_params->ctx);
+	ts_params->session_priv_mpool = rte_mempool_create(
+			"cpu_crypto_test_sess_mp", 2, sess_sz, 0, 0,
+			NULL, NULL, NULL, NULL,
+			SOCKET_ID_ANY, 0);
+	if (!ts_params->session_priv_mpool) {
+		RTE_LOG(ERR, USER1, "Not enough memory\n");
+		return TEST_FAILED;
+	}
+
+	return TEST_SUCCESS;
+}
+
+static void
+testsuite_teardown(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+
+	if (ts_params->buf_pool)
+		rte_mempool_free(ts_params->buf_pool);
+
+	if (ts_params->session_priv_mpool)
+		rte_mempool_free(ts_params->session_priv_mpool);
+}
+
+static int
+ut_setup(void)
+{
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+
+	memset(ut_params, 0, sizeof(*ut_params));
+	return TEST_SUCCESS;
+}
+
+static void
+ut_teardown(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+
+	if (ut_params->sess)
+		rte_security_session_destroy(ts_params->ctx, ut_params->sess);
+
+	if (ut_params->nb_bufs) {
+		uint32_t i;
+
+		for (i = 0; i < ut_params->nb_bufs; i++)
+			memset(ut_params->test_datas[i], 0,
+				sizeof(struct cpu_crypto_test_case));
+
+		rte_mempool_put_bulk(ts_params->buf_pool, ut_params->test_datas,
+				ut_params->nb_bufs);
+	}
+}
+
+static int
+allocate_buf(uint32_t n)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	int ret;
+
+	ret = rte_mempool_get_bulk(ts_params->buf_pool, ut_params->test_datas,
+			n);
+
+	if (ret == 0)
+		ut_params->nb_bufs = n;
+
+	return ret;
+}
+
+static int
+check_status(struct cpu_crypto_test_obj *obj, uint32_t n)
+{
+	uint32_t i;
+
+	for (i = 0; i < n; i++)
+		if (obj->status[i] < 0)
+			return -1;
+
+	return 0;
+}
+
+static struct rte_security_session *
+create_aead_session(struct rte_security_ctx *ctx,
+		struct rte_mempool *sess_mp,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	struct rte_security_session_conf sess_conf = {0};
+	struct rte_crypto_sym_xform xform = {0};
+
+	if (is_unit_test)
+		debug_hexdump(stdout, "key:", test_data->key.data,
+				test_data->key.len);
+
+	/* Setup AEAD Parameters */
+	xform.type = RTE_CRYPTO_SYM_XFORM_AEAD;
+	xform.next = NULL;
+	xform.aead.algo = test_data->algo;
+	xform.aead.op = op;
+	xform.aead.key.data = test_data->key.data;
+	xform.aead.key.length = test_data->key.len;
+	xform.aead.iv.offset = 0;
+	xform.aead.iv.length = test_data->iv.len;
+	xform.aead.digest_length = test_data->auth_tag.len;
+	xform.aead.aad_length = test_data->aad.len;
+
+	sess_conf.action_type = RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
+	sess_conf.crypto_xform = &xform;
+
+	return rte_security_session_create(ctx, &sess_conf, sess_mp);
+}
+
+static inline int
+assemble_aead_buf(struct cpu_crypto_test_case *data,
+		struct cpu_crypto_test_obj *obj,
+		uint32_t obj_idx,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *test_data,
+		enum buffer_assemble_option sgl_option,
+		uint32_t is_unit_test)
+{
+	const uint8_t *src;
+	uint32_t src_len;
+	uint32_t seg_idx;
+	uint32_t bytes_per_seg;
+	uint32_t left;
+
+	if (op == RTE_CRYPTO_AEAD_OP_ENCRYPT) {
+		src = test_data->plaintext.data;
+		src_len = test_data->plaintext.len;
+		if (is_unit_test)
+			debug_hexdump(stdout, "plaintext:", src, src_len);
+	} else {
+		src = test_data->ciphertext.data;
+		src_len = test_data->ciphertext.len;
+		memcpy(data->digest, test_data->auth_tag.data,
+				test_data->auth_tag.len);
+		if (is_unit_test) {
+			debug_hexdump(stdout, "ciphertext:", src, src_len);
+			debug_hexdump(stdout, "digest:",
+					test_data->auth_tag.data,
+					test_data->auth_tag.len);
+		}
+	}
+
+	if (src_len > MBUF_DATAPAYLOAD_SIZE)
+		return -ENOMEM;
+
+	switch (sgl_option) {
+	case SGL_MAX_SEG:
+		seg_idx = 0;
+		bytes_per_seg = src_len / MAX_NB_SIGMENTS + 1;
+		left = src_len;
+
+		if (bytes_per_seg > (MBUF_DATAPAYLOAD_SIZE / MAX_NB_SIGMENTS))
+			return -ENOMEM;
+
+		while (left) {
+			uint32_t cp_len = RTE_MIN(left, bytes_per_seg);
+			memcpy(data->seg_buf[seg_idx].seg, src, cp_len);
+			data->seg_buf[seg_idx].seg_len = cp_len;
+			obj->vec[obj_idx][seg_idx].iov_base =
+					(void *)data->seg_buf[seg_idx].seg;
+			obj->vec[obj_idx][seg_idx].iov_len = cp_len;
+			src += cp_len;
+			left -= cp_len;
+			seg_idx++;
+		}
+
+		if (left)
+			return -ENOMEM;
+
+		obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+		obj->sec_buf[obj_idx].num = seg_idx;
+
+		break;
+	case SGL_ONE_SEG:
+		memcpy(data->seg_buf[0].seg, src, src_len);
+		data->seg_buf[0].seg_len = src_len;
+		obj->vec[obj_idx][0].iov_base =
+				(void *)data->seg_buf[0].seg;
+		obj->vec[obj_idx][0].iov_len = src_len;
+
+		obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+		obj->sec_buf[obj_idx].num = 1;
+		break;
+	default:
+		return -1;
+	}
+
+	if (test_data->algo == RTE_CRYPTO_AEAD_AES_CCM) {
+		memcpy(data->iv + 1, test_data->iv.data, test_data->iv.len);
+		memcpy(data->aad + 18, test_data->aad.data, test_data->aad.len);
+	} else {
+		memcpy(data->iv, test_data->iv.data, test_data->iv.len);
+		memcpy(data->aad, test_data->aad.data, test_data->aad.len);
+	}
+
+	if (is_unit_test) {
+		debug_hexdump(stdout, "iv:", test_data->iv.data,
+				test_data->iv.len);
+		debug_hexdump(stdout, "aad:", test_data->aad.data,
+				test_data->aad.len);
+	}
+
+	obj->iv[obj_idx] = (void *)data->iv;
+	obj->digest[obj_idx] = (void *)data->digest;
+	obj->aad[obj_idx] = (void *)data->aad;
+
+	return 0;
+}
+
+#define CPU_CRYPTO_ERR_EXP_CT	"expect ciphertext:"
+#define CPU_CRYPTO_ERR_GEN_CT	"gen ciphertext:"
+#define CPU_CRYPTO_ERR_EXP_PT	"expect plaintext:"
+#define CPU_CRYPTO_ERR_GEN_PT	"gen plaintext:"
+
+static int
+check_aead_result(struct cpu_crypto_test_case *tcase,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *tdata)
+{
+	const char *err_msg1, *err_msg2;
+	const uint8_t *src_pt_ct;
+	const uint8_t *tmp_src;
+	uint32_t src_len;
+	uint32_t left;
+	uint32_t i = 0;
+	int ret;
+
+	if (op == RTE_CRYPTO_AEAD_OP_ENCRYPT) {
+		err_msg1 = CPU_CRYPTO_ERR_EXP_CT;
+		err_msg2 = CPU_CRYPTO_ERR_GEN_CT;
+
+		src_pt_ct = tdata->ciphertext.data;
+		src_len = tdata->ciphertext.len;
+
+		ret = memcmp(tcase->digest, tdata->auth_tag.data,
+				tdata->auth_tag.len);
+		if (ret != 0) {
+			debug_hexdump(stdout, "expect digest:",
+					tdata->auth_tag.data,
+					tdata->auth_tag.len);
+			debug_hexdump(stdout, "gen digest:",
+					tcase->digest,
+					tdata->auth_tag.len);
+			return -1;
+		}
+	} else {
+		src_pt_ct = tdata->plaintext.data;
+		src_len = tdata->plaintext.len;
+		err_msg1 = CPU_CRYPTO_ERR_EXP_PT;
+		err_msg2 = CPU_CRYPTO_ERR_GEN_PT;
+	}
+
+	tmp_src = src_pt_ct;
+	left = src_len;
+
+	while (left && i < MAX_NB_SIGMENTS) {
+		ret = memcmp(tcase->seg_buf[i].seg, tmp_src,
+				tcase->seg_buf[i].seg_len);
+		if (ret != 0)
+			goto sgl_err_dump;
+		tmp_src += tcase->seg_buf[i].seg_len;
+		left -= tcase->seg_buf[i].seg_len;
+		i++;
+	}
+
+	if (left) {
+		ret = -ENOMEM;
+		goto sgl_err_dump;
+	}
+
+	return 0;
+
+sgl_err_dump:
+	left = src_len;
+	i = 0;
+
+	debug_hexdump(stdout, err_msg1,
+			src_pt_ct,
+			src_len);
+
+	while (left && i < MAX_NB_SIGMENTS) {
+		debug_hexdump(stdout, err_msg2,
+				tcase->seg_buf[i].seg,
+				tcase->seg_buf[i].seg_len);
+		left -= tcase->seg_buf[i].seg_len;
+		i++;
+	}
+	return ret;
+}
+
+static inline void
+run_test(struct rte_security_ctx *ctx, struct rte_security_session *sess,
+		struct cpu_crypto_test_obj *obj, uint32_t n)
+{
+	rte_security_process_cpu_crypto_bulk(ctx, sess, obj->sec_buf,
+			obj->iv, obj->aad, obj->digest, obj->status, n);
+}
+
+static int
+cpu_crypto_test_aead(const struct aead_test_data *tdata,
+		enum rte_crypto_aead_operation dir,
+		enum buffer_assemble_option sgl_option)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	int ret;
+
+	ut_params->sess = create_aead_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			dir,
+			tdata,
+			1);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(1);
+	if (ret)
+		return ret;
+
+	tcase = ut_params->test_datas[0];
+	ret = assemble_aead_buf(tcase, obj, 0, dir, tdata, sgl_option, 1);
+	if (ret < 0) {
+		printf("Test is not supported by the driver\n");
+		return ret;
+	}
+
+	run_test(ts_params->ctx, ut_params->sess, obj, 1);
+
+	ret = check_status(obj, 1);
+	if (ret < 0)
+		return ret;
+
+	ret = check_aead_result(tcase, dir, tdata);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+/* test-vector/sgl-option */
+#define all_gcm_unit_test_cases(type)		\
+	TEST_EXPAND(gcm_test_case_1, type)	\
+	TEST_EXPAND(gcm_test_case_2, type)	\
+	TEST_EXPAND(gcm_test_case_3, type)	\
+	TEST_EXPAND(gcm_test_case_4, type)	\
+	TEST_EXPAND(gcm_test_case_5, type)	\
+	TEST_EXPAND(gcm_test_case_6, type)	\
+	TEST_EXPAND(gcm_test_case_7, type)	\
+	TEST_EXPAND(gcm_test_case_8, type)	\
+	TEST_EXPAND(gcm_test_case_192_1, type)	\
+	TEST_EXPAND(gcm_test_case_192_2, type)	\
+	TEST_EXPAND(gcm_test_case_192_3, type)	\
+	TEST_EXPAND(gcm_test_case_192_4, type)	\
+	TEST_EXPAND(gcm_test_case_192_5, type)	\
+	TEST_EXPAND(gcm_test_case_192_6, type)	\
+	TEST_EXPAND(gcm_test_case_192_7, type)	\
+	TEST_EXPAND(gcm_test_case_256_1, type)	\
+	TEST_EXPAND(gcm_test_case_256_2, type)	\
+	TEST_EXPAND(gcm_test_case_256_3, type)	\
+	TEST_EXPAND(gcm_test_case_256_4, type)	\
+	TEST_EXPAND(gcm_test_case_256_5, type)	\
+	TEST_EXPAND(gcm_test_case_256_6, type)	\
+	TEST_EXPAND(gcm_test_case_256_7, type)
+
+
+#define TEST_EXPAND(t, o)						\
+static int								\
+cpu_crypto_aead_enc_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_aead(&t, RTE_CRYPTO_AEAD_OP_ENCRYPT, o);	\
+}									\
+static int								\
+cpu_crypto_aead_dec_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_aead(&t, RTE_CRYPTO_AEAD_OP_DECRYPT, o);	\
+}									\
+
+all_gcm_unit_test_cases(SGL_ONE_SEG)
+all_gcm_unit_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesgcm_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-GCM Unit Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_enc_test_##t##_##o),		\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_dec_test_##t##_##o),		\
+
+	all_gcm_unit_test_cases(SGL_ONE_SEG)
+	all_gcm_unit_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_gcm(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+
+	return unit_test_suite_runner(&security_cpu_crypto_aesgcm_testsuite);
+}
+
+REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
+		test_security_cpu_crypto_aesni_gcm);
-- 
2.14.5
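The SGL_MAX_SEG branch of assemble_aead_buf() above spreads one contiguous input over at most MAX_NB_SIGMENTS segments of `src_len / MAX_NB_SIGMENTS + 1` bytes each. The standalone sketch below reproduces that split arithmetic with POSIX `struct iovec`. The function name `split_into_segments()` and the flat `seg_mem` backing store are illustrative assumptions, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy src into up to max_segs slots of seg_cap bytes
 * inside seg_mem, using src_len / max_segs + 1 bytes per segment (the last
 * segment may be shorter), and describe each slot with an iovec entry.
 * Returns the number of segments used, or -1 if the data does not fit. */
static int
split_into_segments(const uint8_t *src, uint32_t src_len,
		uint8_t *seg_mem, uint32_t seg_cap,
		struct iovec *vec, uint32_t max_segs)
{
	uint32_t bytes_per_seg, left = src_len, idx = 0;

	assert(max_segs > 0);
	bytes_per_seg = src_len / max_segs + 1;
	if (bytes_per_seg > seg_cap)
		return -1;	/* a segment would overflow its slot */

	while (left && idx < max_segs) {
		uint32_t cp_len = left < bytes_per_seg ? left : bytes_per_seg;
		uint8_t *seg = seg_mem + (size_t)idx * seg_cap;

		memcpy(seg, src, cp_len);
		vec[idx].iov_base = seg;
		vec[idx].iov_len = cp_len;
		src += cp_len;
		left -= cp_len;
		idx++;
	}

	return left ? -1 : (int)idx;
}
```

For a 10-byte input and 4 segments, bytes_per_seg is 3, giving segments of 3, 3, 3 and 1 bytes. Since `max_segs * (src_len / max_segs + 1)` always exceeds `src_len`, the final `left` check is purely defensive with this split.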



* [dpdk-dev] [RFC PATCH 4/9] app/test: add security cpu crypto perftest
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
                   ` (2 preceding siblings ...)
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 3/9] app/test: add security cpu crypto autotest Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 5/9] crypto/aesni_mb: add rte_security handler Fan Zhang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

Since the crypto perf application does not support rte_security, this patch
adds a simple AES-GCM CPU crypto performance test to the crypto unit test
application. The test covers 128, 192 and 256-bit keys and data sizes from
64 to 2048 bytes, with both single-segment and SGL buffers, and displays
the throughput as well as cycle count information.
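The throughput figures come from a single TSC delta over a burst of MAX_NUM_OPS_INFLIGHT buffers: cycles per buffer is the delta divided by the burst size, Mpps is the TSC frequency divided by cycles per buffer (scaled by 10^6), and Gbps multiplies Mpps by the payload size in bits. A minimal sketch of that arithmetic; `compute_perf()` and `struct perf_result` are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

struct perf_result {
	double cycles_per_buf;
	double mpps;	/* million buffers per second */
	double gbps;	/* payload throughput in Gbit/s */
};

/* Convert a TSC delta measured over nb_bufs buffers of buf_sz bytes into
 * the figures the perf test prints; hz is the TSC frequency. */
static struct perf_result
compute_perf(uint64_t cycles, uint32_t nb_bufs, uint64_t hz, uint32_t buf_sz)
{
	struct perf_result r;

	assert(nb_bufs > 0 && cycles > 0);
	r.cycles_per_buf = (double)cycles / nb_bufs;
	r.mpps = ((double)hz / r.cycles_per_buf) / 1000000;
	r.gbps = r.mpps * buf_sz * 8 / 1000;
	return r;
}
```

With a 2 GHz TSC and 64,000 cycles for 64 buffers of 1024 bytes, this gives 1000 cycles per buffer, 2 Mpps and roughly 16.4 Gbps. A single timed burst is a small sample, so results may vary noticeably between runs.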

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 201 ++++++++++++++++++++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index d345922b2..ca9a8dae6 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -23,6 +23,7 @@
 
 #define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
 #define MAX_NB_SIGMENTS			4
+#define CACHE_WARM_ITER			2048
 
 enum buffer_assemble_option {
 	SGL_MAX_SEG,
@@ -560,5 +561,205 @@ test_security_cpu_crypto_aesni_gcm(void)
 	return unit_test_suite_runner(&security_cpu_crypto_aesgcm_testsuite);
 }
 
+
+static inline void
+gen_rand(uint8_t *data, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; i++)
+		data[i] = (uint8_t)rte_rand();
+}
+
+static inline void
+switch_aead_enc_to_dec(struct aead_test_data *tdata,
+		struct cpu_crypto_test_case *tcase,
+		enum buffer_assemble_option sgl_option)
+{
+	uint32_t i;
+	uint8_t *dst = tdata->ciphertext.data;
+
+	switch (sgl_option) {
+	case SGL_ONE_SEG:
+		memcpy(dst, tcase->seg_buf[0].seg, tcase->seg_buf[0].seg_len);
+		tdata->ciphertext.len = tcase->seg_buf[0].seg_len;
+		break;
+	case SGL_MAX_SEG:
+		tdata->ciphertext.len = 0;
+		for (i = 0; i < MAX_NB_SIGMENTS; i++) {
+			memcpy(dst + tdata->ciphertext.len, tcase->seg_buf[i].seg,
+					tcase->seg_buf[i].seg_len);
+			tdata->ciphertext.len += tcase->seg_buf[i].seg_len;
+		}
+		break;
+	}
+
+	memcpy(tdata->auth_tag.data, tcase->digest, tdata->auth_tag.len);
+}
+
+static int
+cpu_crypto_test_aead_perf(enum buffer_assemble_option sgl_option,
+		uint32_t key_sz)
+{
+	struct aead_test_data tdata = {0};
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	uint64_t hz = rte_get_tsc_hz(), time_start, time_now;
+	double rate, cycles_per_buf;
+	uint32_t test_data_szs[] = {64, 128, 256, 512, 1024, 2048};
+	uint32_t i, j;
+	uint8_t aad[16];
+	int ret;
+
+	tdata.key.len = key_sz;
+	gen_rand(tdata.key.data, tdata.key.len);
+	tdata.algo = RTE_CRYPTO_AEAD_AES_GCM;
+	tdata.aad.data = aad;
+
+	ut_params->sess = create_aead_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			RTE_CRYPTO_AEAD_OP_DECRYPT,
+			&tdata,
+			0);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(MAX_NUM_OPS_INFLIGHT);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < RTE_DIM(test_data_szs); i++) {
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tdata.plaintext.len = test_data_szs[i];
+			gen_rand(tdata.plaintext.data,
+					tdata.plaintext.len);
+
+			tdata.aad.len = 12;
+			gen_rand(tdata.aad.data, tdata.aad.len);
+
+			tdata.auth_tag.len = 16;
+
+			tdata.iv.len = 16;
+			gen_rand(tdata.iv.data, tdata.iv.len);
+
+			tcase = ut_params->test_datas[j];
+			ret = assemble_aead_buf(tcase, obj, j,
+					RTE_CRYPTO_AEAD_OP_ENCRYPT,
+					&tdata, sgl_option, 0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		/* warm up cache */
+		for (j = 0; j < CACHE_WARM_ITER; j++)
+			run_test(ts_params->ctx, ut_params->sess, obj,
+					MAX_NUM_OPS_INFLIGHT);
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("AES-GCM-%u(%4uB) Enc %03.3fMpps (%03.3fGbps) ",
+				key_sz * 8, test_data_szs[i], rate,
+				rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tcase = ut_params->test_datas[j];
+
+			switch_aead_enc_to_dec(&tdata, tcase, sgl_option);
+			ret = assemble_aead_buf(tcase, obj, j,
+					RTE_CRYPTO_AEAD_OP_DECRYPT,
+					&tdata, sgl_option, 0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("AES-GCM-%u(%4uB) Dec %03.3fMpps (%03.3fGbps) ",
+				key_sz * 8, test_data_szs[i], rate,
+				rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+	}
+
+	return 0;
+}
+
+/* test-prefix/key-size/sgl-type */
+#define all_gcm_perf_test_cases(type)					\
+	TEST_EXPAND(_128, 16, type)					\
+	TEST_EXPAND(_192, 24, type)					\
+	TEST_EXPAND(_256, 32, type)
+
+#define TEST_EXPAND(a, b, c)						\
+static int								\
+cpu_crypto_gcm_perf##a##_##c(void)					\
+{									\
+	return cpu_crypto_test_aead_perf(c, b);				\
+}									\
+
+all_gcm_perf_test_cases(SGL_ONE_SEG)
+all_gcm_perf_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesgcm_perf_testsuite  = {
+		.suite_name = "Security CPU Crypto AESNI-GCM Perf Test Suite",
+		.setup = testsuite_setup,
+		.teardown = testsuite_teardown,
+		.unit_test_cases = {
+#define TEST_EXPAND(a, b, c)						\
+		TEST_CASE_ST(ut_setup, ut_teardown,			\
+				cpu_crypto_gcm_perf##a##_##c),		\
+
+		all_gcm_perf_test_cases(SGL_ONE_SEG)
+		all_gcm_perf_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+		TEST_CASES_END() /**< NULL terminate unit test array */
+		},
+};
+
+static int
+test_security_cpu_crypto_aesni_gcm_perf(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+
+	return unit_test_suite_runner(
+			&security_cpu_crypto_aesgcm_perf_testsuite);
+}
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
+
+REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
+		test_security_cpu_crypto_aesni_gcm_perf);
-- 
2.14.5
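Turning the encrypted output back into decrypt input, as switch_aead_enc_to_dec() does for SGL_MAX_SEG, means concatenating the segments in order and advancing the destination after each copy. A minimal sketch of that gather step over POSIX `struct iovec`; `gather_segments()` is a hypothetical helper, not a DPDK API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: concatenate nb_segs iovec segments into dst (capacity
 * dst_cap bytes). The write offset advances after every segment, so no
 * segment overwrites the previous one. Returns the total bytes copied, or -1
 * if the segments do not fit. */
static int
gather_segments(const struct iovec *vec, uint32_t nb_segs,
		uint8_t *dst, size_t dst_cap)
{
	size_t total = 0;
	uint32_t i;

	for (i = 0; i < nb_segs; i++) {
		if (vec[i].iov_len > dst_cap - total)
			return -1;
		memcpy(dst + total, vec[i].iov_base, vec[i].iov_len);
		total += vec[i].iov_len;
	}

	assert(total <= dst_cap);
	return (int)total;
}
```

Tracking the running length separately from `dst` keeps the same variable usable both as the write offset and as the final ciphertext length.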



* [dpdk-dev] [RFC PATCH 5/9] crypto/aesni_mb: add rte_security handler
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
                   ` (3 preceding siblings ...)
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 4/9] app/test: add security cpu crypto perftest Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 6/9] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

This patch adds rte_security support to the AESNI-MB PMD. The PMD now
initializes a security context instance, creates/deletes PMD-specific
security sessions, and processes crypto workloads in synchronous mode.
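The synchronous mode can be illustrated with a toy job manager: jobs are submitted one by one, the manager may complete a batch on any submit, and the caller flushes until every job has completed, so all status values are final when the bulk call returns. This mirrors the shape of the loop in aesni_mb_sec_crypto_process_bulk(), but `struct toy_mgr`, `toy_submit()` and `toy_flush()` are hypothetical stand-ins, not intel-ipsec-mb APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a multi-buffer job manager: it holds up to
 * TOY_LANES in-flight jobs and completes them only as a batch, either when
 * every lane is busy (on submit) or when the caller flushes. */
#define TOY_LANES 4

struct toy_mgr {
	int *pending[TOY_LANES];	/* status slots of in-flight jobs */
	uint32_t nb_pending;
};

/* Complete all in-flight jobs (0 == success); returns how many finished. */
static uint32_t
toy_flush(struct toy_mgr *m)
{
	uint32_t i, n = m->nb_pending;

	for (i = 0; i < n; i++)
		*m->pending[i] = 0;
	m->nb_pending = 0;
	return n;
}

/* Queue one job; returns the number of jobs completed by this call. */
static uint32_t
toy_submit(struct toy_mgr *m, int *status)
{
	assert(m->nb_pending < TOY_LANES);
	m->pending[m->nb_pending++] = status;
	if (m->nb_pending == TOY_LANES)
		return toy_flush(m);
	return 0;
}

/* Synchronous bulk call: submit every job, then flush until all of them
 * have completed, so every status is final when the function returns. */
static void
toy_process_bulk(struct toy_mgr *m, int status[], uint32_t num)
{
	uint32_t i, processed = 0;

	for (i = 0; i < num; i++) {
		status[i] = -1;	/* not completed yet */
		processed += toy_submit(m, &status[i]);
	}
	while (processed < num)
		processed += toy_flush(m);
}
```

Unlike the cryptodev enqueue/dequeue path, no ring sits between submission and completion; the trade-off is that the caller blocks until the last partial batch has been flushed.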

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         | 291 ++++++++++++++++++++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |  91 ++++++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |  21 +-
 3 files changed, 398 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
index b495a9679..68767c04e 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
@@ -8,6 +8,8 @@
 #include <rte_hexdump.h>
 #include <rte_cryptodev.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security.h>
+#include <rte_security_driver.h>
 #include <rte_bus_vdev.h>
 #include <rte_malloc.h>
 #include <rte_cpuflags.h>
@@ -789,6 +791,167 @@ auth_start_offset(struct rte_crypto_op *op, struct aesni_mb_session *session,
 			(UINT64_MAX - u_src + u_dst + 1);
 }
 
+union sec_userdata_field {
+	int status;
+	struct {
+		uint16_t is_gen_digest;
+		uint16_t digest_len;
+	};
+};
+
+struct sec_udata_digest_field {
+	uint32_t is_digest_gen;
+	uint32_t digest_len;
+};
+
+static inline int
+set_mb_job_params_sec(JOB_AES_HMAC *job, struct aesni_mb_sec_session *sec_sess,
+		void *buf, uint32_t buf_len, void *iv, void *aad, void *digest,
+		int *status, uint8_t *digest_idx)
+{
+	struct aesni_mb_session *session = &sec_sess->sess;
+	uint32_t cipher_offset = sec_sess->cipher_offset;
+	void *user_digest = NULL;
+	union sec_userdata_field udata;
+
+	if (unlikely(cipher_offset > buf_len))
+		return -EINVAL;
+
+	/* Set crypto operation */
+	job->chain_order = session->chain_order;
+
+	/* Set cipher parameters */
+	job->cipher_direction = session->cipher.direction;
+	job->cipher_mode = session->cipher.mode;
+
+	job->aes_key_len_in_bytes = session->cipher.key_length_in_bytes;
+
+	/* Set authentication parameters */
+	job->hash_alg = session->auth.algo;
+	job->iv = iv;
+
+	switch (job->hash_alg) {
+	case AES_XCBC:
+		job->u.XCBC._k1_expanded = session->auth.xcbc.k1_expanded;
+		job->u.XCBC._k2 = session->auth.xcbc.k2;
+		job->u.XCBC._k3 = session->auth.xcbc.k3;
+
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		break;
+
+	case AES_CCM:
+		job->u.CCM.aad = (uint8_t *)aad + 18;
+		job->u.CCM.aad_len_in_bytes = session->aead.aad_len;
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		job->iv++;
+		break;
+
+	case AES_CMAC:
+		job->u.CMAC._key_expanded = session->auth.cmac.expkey;
+		job->u.CMAC._skey1 = session->auth.cmac.skey1;
+		job->u.CMAC._skey2 = session->auth.cmac.skey2;
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		break;
+
+	case AES_GMAC:
+		if (session->cipher.mode == GCM) {
+			job->u.GCM.aad = aad;
+			job->u.GCM.aad_len_in_bytes = session->aead.aad_len;
+		} else {
+			/* For GMAC */
+			job->u.GCM.aad = aad;
+			job->u.GCM.aad_len_in_bytes = buf_len;
+			job->cipher_mode = GCM;
+		}
+		job->aes_enc_key_expanded = &session->cipher.gcm_key;
+		job->aes_dec_key_expanded = &session->cipher.gcm_key;
+		break;
+
+	default:
+		job->u.HMAC._hashed_auth_key_xor_ipad =
+				session->auth.pads.inner;
+		job->u.HMAC._hashed_auth_key_xor_opad =
+				session->auth.pads.outer;
+
+		if (job->cipher_mode == DES3) {
+			job->aes_enc_key_expanded =
+				session->cipher.exp_3des_keys.ks_ptr;
+			job->aes_dec_key_expanded =
+				session->cipher.exp_3des_keys.ks_ptr;
+		} else {
+			job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+			job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		}
+	}
+
+	/* Set digest output location */
+	if (job->hash_alg != NULL_HASH &&
+			session->auth.operation == RTE_CRYPTO_AUTH_OP_VERIFY) {
+		job->auth_tag_output = sec_sess->temp_digests[*digest_idx];
+		*digest_idx = (*digest_idx + 1) % MAX_JOBS;
+
+		udata.is_gen_digest = 0;
+		udata.digest_len = session->auth.req_digest_len;
+		user_digest = (void *)digest;
+	} else {
+		udata.is_gen_digest = 1;
+		udata.digest_len = session->auth.req_digest_len;
+
+		if (session->auth.req_digest_len !=
+				session->auth.gen_digest_len) {
+			job->auth_tag_output =
+					sec_sess->temp_digests[*digest_idx];
+			*digest_idx = (*digest_idx + 1) % MAX_JOBS;
+
+			user_digest = (void *)digest;
+		} else
+			job->auth_tag_output = digest;
+	}
+
+	/* A bit of hack here: the job structure only supports
+	 * 2 user data fields, but 4 params need to be passed
+	 * (status, direction, digest for verify, and length of
+	 * digest), so the status value temporarily holds digest
+	 * length + direction to avoid creating a longer buffer
+	 * to store all 4 params.
+	 */
+	*status = udata.status;
+	/*
+	 * Multi-buffer library currently only supports returning a truncated
+	 * digest length as specified in the relevant IPsec RFCs
+	 */
+
+	/* Set digest length */
+	job->auth_tag_output_len_in_bytes = session->auth.gen_digest_len;
+
+	/* Set IV parameters */
+	job->iv_len_in_bytes = session->iv.length;
+
+	/* Data Parameters */
+	job->src = buf;
+	job->dst = buf;
+	job->cipher_start_src_offset_in_bytes = cipher_offset;
+	job->msg_len_to_cipher_in_bytes = buf_len - cipher_offset;
+	job->hash_start_src_offset_in_bytes = 0;
+	job->msg_len_to_hash_in_bytes = buf_len;
+
+	job->user_data = (void *)status;
+	job->user_data2 = user_digest;
+
+	return 0;
+}
+
 /**
  * Process a crypto operation and complete a JOB_AES_HMAC job structure for
  * submission to the multi buffer library for processing.
@@ -1081,6 +1244,37 @@ post_process_mb_job(struct aesni_mb_qp *qp, JOB_AES_HMAC *job)
 	return op;
 }
 
+static inline void
+post_process_mb_sec_job(JOB_AES_HMAC *job)
+{
+	void *user_digest = job->user_data2;
+	int *status = job->user_data;
+	union sec_userdata_field udata;
+
+	switch (job->status) {
+	case STS_COMPLETED:
+		if (user_digest) {
+			udata.status = *status;
+
+			if (udata.is_gen_digest) {
+				*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+				memcpy(user_digest, job->auth_tag_output,
+						udata.digest_len);
+			} else {
+				uint8_t vstat = RTE_CRYPTO_OP_STATUS_SUCCESS;
+				verify_digest(job, user_digest,
+					udata.digest_len, &vstat);
+				*status = (vstat == RTE_CRYPTO_OP_STATUS_SUCCESS) ?
+					RTE_CRYPTO_OP_STATUS_SUCCESS : -1;
+			}
+		} else
+			*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		break;
+	default:
+		*status = RTE_CRYPTO_OP_STATUS_ERROR;
+	}
+}
+
 /**
  * Process a completed JOB_AES_HMAC job and keep processing jobs until
  * get_completed_job return NULL
@@ -1117,6 +1311,32 @@ handle_completed_jobs(struct aesni_mb_qp *qp, JOB_AES_HMAC *job,
 	return processed_jobs;
 }
 
+static inline uint32_t
+handle_completed_sec_jobs(JOB_AES_HMAC *job, MB_MGR *mb_mgr)
+{
+	uint32_t processed = 0;
+
+	while (job != NULL) {
+		post_process_mb_sec_job(job);
+		job = IMB_GET_COMPLETED_JOB(mb_mgr);
+		processed++;
+	}
+
+	return processed;
+}
+
+static inline uint32_t
+flush_mb_sec_mgr(MB_MGR *mb_mgr)
+{
+	JOB_AES_HMAC *job = IMB_FLUSH_JOB(mb_mgr);
+	uint32_t processed = 0;
+
+	if (job)
+		processed = handle_completed_sec_jobs(job, mb_mgr);
+
+	return processed;
+}
+
 static inline uint16_t
 flush_mb_mgr(struct aesni_mb_qp *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
@@ -1220,6 +1440,55 @@ aesni_mb_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
 	return processed_jobs;
 }
 
+void
+aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	struct aesni_mb_sec_session *sec_sess = sess->sess_private_data;
+	JOB_AES_HMAC *job;
+	uint8_t digest_idx = sec_sess->digest_idx;
+	uint32_t i, processed = 0;
+	int ret;
+
+	for (i = 0; i < num; i++) {
+		void *seg_buf = buf[i].vec[0].iov_base;
+		uint32_t buf_len = buf[i].vec[0].iov_len;
+
+		job = IMB_GET_NEXT_JOB(sec_sess->mb_mgr);
+		if (unlikely(job == NULL)) {
+			processed += flush_mb_sec_mgr(sec_sess->mb_mgr);
+
+			job = IMB_GET_NEXT_JOB(sec_sess->mb_mgr);
+			if (!job)
+				return;
+		}
+
+		ret = set_mb_job_params_sec(job, sec_sess, seg_buf, buf_len,
+				iv[i], aad[i], digest[i], &status[i],
+				&digest_idx);
+		if (ret) {
+			processed++;
+			status[i] = ret;
+			continue;
+		}
+
+		/* Submit job to multi-buffer for processing */
+#ifdef RTE_LIBRTE_PMD_AESNI_MB_DEBUG
+		job = IMB_SUBMIT_JOB(sec_sess->mb_mgr);
+#else
+		job = IMB_SUBMIT_JOB_NOCHECK(sec_sess->mb_mgr);
+#endif
+
+		if (job)
+			processed += handle_completed_sec_jobs(job,
+					sec_sess->mb_mgr);
+	}
+
+	while (processed < num)
+		processed += flush_mb_sec_mgr(sec_sess->mb_mgr);
+}
+
 static int cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev);
 
 static int
@@ -1229,8 +1498,10 @@ cryptodev_aesni_mb_create(const char *name,
 {
 	struct rte_cryptodev *dev;
 	struct aesni_mb_private *internals;
+	struct rte_security_ctx *sec_ctx = NULL;
 	enum aesni_mb_vector_mode vector_mode;
 	MB_MGR *mb_mgr;
+	char sec_name[RTE_DEV_NAME_MAX_LEN];
 
 	/* Check CPU for support for AES instruction set */
 	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
@@ -1264,7 +1535,8 @@ cryptodev_aesni_mb_create(const char *name,
 	dev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO |
 			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
 			RTE_CRYPTODEV_FF_CPU_AESNI |
-			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
+			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
+			RTE_CRYPTODEV_FF_SECURITY;
 
 
 	mb_mgr = alloc_mb_mgr(0);
@@ -1303,11 +1575,28 @@ cryptodev_aesni_mb_create(const char *name,
 	AESNI_MB_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
 			imb_get_version_str());
 
+	/* setup security operations */
+	snprintf(sec_name, sizeof(sec_name) - 1, "aes_mb_sec_%u",
+			dev->driver_id);
+	sec_ctx = rte_zmalloc_socket(sec_name,
+			sizeof(struct rte_security_ctx),
+			RTE_CACHE_LINE_SIZE, init_params->socket_id);
+	if (sec_ctx == NULL) {
+		AESNI_MB_LOG(ERR, "memory allocation failed\n");
+		goto error_exit;
+	}
+
+	sec_ctx->device = (void *)dev;
+	sec_ctx->ops = rte_aesni_mb_pmd_security_ops;
+	dev->security_ctx = sec_ctx;
+
 	return 0;
 
 error_exit:
 	if (mb_mgr)
 		free_mb_mgr(mb_mgr);
+	if (sec_ctx)
+		rte_free(sec_ctx);
 
 	rte_cryptodev_pmd_destroy(dev);
 
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
index 8d15b99d4..ca6cea775 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
@@ -8,6 +8,7 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 
 #include "rte_aesni_mb_pmd_private.h"
 
@@ -732,7 +733,8 @@ aesni_mb_pmd_qp_count(struct rte_cryptodev *dev)
 static unsigned
 aesni_mb_pmd_sym_session_get_size(struct rte_cryptodev *dev __rte_unused)
 {
-	return sizeof(struct aesni_mb_session);
+	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_session),
+			RTE_CACHE_LINE_SIZE);
 }
 
 /** Configure a aesni multi-buffer session from a crypto xform chain */
@@ -810,4 +812,91 @@ struct rte_cryptodev_ops aesni_mb_pmd_ops = {
 		.sym_session_clear	= aesni_mb_pmd_sym_session_clear
 };
 
+/** Create an AES-NI multi-buffer security session for the CPU crypto action */
+
+static int
+aesni_mb_security_session_create(void *dev,
+		struct rte_security_session_conf *conf,
+		struct rte_security_session *sess,
+		struct rte_mempool *mempool)
+{
+	struct rte_cryptodev *cdev = dev;
+	struct aesni_mb_private *internals = cdev->data->dev_private;
+	struct aesni_mb_sec_session *sess_priv;
+	int ret;
+
+	if (!conf->crypto_xform) {
+		AESNI_MB_LOG(ERR, "Invalid security session conf");
+		return -EINVAL;
+	}
+
+	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
+		AESNI_MB_LOG(ERR,
+				"Couldn't get object from session mempool");
+		return -ENOMEM;
+	}
+
+	sess_priv->mb_mgr = internals->mb_mgr;
+	if (sess_priv->mb_mgr == NULL) {
+		rte_mempool_put(mempool, sess_priv);
+		return -ENOMEM;
+	}
+	sess_priv->cipher_offset = conf->cpucrypto.cipher_offset;
+
+	ret = aesni_mb_set_session_parameters(sess_priv->mb_mgr,
+			&sess_priv->sess, conf->crypto_xform);
+	if (ret != 0) {
+		AESNI_MB_LOG(ERR, "failed to configure session parameters");
+		rte_mempool_put(mempool, sess_priv);
+		return ret;
+	}
+
+	sess->sess_private_data = (void *)sess_priv;
+	return 0;
+}
+
+static int
+aesni_mb_security_session_destroy(void *dev __rte_unused,
+		struct rte_security_session *sess)
+{
+	struct aesni_mb_sec_session *sess_priv =
+			get_sec_session_private_data(sess);
+
+	if (sess_priv) {
+		struct rte_mempool *sess_mp = rte_mempool_from_obj(
+				(void *)sess_priv);
+
+		memset(sess_priv, 0, sizeof(struct aesni_mb_sec_session));
+		set_sec_session_private_data(sess, NULL);
+
+		if (sess_mp == NULL) {
+			AESNI_MB_LOG(ERR, "failed fetch session mempool");
+			return -EINVAL;
+		}
+
+		rte_mempool_put(sess_mp, sess_priv);
+	}
+
+	return 0;
+}
+
+static unsigned int
+aesni_mb_sec_session_get_size(__rte_unused void *device)
+{
+	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_sec_session),
+			RTE_CACHE_LINE_SIZE);
+}
+
+static struct rte_security_ops aesni_mb_security_ops = {
+		.session_create = aesni_mb_security_session_create,
+		.session_get_size = aesni_mb_sec_session_get_size,
+		.session_update = NULL,
+		.session_stats_get = NULL,
+		.session_destroy = aesni_mb_security_session_destroy,
+		.set_pkt_metadata = NULL,
+		.capabilities_get = NULL,
+		.process_cpu_crypto_bulk = aesni_mb_sec_crypto_process_bulk,
+};
+
 struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops = &aesni_mb_pmd_ops;
+struct rte_security_ops *rte_aesni_mb_pmd_security_ops = &aesni_mb_security_ops;
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
index b794d4bc1..d1cf416ab 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
@@ -176,7 +176,6 @@ struct aesni_mb_qp {
 	 */
 } __rte_cache_aligned;
 
-/** AES-NI multi-buffer private session structure */
 struct aesni_mb_session {
 	JOB_CHAIN_ORDER chain_order;
 	struct {
@@ -265,16 +264,32 @@ struct aesni_mb_session {
 		/** AAD data length */
 		uint16_t aad_len;
 	} aead;
-} __rte_cache_aligned;
+};
+
+/** AES-NI multi-buffer private security session structure */
+struct aesni_mb_sec_session {
+	/**< Underlying AES-NI multi-buffer crypto session */
+	struct aesni_mb_session sess;
+	uint8_t temp_digests[MAX_JOBS][DIGEST_LENGTH_MAX];
+	uint16_t digest_idx;
+	uint32_t cipher_offset;
+	MB_MGR *mb_mgr;
+};
 
 extern int
 aesni_mb_set_session_parameters(const MB_MGR *mb_mgr,
 		struct aesni_mb_session *sess,
 		const struct rte_crypto_sym_xform *xform);
 
+extern void
+aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 /** device specific operations function pointer structure */
 extern struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops;
 
-
+/** device specific operations function pointer structure for rte_security */
+extern struct rte_security_ops *rte_aesni_mb_pmd_security_ops;
 
 #endif /* _RTE_AESNI_MB_PMD_PRIVATE_H_ */
-- 
2.14.5
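The "bit of hack" in set_mb_job_params_sec() relies on `union sec_userdata_field`: the caller's `int` status slot carries the digest direction and length while the job is in flight, and post_process_mb_sec_job() reads them back before overwriting the slot with the final status. A standalone sketch of the packing, assuming a 32-bit `int` (checked at compile time); `pack_udata()` and `unpack_udata()` are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as sec_userdata_field in the patch: while a job is in flight,
 * the int status slot carries two 16-bit fields instead of a status code. */
union sec_userdata_field {
	int status;
	struct {
		uint16_t is_gen_digest;
		uint16_t digest_len;
	};
};

/* The trick only works if int is exactly as wide as the two fields. */
static_assert(sizeof(int) == 2 * sizeof(uint16_t),
		"int must be 32 bits wide");

/* Store direction and digest length into the caller's status slot. */
static void
pack_udata(int *status, uint16_t is_gen_digest, uint16_t digest_len)
{
	union sec_userdata_field u;

	u.is_gen_digest = is_gen_digest;
	u.digest_len = digest_len;
	*status = u.status;
}

/* Read both fields back from the status slot after the job completes. */
static void
unpack_udata(int status, uint16_t *is_gen_digest, uint16_t *digest_len)
{
	union sec_userdata_field u;

	u.status = status;
	*is_gen_digest = u.is_gen_digest;
	*digest_len = u.digest_len;
}
```

The numeric value of the packed `int` depends on byte order, but the round trip does not. Because the same slot later receives the final status, the fields must be read back before that write, which is the order post_process_mb_sec_job() follows.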



* [dpdk-dev] [RFC PATCH 6/9] app/test: add aesni_mb security cpu crypto autotest
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
                   ` (4 preceding siblings ...)
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 5/9] crypto/aesni_mb: add rte_security handler Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 7/9] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

This patch adds CPU crypto unit tests for the AESNI-MB PMD.
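Like the GCM cases, the CCM cases added here (all_ccm_unit_test_cases) are generated with a TEST_EXPAND X-macro: a list macro is expanded once with TEST_EXPAND defining one test function per entry, then again with TEST_EXPAND emitting one table entry per generated function. A reduced, self-contained sketch of that pattern; `ALL_CASES`, `run_case()` and `all_tests[]` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical test body standing in for cpu_crypto_test_aead(). */
static int
run_case(int value, int type)
{
	return value * 10 + type;
}

/* Test-case list: each entry becomes whatever TEST_EXPAND means at the
 * point where ALL_CASES is expanded. */
#define ALL_CASES(type)			\
	TEST_EXPAND(case_a, 1, type)	\
	TEST_EXPAND(case_b, 2, type)

/* First expansion: one test function per (case, type) pair. */
#define TEST_EXPAND(name, value, type)		\
static int					\
test_##name##_##type(void)			\
{						\
	return run_case(value, type);		\
}
ALL_CASES(0)
ALL_CASES(1)
#undef TEST_EXPAND

/* Second expansion: a table of pointers to the generated functions. */
typedef int (*test_fn)(void);

static const test_fn all_tests[] = {
#define TEST_EXPAND(name, value, type) test_##name##_##type,
	ALL_CASES(0)
	ALL_CASES(1)
#undef TEST_EXPAND
};

static_assert(sizeof(all_tests) / sizeof(all_tests[0]) == 4,
		"two cases times two types");
```

Keeping the case list in one macro means a new test vector only has to be added once for both the function definitions and the suite table.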

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 367 ++++++++++++++++++++++++++++++++++++
 1 file changed, 367 insertions(+)

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index ca9a8dae6..0ea406390 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -19,12 +19,23 @@
 
 #include "test.h"
 #include "test_cryptodev.h"
+#include "test_cryptodev_blockcipher.h"
+#include "test_cryptodev_aes_test_vectors.h"
 #include "test_cryptodev_aead_test_vectors.h"
+#include "test_cryptodev_des_test_vectors.h"
+#include "test_cryptodev_hash_test_vectors.h"
 
 #define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
 #define MAX_NB_SIGMENTS			4
 #define CACHE_WARM_ITER			2048
 
+#define TOP_ENC		BLOCKCIPHER_TEST_OP_ENCRYPT
+#define TOP_DEC		BLOCKCIPHER_TEST_OP_DECRYPT
+#define TOP_AUTH_GEN	BLOCKCIPHER_TEST_OP_AUTH_GEN
+#define TOP_AUTH_VER	BLOCKCIPHER_TEST_OP_AUTH_VERIFY
+#define TOP_ENC_AUTH	BLOCKCIPHER_TEST_OP_ENC_AUTH_GEN
+#define TOP_AUTH_DEC	BLOCKCIPHER_TEST_OP_AUTH_VERIFY_DEC
+
 enum buffer_assemble_option {
 	SGL_MAX_SEG,
 	SGL_ONE_SEG,
@@ -516,6 +527,11 @@ cpu_crypto_test_aead(const struct aead_test_data *tdata,
 	TEST_EXPAND(gcm_test_case_256_6, type)	\
 	TEST_EXPAND(gcm_test_case_256_7, type)
 
+/* test-vector/sgl-option */
+#define all_ccm_unit_test_cases \
+	TEST_EXPAND(ccm_test_case_128_1, SGL_ONE_SEG) \
+	TEST_EXPAND(ccm_test_case_128_2, SGL_ONE_SEG) \
+	TEST_EXPAND(ccm_test_case_128_3, SGL_ONE_SEG)
 
 #define TEST_EXPAND(t, o)						\
 static int								\
@@ -531,6 +547,7 @@ cpu_crypto_aead_dec_test_##t##_##o(void)				\
 
 all_gcm_unit_test_cases(SGL_ONE_SEG)
 all_gcm_unit_test_cases(SGL_MAX_SEG)
+all_ccm_unit_test_cases
 #undef TEST_EXPAND
 
 static struct unit_test_suite security_cpu_crypto_aesgcm_testsuite  = {
@@ -758,8 +775,358 @@ test_security_cpu_crypto_aesni_gcm_perf(void)
 			&security_cpu_crypto_aesgcm_perf_testsuite);
 }
 
+static struct rte_security_session *
+create_blockcipher_session(struct rte_security_ctx *ctx,
+		struct rte_mempool *sess_mp,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	struct rte_security_session_conf sess_conf = {0};
+	struct rte_crypto_sym_xform xforms[2] = { {0} };
+	struct rte_crypto_sym_xform *cipher_xform = NULL;
+	struct rte_crypto_sym_xform *auth_xform = NULL;
+	struct rte_crypto_sym_xform *xform;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER) {
+		cipher_xform = &xforms[0];
+		cipher_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+
+		if (op_mask & TOP_ENC)
+			cipher_xform->cipher.op =
+				RTE_CRYPTO_CIPHER_OP_ENCRYPT;
+		else
+			cipher_xform->cipher.op =
+				RTE_CRYPTO_CIPHER_OP_DECRYPT;
+
+		cipher_xform->cipher.algo = test_data->crypto_algo;
+		cipher_xform->cipher.key.data = test_data->cipher_key.data;
+		cipher_xform->cipher.key.length = test_data->cipher_key.len;
+		cipher_xform->cipher.iv.offset = 0;
+		cipher_xform->cipher.iv.length = test_data->iv.len;
+
+		if (is_unit_test)
+			debug_hexdump(stdout, "cipher key:",
+					test_data->cipher_key.data,
+					test_data->cipher_key.len);
+	}
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_AUTH) {
+		auth_xform = &xforms[1];
+		auth_xform->type = RTE_CRYPTO_SYM_XFORM_AUTH;
+
+		if (op_mask & TOP_AUTH_GEN)
+			auth_xform->auth.op = RTE_CRYPTO_AUTH_OP_GENERATE;
+		else
+			auth_xform->auth.op = RTE_CRYPTO_AUTH_OP_VERIFY;
+
+		auth_xform->auth.algo = test_data->auth_algo;
+		auth_xform->auth.key.length = test_data->auth_key.len;
+		auth_xform->auth.key.data = test_data->auth_key.data;
+		auth_xform->auth.digest_length = test_data->digest.len;
+
+		if (is_unit_test)
+			debug_hexdump(stdout, "auth key:",
+					test_data->auth_key.data,
+					test_data->auth_key.len);
+	}
+
+	if (op_mask == TOP_ENC ||
+			op_mask == TOP_DEC)
+		xform = cipher_xform;
+	else if (op_mask == TOP_AUTH_GEN ||
+			op_mask == TOP_AUTH_VER)
+		xform = auth_xform;
+	else if (op_mask == TOP_ENC_AUTH) {
+		xform = cipher_xform;
+		xform->next = auth_xform;
+	} else if (op_mask == TOP_AUTH_DEC) {
+		xform = auth_xform;
+		xform->next = cipher_xform;
+	} else
+		return NULL;
+
+	if (test_data->cipher_offset < test_data->auth_offset)
+		return NULL;
+
+	sess_conf.action_type = RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
+	sess_conf.crypto_xform = xform;
+	sess_conf.cpucrypto.cipher_offset = test_data->cipher_offset -
+			test_data->auth_offset;
+
+	return rte_security_session_create(ctx, &sess_conf, sess_mp);
+}
+
+static inline int
+assemble_blockcipher_buf(struct cpu_crypto_test_case *data,
+		struct cpu_crypto_test_obj *obj,
+		uint32_t obj_idx,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	const uint8_t *src;
+	uint32_t src_len;
+	uint32_t offset;
+
+	if (op_mask == TOP_ENC_AUTH ||
+			op_mask == TOP_AUTH_GEN ||
+			op_mask == TOP_AUTH_VER)
+		offset = test_data->auth_offset;
+	else
+		offset = test_data->cipher_offset;
+
+	if (op_mask & TOP_ENC_AUTH) {
+		src = test_data->plaintext.data;
+		src_len = test_data->plaintext.len;
+		if (is_unit_test)
+			debug_hexdump(stdout, "plaintext:", src, src_len);
+	} else {
+		src = test_data->ciphertext.data;
+		src_len = test_data->ciphertext.len;
+		memcpy(data->digest, test_data->digest.data,
+				test_data->digest.len);
+		if (is_unit_test) {
+			debug_hexdump(stdout, "ciphertext:", src, src_len);
+			debug_hexdump(stdout, "digest:", test_data->digest.data,
+					test_data->digest.len);
+		}
+	}
+
+	if (src_len > MBUF_DATAPAYLOAD_SIZE)
+		return -ENOMEM;
+
+	memcpy(data->seg_buf[0].seg, src, src_len);
+	data->seg_buf[0].seg_len = src_len;
+	obj->vec[obj_idx][0].iov_base =
+			(void *)(data->seg_buf[0].seg + offset);
+	obj->vec[obj_idx][0].iov_len = src_len - offset;
+
+	obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+	obj->sec_buf[obj_idx].num = 1;
+
+	memcpy(data->iv, test_data->iv.data, test_data->iv.len);
+	if (is_unit_test)
+		debug_hexdump(stdout, "iv:", test_data->iv.data,
+				test_data->iv.len);
+
+	obj->iv[obj_idx] = (void *)data->iv;
+	obj->digest[obj_idx] = (void *)data->digest;
+
+	return 0;
+}
+
+static int
+check_blockcipher_result(struct cpu_crypto_test_case *tcase,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data)
+{
+	int ret;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER) {
+		const char *err_msg1, *err_msg2;
+		const uint8_t *src_pt_ct;
+		uint32_t src_len;
+
+		if (op_mask & TOP_ENC) {
+			src_pt_ct = test_data->ciphertext.data;
+			src_len = test_data->ciphertext.len;
+			err_msg1 = CPU_CRYPTO_ERR_EXP_CT;
+			err_msg2 = CPU_CRYPTO_ERR_GEN_CT;
+		} else {
+			src_pt_ct = test_data->plaintext.data;
+			src_len = test_data->plaintext.len;
+			err_msg1 = CPU_CRYPTO_ERR_EXP_PT;
+			err_msg2 = CPU_CRYPTO_ERR_GEN_PT;
+		}
+
+		ret = memcmp(tcase->seg_buf[0].seg, src_pt_ct, src_len);
+		if (ret != 0) {
+			debug_hexdump(stdout, err_msg1, src_pt_ct, src_len);
+			debug_hexdump(stdout, err_msg2,
+					tcase->seg_buf[0].seg,
+					src_len);
+			return -1;
+		}
+	}
+
+	if (op_mask & TOP_AUTH_GEN) {
+		ret = memcmp(tcase->digest, test_data->digest.data,
+				test_data->digest.len);
+		if (ret != 0) {
+			debug_hexdump(stdout, "expect digest:",
+					test_data->digest.data,
+					test_data->digest.len);
+			debug_hexdump(stdout, "gen digest:",
+					tcase->digest,
+					test_data->digest.len);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int
+cpu_crypto_test_blockcipher(const struct blockcipher_test_data *tdata,
+		uint32_t op_mask)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	int ret;
+
+	ut_params->sess = create_blockcipher_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			op_mask,
+			tdata,
+			1);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(1);
+	if (ret)
+		return ret;
+
+	tcase = ut_params->test_datas[0];
+	ret = assemble_blockcipher_buf(tcase, obj, 0, op_mask, tdata, 1);
+	if (ret < 0) {
+		printf("Test is not supported by the driver\n");
+		return ret;
+	}
+
+	run_test(ts_params->ctx, ut_params->sess, obj, 1);
+
+	ret = check_status(obj, 1);
+	if (ret < 0)
+		return ret;
+
+	ret = check_blockcipher_result(tcase, op_mask, tdata);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+/* Macro to save code for defining BlockCipher test cases */
+/* test-vector-name/op */
+#define all_blockcipher_test_cases \
+	TEST_EXPAND(aes_test_data_1, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_1, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_1, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_1, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_2, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_2, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_2, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_2, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_3, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_3, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_3, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_3, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_4, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_4, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_4, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_4, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_5, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_5, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_5, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_5, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_6, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_6, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_6, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_6, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_7, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_7, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_7, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_7, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_8, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_8, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_8, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_8, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_9, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_9, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_9, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_9, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_10, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_10, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_11, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_11, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_12, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_12, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_12, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_12, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_13, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_13, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_13, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_13, TOP_AUTH_DEC) \
+	TEST_EXPAND(des_test_data_1, TOP_ENC) \
+	TEST_EXPAND(des_test_data_1, TOP_DEC) \
+	TEST_EXPAND(des_test_data_2, TOP_ENC) \
+	TEST_EXPAND(des_test_data_2, TOP_DEC) \
+	TEST_EXPAND(des_test_data_3, TOP_ENC) \
+	TEST_EXPAND(des_test_data_3, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_ENC_AUTH) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_AUTH_DEC) \
+	TEST_EXPAND(triple_des64cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des64cbc_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des128cbc_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des192cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des192cbc_test_vector, TOP_DEC) \
+
+#define TEST_EXPAND(t, o)						\
+static int								\
+cpu_crypto_blockcipher_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_blockcipher(&t, o);			\
+}
+
+all_blockcipher_test_cases
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesni_mb_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-MB Unit Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_enc_test_##t##_##o),		\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_dec_test_##t##_##o),		\
+
+	all_gcm_unit_test_cases(SGL_ONE_SEG)
+	all_ccm_unit_test_cases
+#undef TEST_EXPAND
+
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_blockcipher_test_##t##_##o),		\
+
+	all_blockcipher_test_cases
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_mb(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+
+	return unit_test_suite_runner(&security_cpu_crypto_aesni_mb_testsuite);
+}
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
 
 REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
 		test_security_cpu_crypto_aesni_gcm_perf);
+
+REGISTER_TEST_COMMAND(security_aesni_mb_autotest,
+		test_security_cpu_crypto_aesni_mb);
-- 
2.14.5


* [dpdk-dev] [RFC PATCH 7/9] app/test: add aesni_mb security cpu crypto perftest
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
                   ` (5 preceding siblings ...)
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 6/9] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 8/9] ipsec: add rte_security cpu_crypto action support Fan Zhang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

Since the crypto perf application does not support rte_security, this patch
adds a simple AES-CBC/SHA1-HMAC CPU crypto performance test to the crypto
unit test application. The test covers different key and data sizes with
single-buffer test items and displays throughput as well as cycle-count
performance information.
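
The figures the perf test prints all follow from one TSC delta: cycles per buffer is the delta divided by the burst size, Mpps is the TSC frequency divided by cycles per buffer (scaled by 10^6), and Gbps multiplies Mpps by the bits in one buffer. A small sketch of that arithmetic, assuming the timed call processes nb_bufs equally sized buffers; perf_from_cycles() and struct perf_result are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Figures derived from one timed burst, as printed by the perf test. */
struct perf_result {
	double cycles_per_buf;
	double cycles_per_byte;
	double mpps;	/* million buffers per second */
	double gbps;	/* gigabits per second */
};

/*
 * tsc_hz:    TSC frequency in Hz (rte_get_tsc_hz() in the test)
 * cycles:    TSC delta measured around one bulk processing call
 * nb_bufs:   buffers processed by that call (MAX_NUM_OPS_INFLIGHT)
 * buf_bytes: payload size of each buffer
 */
static struct perf_result
perf_from_cycles(uint64_t tsc_hz, uint64_t cycles, uint32_t nb_bufs,
		uint32_t buf_bytes)
{
	struct perf_result r;

	assert(nb_bufs != 0 && buf_bytes != 0 && cycles != 0);

	r.cycles_per_buf = (double)cycles / nb_bufs;
	r.cycles_per_byte = r.cycles_per_buf / buf_bytes;
	/* buffers per second, scaled to millions */
	r.mpps = ((double)tsc_hz / r.cycles_per_buf) / 1e6;
	/* Mpps times bits per buffer is Mbit/s; divide by 1000 for Gbit/s */
	r.gbps = r.mpps * buf_bytes * 8 / 1000;
	return r;
}
```

For example, a 2 GHz TSC and a delta of 128000 cycles for a burst of 128 buffers of 1024 bytes gives 1000 cycles per buffer, 2 Mpps and about 16.4 Gbps. Because hz comes from rte_get_tsc_hz(), the delta should be taken with the TSC as well for these numbers to be meaningful.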

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 194 ++++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index 0ea406390..6e012672e 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -1122,6 +1122,197 @@ test_security_cpu_crypto_aesni_mb(void)
 	return unit_test_suite_runner(&security_cpu_crypto_aesni_mb_testsuite);
 }
 
+static inline void
+switch_blockcipher_enc_to_dec(struct blockcipher_test_data *tdata,
+		struct cpu_crypto_test_case *tcase, uint8_t *dst)
+{
+	memcpy(dst, tcase->seg_buf[0].seg, tcase->seg_buf[0].seg_len);
+	tdata->ciphertext.len = tcase->seg_buf[0].seg_len;
+	memcpy(tdata->digest.data, tcase->digest, tdata->digest.len);
+}
+
+static int
+cpu_crypto_test_blockcipher_perf(
+		const enum rte_crypto_cipher_algorithm cipher_algo,
+		uint32_t cipher_key_sz,
+		const enum rte_crypto_auth_algorithm auth_algo,
+		uint32_t auth_key_sz, uint32_t digest_sz,
+		uint32_t op_mask)
+{
+	struct blockcipher_test_data tdata = {0};
+	uint8_t plaintext[3000], ciphertext[3000];
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	uint64_t hz = rte_get_tsc_hz(), time_start, time_now;
+	double rate, cycles_per_buf;
+	uint32_t test_data_szs[] = {64, 128, 256, 512, 1024, 2048};
+	uint32_t i, j;
+	uint32_t op_mask_opp = 0;
+	int ret;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER)
+		op_mask_opp |= (~op_mask & BLOCKCIPHER_TEST_OP_CIPHER);
+	if (op_mask & BLOCKCIPHER_TEST_OP_AUTH)
+		op_mask_opp |= (~op_mask & BLOCKCIPHER_TEST_OP_AUTH);
+
+	tdata.plaintext.data = plaintext;
+	tdata.ciphertext.data = ciphertext;
+
+	tdata.cipher_key.len = cipher_key_sz;
+	tdata.auth_key.len = auth_key_sz;
+
+	gen_rand(tdata.cipher_key.data, cipher_key_sz);
+	gen_rand(tdata.auth_key.data, auth_key_sz);
+
+	tdata.crypto_algo = cipher_algo;
+	tdata.auth_algo = auth_algo;
+
+	tdata.digest.len = digest_sz;
+
+	ut_params->sess = create_blockcipher_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			op_mask,
+			&tdata,
+			0);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(MAX_NUM_OPS_INFLIGHT);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < RTE_DIM(test_data_szs); i++) {
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tdata.plaintext.len = test_data_szs[i];
+			gen_rand(plaintext, tdata.plaintext.len);
+
+			tdata.iv.len = 16;
+			gen_rand(tdata.iv.data, tdata.iv.len);
+
+			tcase = ut_params->test_datas[j];
+			ret = assemble_blockcipher_buf(tcase, obj, j,
+					op_mask,
+					&tdata,
+					0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		/* warm up cache */
+		for (j = 0; j < CACHE_WARM_ITER; j++)
+			run_test(ts_params->ctx, ut_params->sess, obj,
+					MAX_NUM_OPS_INFLIGHT);
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("%s-%u-%s(%4uB) Enc %03.3fMpps (%03.3fGbps) ",
+			rte_crypto_cipher_algorithm_strings[cipher_algo],
+			cipher_key_sz * 8,
+			rte_crypto_auth_algorithm_strings[auth_algo],
+			test_data_szs[i],
+			rate, rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+			cycles_per_buf, cycles_per_buf / test_data_szs[i]);
+
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tcase = ut_params->test_datas[j];
+
+			switch_blockcipher_enc_to_dec(&tdata, tcase,
+					ciphertext);
+			ret = assemble_blockcipher_buf(tcase, obj, j,
+					op_mask_opp,
+					&tdata,
+					0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("%s-%u-%s(%4uB) Dec %03.3fMpps (%03.3fGbps) ",
+			rte_crypto_cipher_algorithm_strings[cipher_algo],
+			cipher_key_sz * 8,
+			rte_crypto_auth_algorithm_strings[auth_algo],
+			test_data_szs[i],
+			rate, rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+	}
+
+	return 0;
+}
+
+/* cipher-algo/cipher-key-len/auth-algo/auth-key-len/digest-len/op */
+#define all_block_cipher_perf_test_cases				\
+	TEST_EXPAND(_AES_CBC, 128, _NULL, 0, 0, TOP_ENC)		\
+	TEST_EXPAND(_NULL, 0, _SHA1_HMAC, 160, 20, TOP_AUTH_GEN)	\
+	TEST_EXPAND(_AES_CBC, 128, _SHA1_HMAC, 160, 20, TOP_ENC_AUTH)
+
+#define TEST_EXPAND(a, b, c, d, e, f)					\
+static int								\
+cpu_crypto_blockcipher_perf##a##_##b##c##_##f(void)			\
+{									\
+	return cpu_crypto_test_blockcipher_perf(RTE_CRYPTO_CIPHER##a,	\
+			b / 8, RTE_CRYPTO_AUTH##c, d / 8, e, f);	\
+}									\
+
+all_block_cipher_perf_test_cases
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesni_mb_perf_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-MB Perf Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(a, b, c, d, e, f)					\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+		cpu_crypto_blockcipher_perf##a##_##b##c##_##f),	\
+
+	all_block_cipher_perf_test_cases
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_mb_perf(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+
+	return unit_test_suite_runner(
+			&security_cpu_crypto_aesni_mb_perf_testsuite);
+}
+
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
 
@@ -1130,3 +1321,6 @@ REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
 
 REGISTER_TEST_COMMAND(security_aesni_mb_autotest,
 		test_security_cpu_crypto_aesni_mb);
+
+REGISTER_TEST_COMMAND(security_aesni_mb_perftest,
+		test_security_cpu_crypto_aesni_mb_perf);
-- 
2.14.5


* [dpdk-dev] [RFC PATCH 8/9] ipsec: add rte_security cpu_crypto action support
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
                   ` (6 preceding siblings ...)
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 7/9] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 9/9] examples/ipsec-secgw: add security " Fan Zhang
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

This patch updates the ipsec library to handle the newly introduced
RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action.
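
The inbound and outbound prepare helpers in this patch describe the region to process as an array of struct iovec, one entry per mbuf segment: they first locate the segment holding the start offset (mbuf_get_seg_ofs()), then walk the chain, capping the array at RTE_LIBRTE_IP_FRAG_MAX_FRAG entries and failing if bytes remain. A simplified, self-contained sketch of that walk follows; struct seg and seg_chain_to_iovec() are hypothetical stand-ins for rte_mbuf and the library's inline code, not DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for a chained rte_mbuf segment. */
struct seg {
	uint8_t *data;
	uint32_t data_len;
	struct seg *next;
};

/*
 * Describe `len` bytes starting at byte `off` of the chain as an iovec
 * array. Returns the number of iovecs used, or -1 if the chain is too
 * short or more than `max_vec` entries would be needed.
 */
static int
seg_chain_to_iovec(struct seg *s, uint32_t off, uint32_t len,
		struct iovec *vec, uint32_t max_vec)
{
	uint32_t n = 0;

	assert(vec != NULL);

	/* Skip whole segments lying before `off` (mbuf_get_seg_ofs role). */
	while (s != NULL && off >= s->data_len) {
		off -= s->data_len;
		s = s->next;
	}

	/* Emit one iovec per segment touched by the requested range. */
	while (len != 0 && s != NULL && n < max_vec) {
		uint32_t chunk = s->data_len - off;

		if (chunk > len)
			chunk = len;
		vec[n].iov_base = s->data + off;
		vec[n].iov_len = chunk;
		len -= chunk;
		n++;
		s = s->next;
		off = 0;	/* later segments are used from their start */
	}

	return len == 0 ? (int)n : -1;
}
```

The per-packet iovec count returned this way is what the bulk functions add to vec_idx, so consecutive packets share one preallocated iovec array sized RTE_LIBRTE_IP_FRAG_MAX_FRAG * num.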

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 lib/librte_ipsec/esp_inb.c  | 174 +++++++++++++++++++++++++-
 lib/librte_ipsec/esp_outb.c | 290 +++++++++++++++++++++++++++++++++++++++++++-
 lib/librte_ipsec/sa.c       |  53 ++++++--
 lib/librte_ipsec/sa.h       |  29 +++++
 lib/librte_ipsec/ses.c      |   4 +-
 5 files changed, 539 insertions(+), 11 deletions(-)

diff --git a/lib/librte_ipsec/esp_inb.c b/lib/librte_ipsec/esp_inb.c
index 8e3ecbc64..2220df0f6 100644
--- a/lib/librte_ipsec/esp_inb.c
+++ b/lib/librte_ipsec/esp_inb.c
@@ -105,6 +105,73 @@ inb_cop_prepare(struct rte_crypto_op *cop,
 	}
 }
 
+static inline int
+inb_sync_crypto_proc_prepare(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
+	const union sym_op_data *icv, uint32_t pofs, uint32_t plen,
+	struct rte_security_vec *buf, struct iovec *cur_vec,
+	void *iv, void **aad, void **digest)
+{
+	struct rte_mbuf *ms;
+	struct iovec *vec = cur_vec;
+	struct aead_gcm_iv *gcm;
+	struct aesctr_cnt_blk *ctr;
+	uint64_t *ivp;
+	uint32_t algo, left, off = 0, n_seg = 0;
+
+	ivp = rte_pktmbuf_mtod_offset(mb, uint64_t *,
+		pofs + sizeof(struct rte_esp_hdr));
+	algo = sa->algo_type;
+
+	switch (algo) {
+	case ALGO_TYPE_AES_GCM:
+		gcm = (struct aead_gcm_iv *)iv;
+		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
+		*aad = icv->va + sa->icv_len;
+		off = sa->ctp.cipher.offset + pofs;
+		break;
+	case ALGO_TYPE_AES_CBC:
+	case ALGO_TYPE_3DES_CBC:
+		off = sa->ctp.auth.offset + pofs;
+		break;
+	case ALGO_TYPE_AES_CTR:
+		off = sa->ctp.auth.offset + pofs;
+		ctr = (struct aesctr_cnt_blk *)iv;
+		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
+		break;
+	case ALGO_TYPE_NULL:
+		break;
+	}
+
+	*digest = icv->va;
+
+	left = plen - sa->ctp.cipher.length;
+
+	ms = mbuf_get_seg_ofs(mb, &off);
+	if (!ms)
+		return -1;
+
+	while (n_seg < RTE_LIBRTE_IP_FRAG_MAX_FRAG && left && ms) {
+		uint32_t len = RTE_MIN(left, ms->data_len - off);
+
+		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
+		vec->iov_len = len;
+
+		left -= len;
+		vec++;
+		n_seg++;
+		ms = ms->next;
+		off = 0;
+	}
+
+	if (left)
+		return -1;
+
+	buf->vec = cur_vec;
+	buf->num = n_seg;
+
+	return n_seg;
+}
+
 /*
  * Helper function for prepare() to deal with situation when
  * ICV is spread by two segments. Tries to move ICV completely into the
@@ -512,7 +579,6 @@ tun_process(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
 	return k;
 }
 
-
 /*
  * *process* function for tunnel packets
  */
@@ -625,6 +691,112 @@ esp_inb_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
 	return n;
 }
 
+/*
+ * process packets using sync crypto engine
+ */
+static uint16_t
+esp_inb_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num, uint8_t sqh_len,
+		esp_inb_process_t process)
+{
+	int32_t rc;
+	uint32_t i, k, hl, n, p;
+	struct rte_ipsec_sa *sa;
+	struct replay_sqn *rsn;
+	union sym_op_data icv;
+	uint32_t sqn[num];
+	uint32_t dr[num];
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
+	void *iv[num];
+	void *aad[num];
+	void *digest[num];
+	int status[num];
+
+	sa = ss->sa;
+	rsn = rsn_acquire(sa);
+
+	k = 0;
+	for (i = 0; i != num; i++) {
+		hl = mb[i]->l2_len + mb[i]->l3_len;
+		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, &icv);
+		if (rc >= 0) {
+			iv[k] = (void *)ivs[k];
+			rc = inb_sync_crypto_proc_prepare(sa, mb[i], &icv, hl,
+					rc, buf + k, vec + vec_idx, iv[k],
+					&aad[k], &digest[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		} else
+			dr[i - k] = i;
+	}
+
+	/* copy not prepared mbufs beyond good ones */
+	if (k != num) {
+		rte_errno = EBADMSG;
+
+		if (unlikely(k == 0))
+			return 0;
+
+		move_bad_mbufs(mb, dr, num, num - k);
+	}
+
+	/* process the packets */
+	n = 0;
+	rte_security_process_cpu_crypto_bulk(ss->security.ctx,
+			ss->security.ses, buf, iv, aad, digest, status,
+			k);
+	/* move packets that failed processing to dr */
+	for (i = 0; i < k; i++) {
+		if (status[i]) {
+			dr[n++] = i;
+			rte_errno = EBADMSG;
+		}
+	}
+
+	/* move bad packets to the back */
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	/* process packets */
+	p = process(sa, mb, sqn, dr, k - n, sqh_len);
+
+	if (p != k - n && p != 0)
+		move_bad_mbufs(mb, dr, k - n, k - n - p);
+
+	if (p != num)
+		rte_errno = EBADMSG;
+
+	return p;
+}
+
+uint16_t
+esp_inb_tun_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	struct rte_ipsec_sa *sa = ss->sa;
+
+	return esp_inb_sync_crypto_pkt_process(ss, mb, num, sa->sqh_len,
+			tun_process);
+}
+
+uint16_t
+esp_inb_trs_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	struct rte_ipsec_sa *sa = ss->sa;
+
+	return esp_inb_sync_crypto_pkt_process(ss, mb, num, sa->sqh_len,
+			trs_process);
+}
+
 /*
  * process group of ESP inbound tunnel packets.
  */
diff --git a/lib/librte_ipsec/esp_outb.c b/lib/librte_ipsec/esp_outb.c
index 55799a867..a3d18eefd 100644
--- a/lib/librte_ipsec/esp_outb.c
+++ b/lib/librte_ipsec/esp_outb.c
@@ -403,6 +403,292 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 	return k;
 }
 
+
+static inline int
+outb_sync_crypto_proc_prepare(struct rte_mbuf *m, const struct rte_ipsec_sa *sa,
+		const uint64_t ivp[IPSEC_MAX_IV_QWORD],
+		const union sym_op_data *icv, uint32_t hlen, uint32_t plen,
+		struct rte_security_vec *buf, struct iovec *cur_vec, void *iv,
+		void **aad, void **digest)
+{
+	struct rte_mbuf *ms;
+	struct aead_gcm_iv *gcm;
+	struct aesctr_cnt_blk *ctr;
+	struct iovec *vec = cur_vec;
+	uint32_t left, off = 0, n_seg = 0;
+	uint32_t algo;
+
+	algo = sa->algo_type;
+
+	switch (algo) {
+	case ALGO_TYPE_AES_GCM:
+		gcm = iv;
+		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
+		*aad = (void *)(icv->va + sa->icv_len);
+		off = sa->ctp.cipher.offset + hlen;
+		break;
+	case ALGO_TYPE_AES_CBC:
+	case ALGO_TYPE_3DES_CBC:
+		off = sa->ctp.auth.offset + hlen;
+		break;
+	case ALGO_TYPE_AES_CTR:
+		ctr = iv;
+		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
+		break;
+	case ALGO_TYPE_NULL:
+		break;
+	}
+
+	*digest = (void *)icv->va;
+
+	left = sa->ctp.cipher.length + plen;
+
+	ms = mbuf_get_seg_ofs(m, &off);
+	if (!ms)
+		return -1;
+
+	while (n_seg < RTE_LIBRTE_IP_FRAG_MAX_FRAG && left && ms) {
+		uint32_t len = RTE_MIN(left, ms->data_len - off);
+
+		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
+		vec->iov_len = len;
+
+		left -= len;
+		vec++;
+		n_seg++;
+		ms = ms->next;
+		off = 0;
+	}
+
+	if (left)
+		return -1;
+
+	buf->vec = cur_vec;
+	buf->num = n_seg;
+
+	return n_seg;
+}
+
+/**
+ * Local post-process function prototype, identical to the process()
+ * prototype in rte_ipsec_sa_pkt_func.
+ */
+typedef uint16_t (*sync_crypto_post_process)(const struct rte_ipsec_session *ss,
+				struct rte_mbuf *mb[],
+				uint16_t num);
+static uint16_t
+esp_outb_tun_sync_crypto_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num,
+		sync_crypto_post_process post_process)
+{
+	uint64_t sqn;
+	rte_be64_t sqc;
+	struct rte_ipsec_sa *sa;
+	struct rte_security_ctx *ctx;
+	struct rte_security_session *rss;
+	union sym_op_data icv;
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	void *aad[num];
+	void *digest[num];
+	void *iv[num];
+	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
+	uint64_t ivp[IPSEC_MAX_IV_QWORD];
+	int status[num];
+	uint32_t dr[num];
+	uint32_t i, n, k;
+	int32_t rc;
+
+	sa = ss->sa;
+	ctx = ss->security.ctx;
+	rss = ss->security.ses;
+
+	k = 0;
+	n = num;
+	sqn = esn_outb_update_sqn(sa, &n);
+	if (n != num)
+		rte_errno = EOVERFLOW;
+
+	for (i = 0; i != n; i++) {
+		sqc = rte_cpu_to_be_64(sqn + i);
+		gen_iv(ivp, sqc);
+
+		/* try to update the packet itself */
+		rc = outb_tun_pkt_prepare(sa, sqc, ivp, mb[i], &icv,
+				sa->sqh_len);
+
+		/* success, setup crypto op */
+		if (rc >= 0) {
+			outb_pkt_xprepare(sa, sqc, &icv);
+
+			iv[k] = (void *)ivs[k];
+			rc = outb_sync_crypto_proc_prepare(mb[i], sa, ivp, &icv,
+					0, rc, buf + k, vec + vec_idx, iv[k],
+					&aad[k], &digest[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				rte_errno = -rc;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		/* failure, put packet into the death-row */
+		} else {
+			dr[i - k] = i;
+			rte_errno = -rc;
+		}
+	}
+
+	/* copy not prepared mbufs beyond good ones */
+	if (k != n && k != 0)
+		move_bad_mbufs(mb, dr, n, n - k);
+
+	if (unlikely(k == 0)) {
+		rte_errno = EBADMSG;
+		return 0;
+	}
+
+	/* process the packets */
+	n = 0;
+	rte_security_process_cpu_crypto_bulk(ctx, rss, buf, (void **)iv,
+			(void **)aad, (void **)digest, status, k);
+	/* move packets that failed processing to dr */
+	for (i = 0; i < k; i++) {
+		if (status[i])
+			dr[n++] = i;
+	}
+
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	return post_process(ss, mb, k - n);
+}
+
+static uint16_t
+esp_outb_trs_sync_crypto_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num,
+		sync_crypto_post_process post_process)
+
+{
+	uint64_t sqn;
+	rte_be64_t sqc;
+	struct rte_ipsec_sa *sa;
+	struct rte_security_ctx *ctx;
+	struct rte_security_session *rss;
+	union sym_op_data icv;
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	void *aad[num];
+	void *digest[num];
+	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
+	void *iv[num];
+	int status[num];
+	uint64_t ivp[IPSEC_MAX_IV_QWORD];
+	uint32_t dr[num];
+	uint32_t i, n, k;
+	uint32_t l2, l3;
+	int32_t rc;
+
+	sa = ss->sa;
+	ctx = ss->security.ctx;
+	rss = ss->security.ses;
+
+	k = 0;
+	n = num;
+	sqn = esn_outb_update_sqn(sa, &n);
+	if (n != num)
+		rte_errno = EOVERFLOW;
+
+	for (i = 0; i != n; i++) {
+		l2 = mb[i]->l2_len;
+		l3 = mb[i]->l3_len;
+
+		sqc = rte_cpu_to_be_64(sqn + i);
+		gen_iv(ivp, sqc);
+
+		/* try to update the packet itself */
+		rc = outb_trs_pkt_prepare(sa, sqc, ivp, mb[i], l2, l3, &icv,
+				sa->sqh_len);
+
+		/* success, setup crypto op */
+		if (rc >= 0) {
+			outb_pkt_xprepare(sa, sqc, &icv);
+
+			iv[k] = (void *)ivs[k];
+
+			rc = outb_sync_crypto_proc_prepare(mb[i], sa, ivp, &icv,
+					l2 + l3, rc, buf + k, vec + vec_idx,
+					iv[k], &aad[k], &digest[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				rte_errno = -rc;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		/* failure, put packet into the death-row */
+		} else {
+			dr[i - k] = i;
+			rte_errno = -rc;
+		}
+	}
+
+	/* copy not prepared mbufs beyond good ones */
+	if (k != n && k != 0)
+		move_bad_mbufs(mb, dr, n, n - k);
+
+	/* process the packets */
+	n = 0;
+	rte_security_process_cpu_crypto_bulk(ctx, rss, buf, (void **)iv,
+			(void **)aad, (void **)digest, status, k);
+	/* move failed process packets to dr */
+	for (i = 0; i < k; i++) {
+		if (status[i])
+			dr[n++] = i;
+	}
+
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	return post_process(ss, mb, k - n);
+}
+
+uint16_t
+esp_outb_tun_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_tun_sync_crypto_process(ss, mb, num,
+			esp_outb_sqh_process);
+}
+
+uint16_t
+esp_outb_tun_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_tun_sync_crypto_process(ss, mb, num,
+			esp_outb_pkt_flag_process);
+}
+
+uint16_t
+esp_outb_trs_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_trs_sync_crypto_process(ss, mb, num,
+			esp_outb_sqh_process);
+}
+
+uint16_t
+esp_outb_trs_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_trs_sync_crypto_process(ss, mb, num,
+			esp_outb_pkt_flag_process);
+}
+
 /*
  * process outbound packets for SA with ESN support,
  * for algorithms that require SQN.hibits to be implictly included
@@ -410,8 +696,8 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
  * In that case we have to move ICV bytes back to their proper place.
  */
 uint16_t
-esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
-	uint16_t num)
+esp_outb_sqh_process(const struct rte_ipsec_session *ss,
+	struct rte_mbuf *mb[], uint16_t num)
 {
 	uint32_t i, k, icv_len, *icv;
 	struct rte_mbuf *ml;
diff --git a/lib/librte_ipsec/sa.c b/lib/librte_ipsec/sa.c
index 23d394b46..31ffbce2c 100644
--- a/lib/librte_ipsec/sa.c
+++ b/lib/librte_ipsec/sa.c
@@ -544,9 +544,9 @@ lksd_proto_prepare(const struct rte_ipsec_session *ss,
  * - inbound/outbound for RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
  * - outbound for RTE_SECURITY_ACTION_TYPE_NONE when ESN is disabled
  */
-static uint16_t
-pkt_flag_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
-	uint16_t num)
+uint16_t
+esp_outb_pkt_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
 {
 	uint32_t i, k;
 	uint32_t dr[num];
@@ -599,12 +599,48 @@ lksd_none_pkt_func_select(const struct rte_ipsec_sa *sa,
 	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
 		pf->prepare = esp_outb_tun_prepare;
 		pf->process = (sa->sqh_len != 0) ?
-			esp_outb_sqh_process : pkt_flag_process;
+			esp_outb_sqh_process : esp_outb_pkt_flag_process;
 		break;
 	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TRANS):
 		pf->prepare = esp_outb_trs_prepare;
 		pf->process = (sa->sqh_len != 0) ?
-			esp_outb_sqh_process : pkt_flag_process;
+			esp_outb_sqh_process : esp_outb_pkt_flag_process;
+		break;
+	default:
+		rc = -ENOTSUP;
+	}
+
+	return rc;
+}
+
+static int
+lksd_sync_crypto_pkt_func_select(const struct rte_ipsec_sa *sa,
+		struct rte_ipsec_sa_pkt_func *pf)
+{
+	int32_t rc;
+
+	static const uint64_t msk = RTE_IPSEC_SATP_DIR_MASK |
+			RTE_IPSEC_SATP_MODE_MASK;
+
+	rc = 0;
+	switch (sa->type & msk) {
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		pf->process = esp_inb_tun_sync_crypto_pkt_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TRANS):
+		pf->process = esp_inb_trs_sync_crypto_pkt_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		pf->process = (sa->sqh_len != 0) ?
+			esp_outb_tun_sync_crpyto_sqh_process :
+			esp_outb_tun_sync_crpyto_flag_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TRANS):
+		pf->process = (sa->sqh_len != 0) ?
+			esp_outb_trs_sync_crpyto_sqh_process :
+			esp_outb_trs_sync_crpyto_flag_process;
 		break;
 	default:
 		rc = -ENOTSUP;
@@ -672,13 +708,16 @@ ipsec_sa_pkt_func_select(const struct rte_ipsec_session *ss,
 	case RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL:
 		if ((sa->type & RTE_IPSEC_SATP_DIR_MASK) ==
 				RTE_IPSEC_SATP_DIR_IB)
-			pf->process = pkt_flag_process;
+			pf->process = esp_outb_pkt_flag_process;
 		else
 			pf->process = inline_proto_outb_pkt_process;
 		break;
 	case RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL:
 		pf->prepare = lksd_proto_prepare;
-		pf->process = pkt_flag_process;
+		pf->process = esp_outb_pkt_flag_process;
+		break;
+	case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
+		rc = lksd_sync_crypto_pkt_func_select(sa, pf);
 		break;
 	default:
 		rc = -ENOTSUP;
diff --git a/lib/librte_ipsec/sa.h b/lib/librte_ipsec/sa.h
index 51e69ad05..02c7abc60 100644
--- a/lib/librte_ipsec/sa.h
+++ b/lib/librte_ipsec/sa.h
@@ -156,6 +156,14 @@ uint16_t
 inline_inb_trs_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
 
+uint16_t
+esp_inb_tun_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_inb_trs_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
 /* outbound processing */
 
 uint16_t
@@ -170,6 +178,10 @@ uint16_t
 esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 	uint16_t num);
 
+uint16_t
+esp_outb_pkt_flag_process(const struct rte_ipsec_session *ss,
+	struct rte_mbuf *mb[], uint16_t num);
+
 uint16_t
 inline_outb_tun_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
@@ -182,4 +194,21 @@ uint16_t
 inline_proto_outb_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
 
+uint16_t
+esp_outb_tun_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_tun_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_trs_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_trs_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+
 #endif /* _SA_H_ */
diff --git a/lib/librte_ipsec/ses.c b/lib/librte_ipsec/ses.c
index 82c765a33..eaa8c17b7 100644
--- a/lib/librte_ipsec/ses.c
+++ b/lib/librte_ipsec/ses.c
@@ -19,7 +19,9 @@ session_check(struct rte_ipsec_session *ss)
 			return -EINVAL;
 		if ((ss->type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
 				ss->type ==
-				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL) &&
+				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
+				ss->type ==
+				RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) &&
 				ss->security.ctx == NULL)
 			return -EINVAL;
 	}
-- 
2.14.5
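
The post-processing step in the patch above works like this: scan the per-packet `status` array after the bulk call, record the failed indices, then move those packets behind the good ones. A minimal sketch of that pattern follows. It is an illustration under stated assumptions, not the library code: `struct pkt`, `collect_failed()` and `move_bad_pkts()` are hypothetical names, and `move_bad_pkts()` only mirrors the idea behind librte_ipsec's internal `move_bad_mbufs()`. The scan has to cover every entry that was submitted to the bulk call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf; only pointers are moved. */
struct pkt {
	int id;
};

/*
 * Record, in ascending order, the index of every entry whose status is
 * non-zero. Returns the number of failed entries.
 */
static uint32_t
collect_failed(const int status[], uint32_t num, uint32_t dr[])
{
	uint32_t i, n = 0;

	for (i = 0; i != num; i++) {
		if (status[i] != 0)
			dr[n++] = i;
	}
	return n;
}

/*
 * Stable partition: good packets keep their relative order at the front,
 * and the failed ones (indices listed in ascending order in bad_idx) follow.
 */
static void
move_bad_pkts(struct pkt *mb[], const uint32_t bad_idx[], uint32_t nb_mb,
		uint32_t nb_bad)
{
	uint32_t i, j = 0, k = 0;
	struct pkt *drb[nb_bad > 0 ? nb_bad : 1]; /* temporary death-row */

	assert(nb_bad <= nb_mb);

	for (i = 0; i != nb_mb; i++) {
		if (j != nb_bad && i == bad_idx[j])
			drb[j++] = mb[i];
		else
			mb[k++] = mb[i];
	}
	for (i = 0; i != nb_bad; i++)
		mb[k + i] = drb[i];
}
```

With `status = {0, -1, 0, 0, -5}`, `collect_failed()` yields indices 1 and 4. After `move_bad_pkts()`, the three good packets occupy slots 0 to 2, so the caller hands `num - n` packets to post-processing, as the patch does with `k - n`.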


* [dpdk-dev] [RFC PATCH 9/9] examples/ipsec-secgw: add security cpu_crypto action support
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
                   ` (7 preceding siblings ...)
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 8/9] ipsec: add rte_security cpu_crypto action support Fan Zhang
@ 2019-09-03 15:40 ` Fan Zhang
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-03 15:40 UTC (permalink / raw)
  To: dev
  Cc: akhil.goyal, konstantin.ananyev, declan.doherty,
	pablo.de.lara.guarch, Fan Zhang

Since the ipsec library now supports the cpu_crypto security action type,
this patch updates the ipsec-secgw sample application with a new action
type, "cpu-crypto". The patch also includes a number of test scripts to
verify the correctness of the implementation.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 examples/ipsec-secgw/ipsec.c                       | 22 ++++++++++++++++++++++
 examples/ipsec-secgw/ipsec_process.c               |  4 ++--
 examples/ipsec-secgw/sa.c                          | 13 +++++++++++--
 examples/ipsec-secgw/test/run_test.sh              | 10 ++++++++++
 .../test/trs_3descbc_sha1_cpu_crypto_defs.sh       |  5 +++++
 .../test/trs_aescbc_sha1_cpu_crypto_defs.sh        |  5 +++++
 .../test/trs_aesctr_sha1_cpu_crypto_defs.sh        |  5 +++++
 .../ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh |  5 +++++
 .../test/trs_aesgcm_mb_cpu_crypto_defs.sh          |  7 +++++++
 .../test/tun_3descbc_sha1_cpu_crypto_defs.sh       |  5 +++++
 .../test/tun_aescbc_sha1_cpu_crypto_defs.sh        |  5 +++++
 .../test/tun_aesctr_sha1_cpu_crypto_defs.sh        |  5 +++++
 .../ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh |  5 +++++
 .../test/tun_aesgcm_mb_cpu_crypto_defs.sh          |  7 +++++++
 14 files changed, 99 insertions(+), 4 deletions(-)
 create mode 100644 examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh

diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index dc85adfe5..4c39a7de6 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -10,6 +10,7 @@
 #include <rte_crypto.h>
 #include <rte_security.h>
 #include <rte_cryptodev.h>
+#include <rte_ipsec.h>
 #include <rte_ethdev.h>
 #include <rte_mbuf.h>
 #include <rte_hash.h>
@@ -105,6 +106,26 @@ create_lookaside_session(struct ipsec_ctx *ipsec_ctx, struct ipsec_sa *sa)
 				"SEC Session init failed: err: %d\n", ret);
 				return -1;
 			}
+		} else if (sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
+			struct rte_security_ctx *ctx =
+				(struct rte_security_ctx *)
+				rte_cryptodev_get_sec_ctx(
+					ipsec_ctx->tbl[cdev_id_qp].id);
+			int32_t offset = sizeof(struct rte_esp_hdr) +
+					sa->iv_len;
+
+			/* Set IPsec parameters in conf */
+			sess_conf.cpucrypto.cipher_offset = offset;
+
+			set_ipsec_conf(sa, &(sess_conf.ipsec));
+			sa->security_ctx = ctx;
+			sa->sec_session = rte_security_session_create(ctx,
+				&sess_conf, ipsec_ctx->session_priv_pool);
+			if (sa->sec_session == NULL) {
+				RTE_LOG(ERR, IPSEC,
+				"SEC Session init failed: err: %d\n", ret);
+				return -1;
+			}
 		} else {
 			RTE_LOG(ERR, IPSEC, "Inline not supported\n");
 			return -1;
@@ -473,6 +494,7 @@ ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx *ipsec_ctx,
 						sa->sec_session, pkts[i], NULL);
 			continue;
 		case RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO:
+		case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
 			RTE_ASSERT(sa->sec_session != NULL);
 			priv->cop.type = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
 			priv->cop.status = RTE_CRYPTO_OP_STATUS_NOT_PROCESSED;
diff --git a/examples/ipsec-secgw/ipsec_process.c b/examples/ipsec-secgw/ipsec_process.c
index 868f1a28d..73bfb314e 100644
--- a/examples/ipsec-secgw/ipsec_process.c
+++ b/examples/ipsec-secgw/ipsec_process.c
@@ -227,8 +227,8 @@ ipsec_process(struct ipsec_ctx *ctx, struct ipsec_traffic *trf)
 
 		/* process packets inline */
 		else if (sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
-				sa->type ==
-				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL) {
+			sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 
 			satp = rte_ipsec_sa_type(ips->sa);
 
diff --git a/examples/ipsec-secgw/sa.c b/examples/ipsec-secgw/sa.c
index c3cf3bd1f..ba773346f 100644
--- a/examples/ipsec-secgw/sa.c
+++ b/examples/ipsec-secgw/sa.c
@@ -570,6 +570,9 @@ parse_sa_tokens(char **tokens, uint32_t n_tokens,
 				RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL;
 			else if (strcmp(tokens[ti], "no-offload") == 0)
 				rule->type = RTE_SECURITY_ACTION_TYPE_NONE;
+			else if (strcmp(tokens[ti], "cpu-crypto") == 0)
+				rule->type =
+					RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
 			else {
 				APP_CHECK(0, status, "Invalid input \"%s\"",
 						tokens[ti]);
@@ -624,10 +627,13 @@ parse_sa_tokens(char **tokens, uint32_t n_tokens,
 	if (status->status < 0)
 		return;
 
-	if ((rule->type != RTE_SECURITY_ACTION_TYPE_NONE) && (portid_p == 0))
+	if ((rule->type != RTE_SECURITY_ACTION_TYPE_NONE && rule->type !=
+			RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) &&
+			(portid_p == 0))
 		printf("Missing portid option, falling back to non-offload\n");
 
-	if (!type_p || !portid_p) {
+	if (!type_p || (!portid_p && rule->type !=
+			RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO)) {
 		rule->type = RTE_SECURITY_ACTION_TYPE_NONE;
 		rule->portid = -1;
 	}
@@ -709,6 +715,9 @@ print_one_sa_rule(const struct ipsec_sa *sa, int inbound)
 	case RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL:
 		printf("lookaside-protocol-offload ");
 		break;
+	case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
+		printf("cpu-crypto-accelerated");
+		break;
 	}
 	printf("\n");
 }
diff --git a/examples/ipsec-secgw/test/run_test.sh b/examples/ipsec-secgw/test/run_test.sh
index 8055a4c04..f322aa785 100755
--- a/examples/ipsec-secgw/test/run_test.sh
+++ b/examples/ipsec-secgw/test/run_test.sh
@@ -32,15 +32,21 @@ usage()
 }
 
 LINUX_TEST="tun_aescbc_sha1 \
+tun_aescbc_sha1_cpu_crypto \
 tun_aescbc_sha1_esn \
 tun_aescbc_sha1_esn_atom \
 tun_aesgcm \
+tun_aesgcm_cpu_crypto \
+tun_aesgcm_mb_cpu_crypto \
 tun_aesgcm_esn \
 tun_aesgcm_esn_atom \
 trs_aescbc_sha1 \
+trs_aescbc_sha1_cpu_crypto \
 trs_aescbc_sha1_esn \
 trs_aescbc_sha1_esn_atom \
 trs_aesgcm \
+trs_aesgcm_cpu_crypto \
+trs_aesgcm_mb_cpu_crypto \
 trs_aesgcm_esn \
 trs_aesgcm_esn_atom \
 tun_aescbc_sha1_old \
@@ -49,17 +55,21 @@ trs_aescbc_sha1_old \
 trs_aesgcm_old \
 tun_aesctr_sha1 \
 tun_aesctr_sha1_old \
+tun_aesctr_sha1_cpu_crypto \
 tun_aesctr_sha1_esn \
 tun_aesctr_sha1_esn_atom \
 trs_aesctr_sha1 \
+trs_aesctr_sha1_cpu_crypto \
 trs_aesctr_sha1_old \
 trs_aesctr_sha1_esn \
 trs_aesctr_sha1_esn_atom \
 tun_3descbc_sha1 \
+tun_3descbc_sha1_cpu_crypto \
 tun_3descbc_sha1_old \
 tun_3descbc_sha1_esn \
 tun_3descbc_sha1_esn_atom \
 trs_3descbc_sha1 \
+trs_3descbc_sha1_cpu_crypto \
 trs_3descbc_sha1_old \
 trs_3descbc_sha1_esn \
 trs_3descbc_sha1_esn_atom"
diff --git a/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..a864a8886
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_3descbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..a4d83e9c4
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aescbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..745a2a02b
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesctr_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
new file mode 100644
index 000000000..8917122da
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesgcm_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
new file mode 100644
index 000000000..26943321f
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
@@ -0,0 +1,7 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesgcm_defs.sh
+
+CRYPTO_DEV=${CRYPTO_DEV:-'--vdev="crypto_aesni_mb0"'}
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..747141f62
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_3descbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..56076fa50
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aescbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..3af680533
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesctr_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
new file mode 100644
index 000000000..5bf1c0ae5
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesgcm_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh
new file mode 100644
index 000000000..039b8095e
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh
@@ -0,0 +1,7 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesgcm_defs.sh
+
+CRYPTO_DEV=${CRYPTO_DEV:-'--vdev="crypto_aesni_mb0"'}
+
+SGW_CFG_XPRM='type cpu-crypto'
-- 
2.14.5
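
The `sa.c` changes above boil down to two rules. First, the `type` token gains a `cpu-crypto` value. Second, a missing port id no longer forces a fallback to no-offload when the type is `cpu-crypto`, since that work runs on the CPU rather than on a port. The sketch below shows that logic. The enum, `struct sa_rule`, `parse_action()` and `finalize_rule()` are hypothetical stand-ins for the sample application's own types, and only the tokens visible in this patch are handled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the RTE_SECURITY_ACTION_TYPE_* values. */
enum sa_action {
	SA_ACT_NONE,            /* "no-offload" */
	SA_ACT_LOOKASIDE_PROTO, /* "lookaside-protocol-offload" */
	SA_ACT_CPU_CRYPTO,      /* "cpu-crypto" */
	SA_ACT_INVALID          /* unknown token */
};

/* Hypothetical, reduced SA rule: only the fields this logic touches. */
struct sa_rule {
	enum sa_action type;
	int portid;
};

/* Map the value of the "type" token to an action type. */
static enum sa_action
parse_action(const char *tok)
{
	if (strcmp(tok, "lookaside-protocol-offload") == 0)
		return SA_ACT_LOOKASIDE_PROTO;
	if (strcmp(tok, "no-offload") == 0)
		return SA_ACT_NONE;
	if (strcmp(tok, "cpu-crypto") == 0)
		return SA_ACT_CPU_CRYPTO;
	return SA_ACT_INVALID;
}

/*
 * Fall back to no-offload when no type was given, or when a port id is
 * missing for any type other than cpu-crypto (which needs no port).
 */
static void
finalize_rule(struct sa_rule *rule, int type_given, int portid_given)
{
	assert(rule != NULL);

	if (!type_given ||
			(!portid_given && rule->type != SA_ACT_CPU_CRYPTO)) {
		rule->type = SA_ACT_NONE;
		rule->portid = -1;
	}
}
```

For example, an SA rule ending in `type cpu-crypto` (as set through `SGW_CFG_XPRM` in the new test scripts) keeps its action type even without a port id, while a lookaside rule without one falls back to no-offload.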


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API Fan Zhang
@ 2019-09-04 10:32   ` Akhil Goyal
  2019-09-04 13:06     ` Zhang, Roy Fan
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-09-04 10:32 UTC (permalink / raw)
  To: Fan Zhang, dev; +Cc: konstantin.ananyev, declan.doherty, pablo.de.lara.guarch

Hi Fan,

> 
> This patch introduce new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action
> type to
> security library. The type represents performing crypto operation with CPU
> cycles. The patch also includes a new API to process crypto operations in
> bulk and the function pointers for PMDs.
> 
I am not able to follow the flow of execution for this action type. Could you please describe
the flow in the documentation? If it cannot go into the documentation right now, then please describe the
flow in the cover letter.
I also see that there are new APIs for processing crypto operations in bulk.
What does that mean? How are they different from the existing APIs, which also
handle bulk crypto ops depending on the budget?


-Akhil


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-04 10:32   ` Akhil Goyal
@ 2019-09-04 13:06     ` Zhang, Roy Fan
  2019-09-06  9:01       ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Zhang, Roy Fan @ 2019-09-04 13:06 UTC (permalink / raw)
  To: Akhil Goyal, dev
  Cc: Ananyev, Konstantin, Doherty, Declan, De Lara Guarch, Pablo

Hi Akhil,

This action type allows a burst of symmetric crypto workloads that share the same
algorithm, key, and direction to be processed synchronously using CPU cycles.
This flexible action type requires no external hardware: the crypto workload
is processed synchronously, and it performs better than the Cryptodev SW PMDs
because it saves the cycles spent on the "async mode simulation" as well as
the 3 cache-line accesses of the crypto ops.

The AESNI-GCM and AESNI-MB PMDs are updated with this support. There is a small
performance test app under app/test/security_aesni_gcm(mb)_perftest to
demonstrate this.

For the new API:
The packet is sent to the crypto device for symmetric crypto
processing. The device encrypts or decrypts the buffer based on the session
data specified and preprocessed in the security session. Unlike the inline
or lookaside modes, when the function returns, the user can expect each buffer
either to have been processed successfully or to have an error number
assigned at the corresponding index of the status array.
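
To make the "processed or error on return" contract concrete, here is a toy sketch of a synchronous bulk call with a per-index status array. It is purely illustrative and assumes nothing about the real `rte_security_process_cpu_crypto_bulk()` beyond the behaviour described above. `struct buf_desc` and `toy_sync_process_bulk()` are hypothetical, and the XOR with a one-byte key stands in for a real cipher.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat buffer descriptor; the real API has its own vector type. */
struct buf_desc {
	uint8_t *data;
	size_t len;
};

/*
 * Toy synchronous bulk call: all buffers are handled before it returns,
 * and status[i] holds the outcome for buf[i] (0 on success, a negative
 * errno value on failure). XOR with a one-byte key is a placeholder for
 * the real cipher and provides no security at all.
 */
static void
toy_sync_process_bulk(uint8_t key, struct buf_desc buf[], int status[],
		uint32_t num)
{
	uint32_t i;
	size_t j;

	assert(num == 0 || (buf != NULL && status != NULL));

	for (i = 0; i != num; i++) {
		if (buf[i].data == NULL || buf[i].len == 0) {
			status[i] = -EINVAL;
			continue;
		}
		for (j = 0; j != buf[i].len; j++)
			buf[i].data[j] ^= key;
		status[i] = 0;
	}
}
```

There is nothing to dequeue or poll afterwards. Once the call returns, the caller only inspects `status[]`, which is the key difference from the cryptodev enqueue/dequeue flow.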

I will update the programmer's guide in the v1 patch.

Regards,
Fan



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-04 13:06     ` Zhang, Roy Fan
@ 2019-09-06  9:01       ` Akhil Goyal
  2019-09-06 13:12         ` Zhang, Roy Fan
  2019-09-06 13:27         ` Ananyev, Konstantin
  0 siblings, 2 replies; 87+ messages in thread
From: Akhil Goyal @ 2019-09-06  9:01 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev
  Cc: Ananyev, Konstantin, Doherty, Declan, De Lara Guarch, Pablo


Hi Fan,
> 
> Hi Akhil,
> 
> This action type allows the burst of symmetric crypto workload using the same
> algorithm, key, and direction being processed by CPU cycles synchronously.
> This flexible action type does not require external hardware involvement,
> having the crypto workload processed synchronously, and is more performant
> than Cryptodev SW PMD due to the saved cycles on removed "async mode
> simulation" as well as 3 cacheline access of the crypto ops.

Does that mean the application will not call cryptodev_enqueue_burst and the corresponding dequeue burst?
Would it be a new API, something like process_packets, that returns with the crypto-processed packets?

I still do not understand why we cannot do this with the conventional crypto lib only.
As far as I can understand, you are not doing any protocol processing or adding any value
to the crypto processing. IMO, you just need a synchronous crypto processing API, which
can be defined in cryptodev; you don't need to re-create a crypto session in the name of
a security session in the driver just to do synchronous processing.

> 
> AESNI-GCM and AESNI-MB PMDs are updated with this support. There is a small
> performance test app under app/test/security_aesni_gcm(mb)_perftest to
> prove.
> 
> For the new API
> The packet is sent to the crypto device for symmetric crypto
> processing. The device will encrypt or decrypt the buffer based on the session
> data specified and preprocessed in the security session. Different
> than the inline or lookaside modes, when the function exits, the user will
> expect the buffers are either processed successfully, or having the error number
> assigned to the appropriate index of the status array.
> 
> Will update the program's guide in the v1 patch.
> 
> Regards,
> Fan


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-06  9:01       ` Akhil Goyal
@ 2019-09-06 13:12         ` Zhang, Roy Fan
  2019-09-10 11:25           ` Akhil Goyal
  2019-09-06 13:27         ` Ananyev, Konstantin
  1 sibling, 1 reply; 87+ messages in thread
From: Zhang, Roy Fan @ 2019-09-06 13:12 UTC (permalink / raw)
  To: Akhil Goyal, dev
  Cc: Ananyev, Konstantin, Doherty, Declan, De Lara Guarch, Pablo

Hi Akhil,

You are right: the new API processes the crypto workload directly, with no heavy
enqueue/dequeue operations required.

Cryptodev is meant to support many crypto devices, both HW and SW.
The 3-cache-line access, IOVA address computation and assignment, simulation
of async enqueue/dequeue operations, allocation and freeing of crypto ops, and even
the mbuf linked list for scatter-gather buffers are too heavy for SW crypto PMDs.

Creating this new synchronous API in cryptodev would not avoid the problems
listed above. First, a cryptodev API should not serve only a subset of the crypto
PMDs (the SW ones); as you know, it is Cryptodev. Users can expect some PMDs to support
only part of the overall algorithms, but not only part of the workload processing API.

Another reason is an assumption built into the crypto op: when creating one,
we have to allocate memory to hold the crypto op + sym op + IV, so we cannot
simply declare an array of crypto ops at run time and discard it when processing
is done. We also need to fill in the AAD and digest HW addresses, which SW does not
need at all.

Bottom line: using crypto ops still has the 3-cache-line access performance problem.

So if we create the new API in Cryptodev instead of rte_security, we need to
create a new crypto op structure only for the SW PMDs, document it carefully so it
is not confused with the existing cryptodev APIs, add new device feature flags to
indicate that some PMDs do not support the API, and document those feature flags
carefully as well.

Pushing these changes to rte_security instead resolves the problems above,
and the performance improvement from this change is large for smaller packets;
I attached a performance test app in the patchset.

In rte_security we already have the inline-crypto type, which works quite close to this
new API; the only difference is that the new type is processed with CPU cycles. As you may
have already seen, the ipsec library wraps these changes, and ipsec-secgw
needs only minimal updates to adopt them too. So for end users of IPsec,
this patchset can be enabled seamlessly with just a command-line update when
creating an SA.

Regards,
Fan
 




* [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process
  2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
                   ` (8 preceding siblings ...)
  2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 9/9] examples/ipsec-secgw: add security " Fan Zhang
@ 2019-09-06 13:13 ` Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API Fan Zhang
                     ` (11 more replies)
  9 siblings, 12 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch set adds a way for rte_security to process symmetric crypto
workloads synchronously, in bulk, for SW crypto devices.

Originally, both SW and HW crypto PMDs work under rte_cryptodev to
process crypto workloads asynchronously. This provides uniformity across
both PMD types, but it also introduces an unnecessary performance penalty
for SW PMDs, such as extra SW ring enqueue/dequeue steps to "simulate"
asynchronous operation and unnecessary HW address computation.

We introduce a new way for SW crypto devices to perform crypto operations
synchronously, taking as input only the fields required for the computation.

In rte_security, a new action type "RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO"
is introduced. This action type allows a burst of symmetric crypto
workloads that share the same algorithm, key, and direction to be
processed synchronously using CPU cycles. This flexible action type does
not require any external hardware.

This patch set also announces a new API,
"rte_security_process_cpu_crypto_bulk". With this API, the packet is sent to
the crypto device for symmetric crypto processing. The device encrypts or
decrypts the buffer based on the session data specified and preprocessed in
the security session. Unlike the inline or lookaside modes, when the
function returns, each buffer has either been processed successfully or has
an error number assigned to the corresponding index of the status array.
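
To make the synchronous contract concrete, here is a minimal caller-side
sketch in plain C. It does not use DPDK: `sketch_vec`,
`sketch_process_bulk()` and the XOR "cipher" are hypothetical stand-ins for
`struct rte_security_vec`, `rte_security_process_cpu_crypto_bulk()` and a
real PMD, chosen only to show that each buffer's outcome is available in
`status[]` as soon as the call returns, with no enqueue/dequeue step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct rte_security_vec: one SGL per buffer. */
struct sketch_vec {
	struct iovec *vec;	/* array of segments */
	uint32_t num;		/* number of segments */
};

/* Toy per-buffer operation: XOR every byte with key, in place.
 * Returns 0 on success, -1 for an empty SGL. NOT real crypto. */
static int
sketch_process_one(uint8_t key, struct sketch_vec *buf)
{
	uint32_t i;
	size_t j;

	if (buf->vec == NULL || buf->num == 0)
		return -1;
	for (i = 0; i < buf->num; i++) {
		uint8_t *p = buf->vec[i].iov_base;

		for (j = 0; j < buf->vec[i].iov_len; j++)
			p[j] ^= key;
	}
	return 0;
}

/* Synchronous bulk call: on return, status[i] holds the result for
 * buf[i]; nothing is left in flight to be dequeued later. */
static void
sketch_process_bulk(uint8_t key, struct sketch_vec buf[], int status[],
		uint32_t num)
{
	uint32_t i;

	assert((buf != NULL && status != NULL) || num == 0);
	for (i = 0; i < num; i++)
		status[i] = sketch_process_one(key, &buf[i]);
}

/* Caller side: scan the status array to count failed buffers. */
static uint32_t
sketch_count_failures(const int status[], uint32_t num)
{
	uint32_t i, n = 0;

	for (i = 0; i < num; i++)
		if (status[i] < 0)
			n++;
	return n;
}
```

A failed element does not abort the burst; the caller decides per index
what to do (drop, retry, or report) after a single scan of `status[]`.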

The proof-of-concept AESNI-GCM and AESNI-MB SW PMDs are updated to support
this new method. To demonstrate the performance gain, two simple
performance evaluation apps are added under the unit tests:
"app/test: security_aesni_gcm_perftest/security_aesni_mb_perftest". Users
can compare their results against the crypto perf application results.

Finally, the ipsec library and the ipsec-secgw sample application are also
updated to support this feature. Several test scripts are added to the
ipsec-secgw test suite to verify the correctness of the implementation.

Fan Zhang (10):
  security: introduce CPU Crypto action type and API
  crypto/aesni_gcm: add rte_security handler
  app/test: add security cpu crypto autotest
  app/test: add security cpu crypto perftest
  crypto/aesni_mb: add rte_security handler
  app/test: add aesni_mb security cpu crypto autotest
  app/test: add aesni_mb security cpu crypto perftest
  ipsec: add rte_security cpu_crypto action support
  examples/ipsec-secgw: add security cpu_crypto action support
  doc: update security cpu process description

 app/test/Makefile                                  |    1 +
 app/test/meson.build                               |    1 +
 app/test/test_security_cpu_crypto.c                | 1326 ++++++++++++++++++++
 doc/guides/cryptodevs/aesni_gcm.rst                |    6 +
 doc/guides/cryptodevs/aesni_mb.rst                 |    7 +
 doc/guides/prog_guide/rte_security.rst             |  112 +-
 doc/guides/rel_notes/release_19_11.rst             |    7 +
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c           |   91 +-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c       |   95 ++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h   |   23 +
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         |  291 ++++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |   91 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |   21 +-
 examples/ipsec-secgw/ipsec.c                       |   22 +
 examples/ipsec-secgw/ipsec_process.c               |    7 +-
 examples/ipsec-secgw/sa.c                          |   13 +-
 examples/ipsec-secgw/test/run_test.sh              |   10 +
 .../test/trs_3descbc_sha1_cpu_crypto_defs.sh       |    5 +
 .../test/trs_aescbc_sha1_cpu_crypto_defs.sh        |    5 +
 .../test/trs_aesctr_sha1_cpu_crypto_defs.sh        |    5 +
 .../ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh |    5 +
 .../test/trs_aesgcm_mb_cpu_crypto_defs.sh          |    7 +
 .../test/tun_3descbc_sha1_cpu_crypto_defs.sh       |    5 +
 .../test/tun_aescbc_sha1_cpu_crypto_defs.sh        |    5 +
 .../test/tun_aesctr_sha1_cpu_crypto_defs.sh        |    5 +
 .../ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh |    5 +
 .../test/tun_aesgcm_mb_cpu_crypto_defs.sh          |    7 +
 lib/librte_ipsec/esp_inb.c                         |  174 ++-
 lib/librte_ipsec/esp_outb.c                        |  290 ++++-
 lib/librte_ipsec/sa.c                              |   53 +-
 lib/librte_ipsec/sa.h                              |   29 +
 lib/librte_ipsec/ses.c                             |    4 +-
 lib/librte_security/rte_security.c                 |   16 +
 lib/librte_security/rte_security.h                 |   51 +-
 lib/librte_security/rte_security_driver.h          |   19 +
 lib/librte_security/rte_security_version.map       |    1 +
 36 files changed, 2791 insertions(+), 24 deletions(-)
 create mode 100644 app/test/test_security_cpu_crypto.c
 create mode 100644 examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh

-- 
2.14.5



* [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-18 12:45     ` Ananyev, Konstantin
  2019-09-29  6:00     ` Hemant Agrawal
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
                     ` (10 subsequent siblings)
  11 siblings, 2 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type
to the security library. The type represents performing crypto operations
with CPU cycles. The patch also includes a new API to process crypto
operations in bulk and the function pointers for PMDs.
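
A minimal sketch of the dispatch pattern this patch adds, assuming
simplified, hypothetical types (`sketch_ops`, `sketch_ctx`,
`sketch_lib_process_bulk()`); the buffer, IV, AAD and digest arrays are
omitted. As in the patch's `rte_security_process_cpu_crypto_bulk()`, the
library wrapper pre-fills every status with -1 before calling the PMD hook,
so a missing hook, or a PMD that returns early, still reports failure for
each element. The patch does the NULL check with `RTE_FUNC_PTR_OR_RET`; the
sketch uses a plain `if`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver hook, shaped like security_process_cpu_crypto_bulk_t
 * with the data arrays left out for brevity. */
typedef void (*sketch_bulk_fn)(void *sess, int status[], uint32_t num);

struct sketch_ops {
	sketch_bulk_fn process_cpu_crypto_bulk;	/* NULL if unsupported */
};

struct sketch_ctx {
	const struct sketch_ops *ops;
};

/* Library wrapper: default every element to failure, then delegate. */
static void
sketch_lib_process_bulk(struct sketch_ctx *ctx, void *sess, int status[],
		uint32_t num)
{
	uint32_t i;

	assert(ctx != NULL && ctx->ops != NULL);
	for (i = 0; i < num; i++)
		status[i] = -1;
	if (ctx->ops->process_cpu_crypto_bulk == NULL)
		return;	/* unsupported: caller sees -1 everywhere */
	ctx->ops->process_cpu_crypto_bulk(sess, status, num);
}

/* Example PMD hook that reports success for every element. */
static void
sketch_pmd_bulk_ok(void *sess, int status[], uint32_t num)
{
	uint32_t i;

	(void)sess;
	for (i = 0; i < num; i++)
		status[i] = 0;
}
```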

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 lib/librte_security/rte_security.c           | 16 +++++++++
 lib/librte_security/rte_security.h           | 51 +++++++++++++++++++++++++++-
 lib/librte_security/rte_security_driver.h    | 19 +++++++++++
 lib/librte_security/rte_security_version.map |  1 +
 4 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
index bc81ce15d..0f85c1b59 100644
--- a/lib/librte_security/rte_security.c
+++ b/lib/librte_security/rte_security.c
@@ -141,3 +141,19 @@ rte_security_capability_get(struct rte_security_ctx *instance,
 
 	return NULL;
 }
+
+void
+rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	uint32_t i;
+
+	for (i = 0; i < num; i++)
+		status[i] = -1;
+
+	RTE_FUNC_PTR_OR_RET(*instance->ops->process_cpu_crypto_bulk);
+	instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
+			aad, digest, status, num);
+}
diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
index 96806e3a2..5a0f8901b 100644
--- a/lib/librte_security/rte_security.h
+++ b/lib/librte_security/rte_security.h
@@ -18,6 +18,7 @@ extern "C" {
 #endif
 
 #include <sys/types.h>
+#include <sys/uio.h>
 
 #include <netinet/in.h>
 #include <netinet/ip.h>
@@ -272,6 +273,20 @@ struct rte_security_pdcp_xform {
 	uint32_t hfn_threshold;
 };
 
+struct rte_security_cpu_crypto_xform {
+	/** For cipher/authentication crypto operation the authentication may
+	 * cover more content than the cipher. E.g., for IPSec ESP encryption
+	 * with AES-CBC and SHA1-HMAC, the encryption happens after the ESP
+	 * header but the whole packet (apart from the MAC header) is authenticated.
+	 * The cipher_offset field is used to derive the cipher data pointer
+	 * from the buffer to be processed.
+	 *
+	 * NOTE: this parameter shall be ignored by AEAD algorithms, since they
+	 * use the same offset for cipher and authentication.
+	 */
+	int32_t cipher_offset;
+};
+
 /**
  * Security session action type.
  */
@@ -286,10 +301,14 @@ enum rte_security_session_action_type {
 	/**< All security protocol processing is performed inline during
 	 * transmission
 	 */
-	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
+	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
 	/**< All security protocol processing including crypto is performed
 	 * on a lookaside accelerator
 	 */
+	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
+	/**< Crypto processing for security protocol is processed by CPU
+	 * synchronously
+	 */
 };
 
 /** Security session protocol definition */
@@ -315,6 +334,7 @@ struct rte_security_session_conf {
 		struct rte_security_ipsec_xform ipsec;
 		struct rte_security_macsec_xform macsec;
 		struct rte_security_pdcp_xform pdcp;
+		struct rte_security_cpu_crypto_xform cpucrypto;
 	};
 	/**< Configuration parameters for security session */
 	struct rte_crypto_sym_xform *crypto_xform;
@@ -639,6 +659,35 @@ const struct rte_security_capability *
 rte_security_capability_get(struct rte_security_ctx *instance,
 			    struct rte_security_capability_idx *idx);
 
+/**
+ * Security vector structure, contains pointer to vector array and the length
+ * of the array
+ */
+struct rte_security_vec {
+	struct iovec *vec;
+	uint32_t num;
+};
+
+/**
+ * Processing bulk crypto workload with CPU
+ *
+ * @param	instance	security instance.
+ * @param	sess		security session
+ * @param	buf		array of buffer SGL vectors
+ * @param	iv		array of IV pointers
+ * @param	aad		array of AAD pointers
+ * @param	digest		array of digest pointers
+ * @param	status		array of status for the function to return
+ * @param	num		number of elements in each array
+ *
+ */
+__rte_experimental
+void
+rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
index 1b561f852..70fcb0c26 100644
--- a/lib/librte_security/rte_security_driver.h
+++ b/lib/librte_security/rte_security_driver.h
@@ -132,6 +132,23 @@ typedef int (*security_get_userdata_t)(void *device,
 typedef const struct rte_security_capability *(*security_capabilities_get_t)(
 		void *device);
 
+/**
+ * Process security operations in bulk using CPU accelerated method.
+ *
+ * @param	sess		Security session structure.
+ * @param	buf		Buffer to the vectors to be processed.
+ * @param	iv		IV pointers.
+ * @param	aad		AAD pointers.
+ * @param	digest		Digest pointers.
+ * @param	status		Array of status value.
+ * @param	num		Number of elements in each array.
+ */
+
+typedef void (*security_process_cpu_crypto_bulk_t)(
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 /** Security operations function pointer table */
 struct rte_security_ops {
 	security_session_create_t session_create;
@@ -150,6 +167,8 @@ struct rte_security_ops {
 	/**< Get userdata associated with session which processed the packet. */
 	security_capabilities_get_t capabilities_get;
 	/**< Get security capabilities. */
+	security_process_cpu_crypto_bulk_t process_cpu_crypto_bulk;
+	/**< Process data in bulk. */
 };
 
 #ifdef __cplusplus
diff --git a/lib/librte_security/rte_security_version.map b/lib/librte_security/rte_security_version.map
index 53267bf3c..2132e7a00 100644
--- a/lib/librte_security/rte_security_version.map
+++ b/lib/librte_security/rte_security_version.map
@@ -18,4 +18,5 @@ EXPERIMENTAL {
 	rte_security_get_userdata;
 	rte_security_session_stats_get;
 	rte_security_session_update;
+	rte_security_process_cpu_crypto_bulk;
 };
-- 
2.14.5



* [dpdk-dev] [PATCH 02/10] crypto/aesni_gcm: add rte_security handler
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-18 10:24     ` Ananyev, Konstantin
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 03/10] app/test: add security cpu crypto autotest Fan Zhang
                     ` (9 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch adds rte_security support to the AESNI-GCM PMD. The PMD now
initializes a security context instance, creates/deletes PMD-specific
security sessions, and processes crypto workloads in synchronous mode with
scatter-gather list buffers supported.
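
The per-buffer flow in this PMD is: init once, update in place for each
SGL segment, then finalize into a tag, truncating it on encryption or
comparing it on decryption. The sketch below shows that shape with a toy
keystream and XOR tag; it is NOT AES-GCM and is not secure, and
`toy_stream`, `toy_process_sgl()` and `TOY_GEN_DIGEST_LEN` are hypothetical.
Because the running position carries across segments, the result does not
depend on how the buffer is split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define TOY_GEN_DIGEST_LEN 16	/* full tag the toy engine produces */

/* Toy streaming state. Purely illustrative: NOT AES-GCM, NOT secure. */
struct toy_stream {
	uint8_t key;
	uint8_t tag[TOY_GEN_DIGEST_LEN];
	size_t pos;	/* bytes fed so far, across all segments */
};

static void
toy_init(struct toy_stream *s, uint8_t key)
{
	s->key = key;
	memset(s->tag, 0, sizeof(s->tag));
	s->pos = 0;
}

/* In-place update over one segment; the tag is taken over ciphertext. */
static void
toy_update(struct toy_stream *s, uint8_t *data, size_t len, int encrypt)
{
	size_t i;

	for (i = 0; i < len; i++, s->pos++) {
		uint8_t ct = encrypt ? (uint8_t)(data[i] ^ s->key) : data[i];

		s->tag[s->pos % TOY_GEN_DIGEST_LEN] ^= ct;
		data[i] ^= s->key;
	}
}

/* Returns 0 on success, -1 on a bad digest length or tag mismatch. */
static int
toy_process_sgl(uint8_t key, const struct iovec *vec, uint32_t nseg,
		uint8_t *digest, size_t req_len, int encrypt)
{
	struct toy_stream s;
	uint8_t full[TOY_GEN_DIGEST_LEN];
	uint32_t i;

	assert(vec != NULL || nseg == 0);
	if (req_len == 0 || req_len > TOY_GEN_DIGEST_LEN)
		return -1;
	toy_init(&s, key);
	for (i = 0; i < nseg; i++)	/* segments form one logical message */
		toy_update(&s, vec[i].iov_base, vec[i].iov_len, encrypt);
	memcpy(full, s.tag, sizeof(full));	/* "finalize" */

	if (encrypt) {
		memcpy(digest, full, req_len);	/* possibly truncated tag */
		return 0;
	}
	return memcmp(full, digest, req_len) == 0 ? 0 : -1;
}
```

Like the patch, the sketch decrypts in place before the tag check, so a
failed check still leaves the buffer modified, and it compares tags with
`memcmp()`; a production implementation may prefer a constant-time
comparison.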

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c         | 91 ++++++++++++++++++++++-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c     | 95 ++++++++++++++++++++++++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h | 23 ++++++
 3 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index 1006a5c4d..0a346eddd 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
@@ -6,6 +6,7 @@
 #include <rte_hexdump.h>
 #include <rte_cryptodev.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 #include <rte_bus_vdev.h>
 #include <rte_malloc.h>
 #include <rte_cpuflags.h>
@@ -174,6 +175,56 @@ aesni_gcm_get_session(struct aesni_gcm_qp *qp, struct rte_crypto_op *op)
 	return sess;
 }
 
+static __rte_always_inline int
+process_gcm_security_sgl_buf(struct aesni_gcm_security_session *sess,
+		struct rte_security_vec *buf, uint8_t *iv,
+		uint8_t *aad, uint8_t *digest)
+{
+	struct aesni_gcm_session *session = &sess->sess;
+	uint8_t *tag;
+	uint32_t i;
+
+	sess->init(&session->gdata_key, &sess->gdata_ctx, iv, aad,
+			(uint64_t)session->aad_length);
+
+	for (i = 0; i < buf->num; i++) {
+		struct iovec *vec = &buf->vec[i];
+
+		sess->update(&session->gdata_key, &sess->gdata_ctx,
+				vec->iov_base, vec->iov_base, vec->iov_len);
+	}
+
+	switch (session->op) {
+	case AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION:
+		if (session->req_digest_length != session->gen_digest_length)
+			tag = sess->temp_digest;
+		else
+			tag = digest;
+
+		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
+				session->gen_digest_length);
+
+		if (session->req_digest_length != session->gen_digest_length)
+			memcpy(digest, sess->temp_digest,
+					session->req_digest_length);
+		break;
+
+	case AESNI_GCM_OP_AUTHENTICATED_DECRYPTION:
+		tag = sess->temp_digest;
+
+		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
+				session->gen_digest_length);
+
+		if (memcmp(tag, digest,	session->req_digest_length) != 0)
+			return -1;
+		break;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
 /**
  * Process a crypto operation, calling
  * the GCM API from the multi buffer library.
@@ -488,8 +539,10 @@ aesni_gcm_create(const char *name,
 {
 	struct rte_cryptodev *dev;
 	struct aesni_gcm_private *internals;
+	struct rte_security_ctx *sec_ctx;
 	enum aesni_gcm_vector_mode vector_mode;
 	MB_MGR *mb_mgr;
+	char sec_name[RTE_DEV_NAME_MAX_LEN];
 
 	/* Check CPU for support for AES instruction set */
 	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
@@ -524,7 +577,8 @@ aesni_gcm_create(const char *name,
 			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
 			RTE_CRYPTODEV_FF_CPU_AESNI |
 			RTE_CRYPTODEV_FF_OOP_SGL_IN_LB_OUT |
-			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
+			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
+			RTE_CRYPTODEV_FF_SECURITY;
 
 	mb_mgr = alloc_mb_mgr(0);
 	if (mb_mgr == NULL)
@@ -587,6 +641,21 @@ aesni_gcm_create(const char *name,
 
 	internals->max_nb_queue_pairs = init_params->max_nb_queue_pairs;
 
+	/* setup security operations */
+	snprintf(sec_name, sizeof(sec_name) - 1, "aes_gcm_sec_%u",
+			dev->driver_id);
+	sec_ctx = rte_zmalloc_socket(sec_name,
+			sizeof(struct rte_security_ctx),
+			RTE_CACHE_LINE_SIZE, init_params->socket_id);
+	if (sec_ctx == NULL) {
+		AESNI_GCM_LOG(ERR, "memory allocation failed\n");
+		goto error_exit;
+	}
+
+	sec_ctx->device = (void *)dev;
+	sec_ctx->ops = rte_aesni_gcm_pmd_security_ops;
+	dev->security_ctx = sec_ctx;
+
 #if IMB_VERSION_NUM >= IMB_VERSION(0, 50, 0)
 	AESNI_GCM_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
 			imb_get_version_str());
@@ -641,6 +710,8 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
 	if (cryptodev == NULL)
 		return -ENODEV;
 
+	rte_free(cryptodev->security_ctx);
+
 	internals = cryptodev->data->dev_private;
 
 	free_mb_mgr(internals->mb_mgr);
@@ -648,6 +719,24 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
 	return rte_cryptodev_pmd_destroy(cryptodev);
 }
 
+void
+aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	struct aesni_gcm_security_session *session =
+			get_sec_session_private_data(sess);
+	uint32_t i;
+
+	if (unlikely(!session))
+		return;
+
+	for (i = 0; i < num; i++)
+		status[i] = process_gcm_security_sgl_buf(session, &buf[i],
+				(uint8_t *)iv[i], (uint8_t *)aad[i],
+				(uint8_t *)digest[i]);
+}
+
 static struct rte_vdev_driver aesni_gcm_pmd_drv = {
 	.probe = aesni_gcm_probe,
 	.remove = aesni_gcm_remove
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
index 2f66c7c58..cc71dbd60 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
@@ -7,6 +7,7 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 
 #include "aesni_gcm_pmd_private.h"
 
@@ -316,6 +317,85 @@ aesni_gcm_pmd_sym_session_clear(struct rte_cryptodev *dev,
 	}
 }
 
+static int
+aesni_gcm_security_session_create(void *dev,
+		struct rte_security_session_conf *conf,
+		struct rte_security_session *sess,
+		struct rte_mempool *mempool)
+{
+	struct rte_cryptodev *cdev = dev;
+	struct aesni_gcm_private *internals = cdev->data->dev_private;
+	struct aesni_gcm_security_session *sess_priv;
+	int ret;
+
+	if (!conf->crypto_xform) {
+		AESNI_GCM_LOG(ERR, "Invalid security session conf");
+		return -EINVAL;
+	}
+
+	if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+		AESNI_GCM_LOG(ERR, "GMAC is not supported in security session");
+		return -EINVAL;
+	}
+
+
+	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
+		AESNI_GCM_LOG(ERR,
+				"Couldn't get object from session mempool");
+		return -ENOMEM;
+	}
+
+	ret = aesni_gcm_set_session_parameters(internals->ops,
+				&sess_priv->sess, conf->crypto_xform);
+	if (ret != 0) {
+		AESNI_GCM_LOG(ERR, "Failed configure session parameters");
+
+		/* Return session to mempool */
+		rte_mempool_put(mempool, (void *)sess_priv);
+		return ret;
+	}
+
+	sess_priv->pre = internals->ops[sess_priv->sess.key].pre;
+	sess_priv->init = internals->ops[sess_priv->sess.key].init;
+	if (sess_priv->sess.op == AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION) {
+		sess_priv->update =
+			internals->ops[sess_priv->sess.key].update_enc;
+		sess_priv->finalize =
+			internals->ops[sess_priv->sess.key].finalize_enc;
+	} else {
+		sess_priv->update =
+			internals->ops[sess_priv->sess.key].update_dec;
+		sess_priv->finalize =
+			internals->ops[sess_priv->sess.key].finalize_dec;
+	}
+
+	sess->sess_private_data = sess_priv;
+
+	return 0;
+}
+
+static int
+aesni_gcm_security_session_destroy(void *dev __rte_unused,
+		struct rte_security_session *sess)
+{
+	void *sess_priv = get_sec_session_private_data(sess);
+
+	if (sess_priv) {
+		struct rte_mempool *sess_mp = rte_mempool_from_obj(sess_priv);
+
+		memset(sess_priv, 0, sizeof(struct aesni_gcm_security_session));
+		set_sec_session_private_data(sess, NULL);
+		rte_mempool_put(sess_mp, sess_priv);
+	}
+	return 0;
+}
+
+static unsigned int
+aesni_gcm_sec_session_get_size(__rte_unused void *device)
+{
+	return sizeof(struct aesni_gcm_security_session);
+}
+
 struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
 		.dev_configure		= aesni_gcm_pmd_config,
 		.dev_start		= aesni_gcm_pmd_start,
@@ -336,4 +416,19 @@ struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
 		.sym_session_clear	= aesni_gcm_pmd_sym_session_clear
 };
 
+static struct rte_security_ops aesni_gcm_security_ops = {
+		.session_create = aesni_gcm_security_session_create,
+		.session_get_size = aesni_gcm_sec_session_get_size,
+		.session_update = NULL,
+		.session_stats_get = NULL,
+		.session_destroy = aesni_gcm_security_session_destroy,
+		.set_pkt_metadata = NULL,
+		.capabilities_get = NULL,
+		.process_cpu_crypto_bulk =
+				aesni_gcm_sec_crypto_process_bulk,
+};
+
 struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops = &aesni_gcm_pmd_ops;
+
+struct rte_security_ops *rte_aesni_gcm_pmd_security_ops =
+		&aesni_gcm_security_ops;
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
index 56b29e013..8e490b6ce 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
@@ -114,5 +114,28 @@ aesni_gcm_set_session_parameters(const struct aesni_gcm_ops *ops,
  * Device specific operations function pointer structure */
 extern struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops;
 
+/**
+ * Security session structure.
+ */
+struct aesni_gcm_security_session {
+	/** Temp digest for decryption */
+	uint8_t temp_digest[DIGEST_LENGTH_MAX];
+	/** GCM operations */
+	aesni_gcm_pre_t pre;
+	aesni_gcm_init_t init;
+	aesni_gcm_update_t update;
+	aesni_gcm_finalize_t finalize;
+	/** AESNI-GCM session */
+	struct aesni_gcm_session sess;
+	/** AESNI-GCM context */
+	struct gcm_context_data gdata_ctx;
+};
+
+extern void
+aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
+extern struct rte_security_ops *rte_aesni_gcm_pmd_security_ops;
 
 #endif /* _RTE_AESNI_GCM_PMD_PRIVATE_H_ */
-- 
2.14.5



* [dpdk-dev] [PATCH 03/10] app/test: add security cpu crypto autotest
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 04/10] app/test: add security cpu crypto perftest Fan Zhang
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch adds a CPU crypto unit test for the AESNI_GCM PMD.
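
The `SGL_ONE_SEG`/`SGL_MAX_SEG` options suggest that the test copies each
vector into one or several segments before building the iovec array. The
buffer assembly code is truncated here, so the following is only a
hypothetical sketch of such a split: `split_into_sgl()`, `SPLIT_MAX_SEGS`
and `SPLIT_SEG_CAP` are made up, with `SPLIT_MAX_SEGS` playing the role of
`MAX_NB_SIGMENTS`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define SPLIT_MAX_SEGS 4	/* hypothetical: like MAX_NB_SIGMENTS */
#define SPLIT_SEG_CAP 64	/* hypothetical per-segment capacity */

/* Copy src into at most max_segs segments and describe them with iovecs.
 * max_segs == 1 mimics a one-segment buffer, SPLIT_MAX_SEGS a fully split
 * one. Returns the number of segments used (0 for empty input or if the
 * data does not fit). */
static uint32_t
split_into_sgl(const uint8_t *src, size_t len,
		uint8_t seg[][SPLIT_SEG_CAP], uint32_t max_segs,
		struct iovec vec[])
{
	size_t per_seg, left = len;
	uint32_t n = 0;

	assert(src != NULL || len == 0);
	if (max_segs == 0 || max_segs > SPLIT_MAX_SEGS)
		return 0;
	/* Round up so every segment except possibly the last is per_seg long. */
	per_seg = (len + max_segs - 1) / max_segs;
	if (per_seg > SPLIT_SEG_CAP)
		return 0;
	while (left > 0) {
		size_t chunk = left < per_seg ? left : per_seg;

		memcpy(seg[n], src + (len - left), chunk);
		vec[n].iov_base = seg[n];
		vec[n].iov_len = chunk;
		left -= chunk;
		n++;
	}
	return n;
}
```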

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/Makefile                   |   1 +
 app/test/meson.build                |   1 +
 app/test/test_security_cpu_crypto.c | 564 ++++++++++++++++++++++++++++++++++++
 3 files changed, 566 insertions(+)
 create mode 100644 app/test/test_security_cpu_crypto.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 26ba6fe2b..090c55746 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -196,6 +196,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) += test_pmd_ring_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_blockcipher.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_asym.c
+SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_security_cpu_crypto.c
 
 SRCS-$(CONFIG_RTE_LIBRTE_METRICS) += test_metrics.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index ec40943bd..b7834ff21 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -103,6 +103,7 @@ test_sources = files('commands.c',
 	'test_ring_perf.c',
 	'test_rwlock.c',
 	'test_sched.c',
+	'test_security_cpu_crypto.c',
 	'test_service_cores.c',
 	'test_spinlock.c',
 	'test_stack.c',
diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
new file mode 100644
index 000000000..d345922b2
--- /dev/null
+++ b/app/test/test_security_cpu_crypto.c
@@ -0,0 +1,564 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#include <rte_common.h>
+#include <rte_hexdump.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_pause.h>
+#include <rte_bus_vdev.h>
+#include <rte_random.h>
+
+#include <rte_security.h>
+
+#include <rte_crypto.h>
+#include <rte_cryptodev.h>
+#include <rte_cryptodev_pmd.h>
+
+#include "test.h"
+#include "test_cryptodev.h"
+#include "test_cryptodev_aead_test_vectors.h"
+
+#define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
+#define MAX_NB_SIGMENTS			4
+
+enum buffer_assemble_option {
+	SGL_MAX_SEG,
+	SGL_ONE_SEG,
+};
+
+struct cpu_crypto_test_case {
+	struct {
+		uint8_t seg[MBUF_DATAPAYLOAD_SIZE];
+		uint32_t seg_len;
+	} seg_buf[MAX_NB_SIGMENTS];
+	uint8_t iv[MAXIMUM_IV_LENGTH];
+	uint8_t aad[CPU_CRYPTO_TEST_MAX_AAD_LENGTH];
+	uint8_t digest[DIGEST_BYTE_LENGTH_SHA512];
+} __rte_cache_aligned;
+
+struct cpu_crypto_test_obj {
+	struct iovec vec[MAX_NUM_OPS_INFLIGHT][MAX_NB_SIGMENTS];
+	struct rte_security_vec sec_buf[MAX_NUM_OPS_INFLIGHT];
+	void *iv[MAX_NUM_OPS_INFLIGHT];
+	void *digest[MAX_NUM_OPS_INFLIGHT];
+	void *aad[MAX_NUM_OPS_INFLIGHT];
+	int status[MAX_NUM_OPS_INFLIGHT];
+};
+
+struct cpu_crypto_testsuite_params {
+	struct rte_mempool *buf_pool;
+	struct rte_mempool *session_priv_mpool;
+	struct rte_security_ctx *ctx;
+};
+
+struct cpu_crypto_unittest_params {
+	struct rte_security_session *sess;
+	void *test_datas[MAX_NUM_OPS_INFLIGHT];
+	struct cpu_crypto_test_obj test_obj;
+	uint32_t nb_bufs;
+};
+
+static struct cpu_crypto_testsuite_params testsuite_params = { NULL };
+static struct cpu_crypto_unittest_params unittest_params;
+
+static int gbl_driver_id;
+
+static int
+testsuite_setup(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct rte_cryptodev_info info;
+	uint32_t i;
+	uint32_t nb_devs;
+	uint32_t sess_sz;
+	int ret;
+
+	memset(ts_params, 0, sizeof(*ts_params));
+
+	ts_params->buf_pool = rte_mempool_lookup("CPU_CRYPTO_MBUFPOOL");
+	if (ts_params->buf_pool == NULL) {
+		/* Not already created so create */
+		ts_params->buf_pool = rte_pktmbuf_pool_create(
+				"CPU_CRYPTO_MBUFPOOL",
+				NUM_MBUFS, MBUF_CACHE_SIZE, 0,
+				sizeof(struct cpu_crypto_test_case),
+				rte_socket_id());
+		if (ts_params->buf_pool == NULL) {
+			RTE_LOG(ERR, USER1, "Can't create CPU_CRYPTO_MBUFPOOL\n");
+			return TEST_FAILED;
+		}
+	}
+
+	/* Create an AESNI MB device if required */
+	if (gbl_driver_id == rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD))) {
+		nb_devs = rte_cryptodev_device_count_by_driver(
+				rte_cryptodev_driver_id_get(
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD)));
+		if (nb_devs < 1) {
+			ret = rte_vdev_init(
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD), NULL);
+
+			TEST_ASSERT(ret == 0,
+				"Failed to create instance of"
+				" pmd : %s",
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+		}
+	}
+
+	/* Create an AESNI GCM device if required */
+	if (gbl_driver_id == rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD))) {
+		nb_devs = rte_cryptodev_device_count_by_driver(
+				rte_cryptodev_driver_id_get(
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD)));
+		if (nb_devs < 1) {
+			TEST_ASSERT_SUCCESS(rte_vdev_init(
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD), NULL),
+				"Failed to create instance of"
+				" pmd : %s",
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+		}
+	}
+
+	nb_devs = rte_cryptodev_count();
+	if (nb_devs < 1) {
+		RTE_LOG(ERR, USER1, "No crypto devices found?\n");
+		return TEST_FAILED;
+	}
+
+	/* Get security context */
+	for (i = 0; i < nb_devs; i++) {
+		rte_cryptodev_info_get(i, &info);
+		if (info.driver_id != gbl_driver_id)
+			continue;
+
+		ts_params->ctx = rte_cryptodev_get_sec_ctx(i);
+		if (!ts_params->ctx) {
+			RTE_LOG(ERR, USER1, "Rte_security is not supported\n");
+			return TEST_FAILED;
+		}
+	}
+
+	sess_sz = rte_security_session_get_size(ts_params->ctx);
+	ts_params->session_priv_mpool = rte_mempool_create(
+			"cpu_crypto_test_sess_mp", 2, sess_sz, 0, 0,
+			NULL, NULL, NULL, NULL,
+			SOCKET_ID_ANY, 0);
+	if (!ts_params->session_priv_mpool) {
+		RTE_LOG(ERR, USER1, "Not enough memory\n");
+		return TEST_FAILED;
+	}
+
+	return TEST_SUCCESS;
+}
+
+static void
+testsuite_teardown(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+
+	if (ts_params->buf_pool)
+		rte_mempool_free(ts_params->buf_pool);
+
+	if (ts_params->session_priv_mpool)
+		rte_mempool_free(ts_params->session_priv_mpool);
+}
+
+static int
+ut_setup(void)
+{
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+
+	memset(ut_params, 0, sizeof(*ut_params));
+	return TEST_SUCCESS;
+}
+
+static void
+ut_teardown(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+
+	if (ut_params->sess)
+		rte_security_session_destroy(ts_params->ctx, ut_params->sess);
+
+	if (ut_params->nb_bufs) {
+		uint32_t i;
+
+		for (i = 0; i < ut_params->nb_bufs; i++)
+			memset(ut_params->test_datas[i], 0,
+				sizeof(struct cpu_crypto_test_case));
+
+		rte_mempool_put_bulk(ts_params->buf_pool, ut_params->test_datas,
+				ut_params->nb_bufs);
+	}
+}
+
+static int
+allocate_buf(uint32_t n)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	int ret;
+
+	ret = rte_mempool_get_bulk(ts_params->buf_pool, ut_params->test_datas,
+			n);
+
+	if (ret == 0)
+		ut_params->nb_bufs = n;
+
+	return ret;
+}
+
+static int
+check_status(struct cpu_crypto_test_obj *obj, uint32_t n)
+{
+	uint32_t i;
+
+	for (i = 0; i < n; i++)
+		if (obj->status[i] < 0)
+			return -1;
+
+	return 0;
+}
+
+static struct rte_security_session *
+create_aead_session(struct rte_security_ctx *ctx,
+		struct rte_mempool *sess_mp,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	struct rte_security_session_conf sess_conf = {0};
+	struct rte_crypto_sym_xform xform = {0};
+
+	if (is_unit_test)
+		debug_hexdump(stdout, "key:", test_data->key.data,
+				test_data->key.len);
+
+	/* Setup AEAD Parameters */
+	xform.type = RTE_CRYPTO_SYM_XFORM_AEAD;
+	xform.next = NULL;
+	xform.aead.algo = test_data->algo;
+	xform.aead.op = op;
+	xform.aead.key.data = test_data->key.data;
+	xform.aead.key.length = test_data->key.len;
+	xform.aead.iv.offset = 0;
+	xform.aead.iv.length = test_data->iv.len;
+	xform.aead.digest_length = test_data->auth_tag.len;
+	xform.aead.aad_length = test_data->aad.len;
+
+	sess_conf.action_type = RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
+	sess_conf.crypto_xform = &xform;
+
+	return rte_security_session_create(ctx, &sess_conf, sess_mp);
+}
+
+static inline int
+assemble_aead_buf(struct cpu_crypto_test_case *data,
+		struct cpu_crypto_test_obj *obj,
+		uint32_t obj_idx,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *test_data,
+		enum buffer_assemble_option sgl_option,
+		uint32_t is_unit_test)
+{
+	const uint8_t *src;
+	uint32_t src_len;
+	uint32_t seg_idx;
+	uint32_t bytes_per_seg;
+	uint32_t left;
+
+	if (op == RTE_CRYPTO_AEAD_OP_ENCRYPT) {
+		src = test_data->plaintext.data;
+		src_len = test_data->plaintext.len;
+		if (is_unit_test)
+			debug_hexdump(stdout, "plaintext:", src, src_len);
+	} else {
+		src = test_data->ciphertext.data;
+		src_len = test_data->ciphertext.len;
+		memcpy(data->digest, test_data->auth_tag.data,
+				test_data->auth_tag.len);
+		if (is_unit_test) {
+			debug_hexdump(stdout, "ciphertext:", src, src_len);
+			debug_hexdump(stdout, "digest:",
+					test_data->auth_tag.data,
+					test_data->auth_tag.len);
+		}
+	}
+
+	if (src_len > MBUF_DATAPAYLOAD_SIZE)
+		return -ENOMEM;
+
+	switch (sgl_option) {
+	case SGL_MAX_SEG:
+		seg_idx = 0;
+		bytes_per_seg = src_len / MAX_NB_SIGMENTS + 1;
+		left = src_len;
+
+		if (bytes_per_seg > (MBUF_DATAPAYLOAD_SIZE / MAX_NB_SIGMENTS))
+			return -ENOMEM;
+
+		while (left) {
+			uint32_t cp_len = RTE_MIN(left, bytes_per_seg);
+			memcpy(data->seg_buf[seg_idx].seg, src, cp_len);
+			data->seg_buf[seg_idx].seg_len = cp_len;
+			obj->vec[obj_idx][seg_idx].iov_base =
+					(void *)data->seg_buf[seg_idx].seg;
+			obj->vec[obj_idx][seg_idx].iov_len = cp_len;
+			src += cp_len;
+			left -= cp_len;
+			seg_idx++;
+		}
+
+		if (left)
+			return -ENOMEM;
+
+		obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+		obj->sec_buf[obj_idx].num = seg_idx;
+
+		break;
+	case SGL_ONE_SEG:
+		memcpy(data->seg_buf[0].seg, src, src_len);
+		data->seg_buf[0].seg_len = src_len;
+		obj->vec[obj_idx][0].iov_base =
+				(void *)data->seg_buf[0].seg;
+		obj->vec[obj_idx][0].iov_len = src_len;
+
+		obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+		obj->sec_buf[obj_idx].num = 1;
+		break;
+	default:
+		return -1;
+	}
+
+	if (test_data->algo == RTE_CRYPTO_AEAD_AES_CCM) {
+		memcpy(data->iv + 1, test_data->iv.data, test_data->iv.len);
+		memcpy(data->aad + 18, test_data->aad.data, test_data->aad.len);
+	} else {
+		memcpy(data->iv, test_data->iv.data, test_data->iv.len);
+		memcpy(data->aad, test_data->aad.data, test_data->aad.len);
+	}
+
+	if (is_unit_test) {
+		debug_hexdump(stdout, "iv:", test_data->iv.data,
+				test_data->iv.len);
+		debug_hexdump(stdout, "aad:", test_data->aad.data,
+				test_data->aad.len);
+	}
+
+	obj->iv[obj_idx] = (void *)data->iv;
+	obj->digest[obj_idx] = (void *)data->digest;
+	obj->aad[obj_idx] = (void *)data->aad;
+
+	return 0;
+}
+
+#define CPU_CRYPTO_ERR_EXP_CT	"expect ciphertext:"
+#define CPU_CRYPTO_ERR_GEN_CT	"gen ciphertext:"
+#define CPU_CRYPTO_ERR_EXP_PT	"expect plaintext:"
+#define CPU_CRYPTO_ERR_GEN_PT	"gen plaintext:"
+
+static int
+check_aead_result(struct cpu_crypto_test_case *tcase,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *tdata)
+{
+	const char *err_msg1, *err_msg2;
+	const uint8_t *src_pt_ct;
+	const uint8_t *tmp_src;
+	uint32_t src_len;
+	uint32_t left;
+	uint32_t i = 0;
+	int ret;
+
+	if (op == RTE_CRYPTO_AEAD_OP_ENCRYPT) {
+		err_msg1 = CPU_CRYPTO_ERR_EXP_CT;
+		err_msg2 = CPU_CRYPTO_ERR_GEN_CT;
+
+		src_pt_ct = tdata->ciphertext.data;
+		src_len = tdata->ciphertext.len;
+
+		ret = memcmp(tcase->digest, tdata->auth_tag.data,
+				tdata->auth_tag.len);
+		if (ret != 0) {
+			debug_hexdump(stdout, "expect digest:",
+					tdata->auth_tag.data,
+					tdata->auth_tag.len);
+			debug_hexdump(stdout, "gen digest:",
+					tcase->digest,
+					tdata->auth_tag.len);
+			return -1;
+		}
+	} else {
+		src_pt_ct = tdata->plaintext.data;
+		src_len = tdata->plaintext.len;
+		err_msg1 = CPU_CRYPTO_ERR_EXP_PT;
+		err_msg2 = CPU_CRYPTO_ERR_GEN_PT;
+	}
+
+	tmp_src = src_pt_ct;
+	left = src_len;
+
+	while (left && i < MAX_NB_SIGMENTS) {
+		ret = memcmp(tcase->seg_buf[i].seg, tmp_src,
+				tcase->seg_buf[i].seg_len);
+		if (ret != 0)
+			goto sgl_err_dump;
+		tmp_src += tcase->seg_buf[i].seg_len;
+		left -= tcase->seg_buf[i].seg_len;
+		i++;
+	}
+
+	if (left) {
+		ret = -ENOMEM;
+		goto sgl_err_dump;
+	}
+
+	return 0;
+
+sgl_err_dump:
+	left = src_len;
+	i = 0;
+
+	/* Dump the expected output for this direction (CT or PT) */
+	debug_hexdump(stdout, err_msg1, src_pt_ct, src_len);
+
+	while (left && i < MAX_NB_SIGMENTS) {
+		debug_hexdump(stdout, err_msg2,
+				tcase->seg_buf[i].seg,
+				tcase->seg_buf[i].seg_len);
+		left -= tcase->seg_buf[i].seg_len;
+		i++;
+	}
+	return ret;
+}
+
+static inline void
+run_test(struct rte_security_ctx *ctx, struct rte_security_session *sess,
+		struct cpu_crypto_test_obj *obj, uint32_t n)
+{
+	rte_security_process_cpu_crypto_bulk(ctx, sess, obj->sec_buf,
+			obj->iv, obj->aad, obj->digest, obj->status, n);
+}
+
+static int
+cpu_crypto_test_aead(const struct aead_test_data *tdata,
+		enum rte_crypto_aead_operation dir,
+		enum buffer_assemble_option sgl_option)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	int ret;
+
+	ut_params->sess = create_aead_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			dir,
+			tdata,
+			1);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(1);
+	if (ret)
+		return ret;
+
+	tcase = ut_params->test_datas[0];
+	ret = assemble_aead_buf(tcase, obj, 0, dir, tdata, sgl_option, 1);
+	if (ret < 0) {
+		printf("Test is not supported by the driver\n");
+		return ret;
+	}
+
+	run_test(ts_params->ctx, ut_params->sess, obj, 1);
+
+	ret = check_status(obj, 1);
+	if (ret < 0)
+		return ret;
+
+	ret = check_aead_result(tcase, dir, tdata);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+/* test-vector/sgl-option */
+#define all_gcm_unit_test_cases(type)		\
+	TEST_EXPAND(gcm_test_case_1, type)	\
+	TEST_EXPAND(gcm_test_case_2, type)	\
+	TEST_EXPAND(gcm_test_case_3, type)	\
+	TEST_EXPAND(gcm_test_case_4, type)	\
+	TEST_EXPAND(gcm_test_case_5, type)	\
+	TEST_EXPAND(gcm_test_case_6, type)	\
+	TEST_EXPAND(gcm_test_case_7, type)	\
+	TEST_EXPAND(gcm_test_case_8, type)	\
+	TEST_EXPAND(gcm_test_case_192_1, type)	\
+	TEST_EXPAND(gcm_test_case_192_2, type)	\
+	TEST_EXPAND(gcm_test_case_192_3, type)	\
+	TEST_EXPAND(gcm_test_case_192_4, type)	\
+	TEST_EXPAND(gcm_test_case_192_5, type)	\
+	TEST_EXPAND(gcm_test_case_192_6, type)	\
+	TEST_EXPAND(gcm_test_case_192_7, type)	\
+	TEST_EXPAND(gcm_test_case_256_1, type)	\
+	TEST_EXPAND(gcm_test_case_256_2, type)	\
+	TEST_EXPAND(gcm_test_case_256_3, type)	\
+	TEST_EXPAND(gcm_test_case_256_4, type)	\
+	TEST_EXPAND(gcm_test_case_256_5, type)	\
+	TEST_EXPAND(gcm_test_case_256_6, type)	\
+	TEST_EXPAND(gcm_test_case_256_7, type)
+
+
+#define TEST_EXPAND(t, o)						\
+static int								\
+cpu_crypto_aead_enc_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_aead(&t, RTE_CRYPTO_AEAD_OP_ENCRYPT, o);	\
+}									\
+static int								\
+cpu_crypto_aead_dec_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_aead(&t, RTE_CRYPTO_AEAD_OP_DECRYPT, o);	\
+}									\
+
+all_gcm_unit_test_cases(SGL_ONE_SEG)
+all_gcm_unit_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesgcm_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-GCM Unit Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_enc_test_##t##_##o),		\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_dec_test_##t##_##o),		\
+
+	all_gcm_unit_test_cases(SGL_ONE_SEG)
+	all_gcm_unit_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_gcm(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+
+	return unit_test_suite_runner(&security_cpu_crypto_aesgcm_testsuite);
+}
+
+REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
+		test_security_cpu_crypto_aesni_gcm);
-- 
2.14.5
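
For readers following the SGL_MAX_SEG case in assemble_aead_buf() above, the split arithmetic can be sketched in isolation. This is a minimal standalone sketch, not the test's code: `NB_SEGS_MAX`, `SEG_CAP`, `struct seg`, `split_into_segs()` and `join_segs()` are hypothetical names standing in for MAX_NB_SIGMENTS, the per-segment buffers and the copy loops. Rounding the per-segment size up by one byte guarantees the whole input fits in at most four segments, and any reassembly (as check_aead_result() does when comparing) has to advance past each segment in turn.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for MAX_NB_SIGMENTS and the per-segment
 * capacity (MBUF_DATAPAYLOAD_SIZE / MAX_NB_SIGMENTS in the test). */
#define NB_SEGS_MAX	4
#define SEG_CAP		512

struct seg {
	uint8_t data[SEG_CAP];
	uint32_t len;
};

/*
 * Split src into at most NB_SEGS_MAX segments of
 * (src_len / NB_SEGS_MAX + 1) bytes each. Because
 * NB_SEGS_MAX * (src_len / NB_SEGS_MAX + 1) > src_len, the remainder
 * always fits. Returns the number of segments used, or -1 if one
 * segment would not fit in SEG_CAP bytes.
 */
static int
split_into_segs(const uint8_t *src, uint32_t src_len,
		struct seg segs[NB_SEGS_MAX])
{
	uint32_t per_seg = src_len / NB_SEGS_MAX + 1;
	uint32_t left = src_len;
	int n = 0;

	if (per_seg > SEG_CAP)
		return -1;

	while (left > 0) {
		uint32_t cp = left < per_seg ? left : per_seg;

		memcpy(segs[n].data, src, cp);
		segs[n].len = cp;
		src += cp;
		left -= cp;
		n++;
	}

	return n;
}

/* Concatenate n segments into dst, advancing the write position after
 * each segment; returns the total number of bytes written. */
static uint32_t
join_segs(const struct seg *segs, int n, uint8_t *dst)
{
	uint32_t total = 0;
	int i;

	for (i = 0; i < n; i++) {
		memcpy(dst + total, segs[i].data, segs[i].len);
		total += segs[i].len;
	}

	return total;
}
```

With a 1000-byte input this yields segments of 251, 251, 251 and 247 bytes.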

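The TEST_EXPAND / all_gcm_unit_test_cases pair above is an X-macro: one list of test vectors is expanded twice, first into test functions and then into the suite's case table, so adding a vector only touches the list. A reduced sketch of the same pattern, with hypothetical names (`ALL_CASES`, `CASE_EXPAND`, `vec_a`, `vec_b`, `all_tests`) and plain integers in place of the GCM test vectors:

```c
/* Hypothetical test vectors standing in for gcm_test_case_1, ... */
static const int vec_a = 1;
static const int vec_b = 2;

/* Single list of every test vector, parameterised by an option. */
#define ALL_CASES(opt)			\
	CASE_EXPAND(vec_a, opt)		\
	CASE_EXPAND(vec_b, opt)

/* Expansion 1: define one test function per (vector, option) pair. */
#define CASE_EXPAND(t, o)					\
static int test_##t##_##o(void) { return t * 10 + o; }
ALL_CASES(0)
ALL_CASES(1)
#undef CASE_EXPAND

/* Expansion 2: collect the same functions into a runner table. */
typedef int (*test_fn)(void);

static const test_fn all_tests[] = {
#define CASE_EXPAND(t, o) test_##t##_##o,
	ALL_CASES(0)
	ALL_CASES(1)
#undef CASE_EXPAND
};
```

The table order follows the expansion order: every vector with option 0, then every vector with option 1.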


* [dpdk-dev] [PATCH 04/10] app/test: add security cpu crypto perftest
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (2 preceding siblings ...)
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 03/10] app/test: add security cpu crypto autotest Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

Since the crypto perf application does not support rte_security, this
patch adds a simple GCM CPU crypto performance test to the crypto unit
test application. The test covers different key and data sizes, with
both single-buffer and SGL buffer layouts, and displays the throughput
as well as cycle-count performance information.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 201 ++++++++++++++++++++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index d345922b2..ca9a8dae6 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -23,6 +23,7 @@
 
 #define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
 #define MAX_NB_SIGMENTS			4
+#define CACHE_WARM_ITER			2048
 
 enum buffer_assemble_option {
 	SGL_MAX_SEG,
@@ -560,5 +561,205 @@ test_security_cpu_crypto_aesni_gcm(void)
 	return unit_test_suite_runner(&security_cpu_crypto_aesgcm_testsuite);
 }
 
+
+static inline void
+gen_rand(uint8_t *data, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; i++)
+		data[i] = (uint8_t)rte_rand();
+}
+
+static inline void
+switch_aead_enc_to_dec(struct aead_test_data *tdata,
+		struct cpu_crypto_test_case *tcase,
+		enum buffer_assemble_option sgl_option)
+{
+	uint32_t i;
+	uint8_t *dst = tdata->ciphertext.data;
+
+	switch (sgl_option) {
+	case SGL_ONE_SEG:
+		memcpy(dst, tcase->seg_buf[0].seg, tcase->seg_buf[0].seg_len);
+		tdata->ciphertext.len = tcase->seg_buf[0].seg_len;
+		break;
+	case SGL_MAX_SEG:
+		tdata->ciphertext.len = 0;
+		for (i = 0; i < MAX_NB_SIGMENTS; i++) {
+			/* Append each segment after the previous one */
+			memcpy(dst + tdata->ciphertext.len,
+					tcase->seg_buf[i].seg,
+					tcase->seg_buf[i].seg_len);
+			tdata->ciphertext.len += tcase->seg_buf[i].seg_len;
+		}
+		break;
+	}
+
+	memcpy(tdata->auth_tag.data, tcase->digest, tdata->auth_tag.len);
+}
+
+static int
+cpu_crypto_test_aead_perf(enum buffer_assemble_option sgl_option,
+		uint32_t key_sz)
+{
+	struct aead_test_data tdata = {0};
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	uint64_t hz = rte_get_tsc_hz(), time_start, time_now;
+	double rate, cycles_per_buf;
+	uint32_t test_data_szs[] = {64, 128, 256, 512, 1024, 2048};
+	uint32_t i, j;
+	uint8_t aad[16];
+	int ret;
+
+	tdata.key.len = key_sz;
+	gen_rand(tdata.key.data, tdata.key.len);
+	tdata.algo = RTE_CRYPTO_AEAD_AES_GCM;
+	tdata.aad.data = aad;
+
+	ut_params->sess = create_aead_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			RTE_CRYPTO_AEAD_OP_DECRYPT,
+			&tdata,
+			0);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(MAX_NUM_OPS_INFLIGHT);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < RTE_DIM(test_data_szs); i++) {
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tdata.plaintext.len = test_data_szs[i];
+			gen_rand(tdata.plaintext.data,
+					tdata.plaintext.len);
+
+			tdata.aad.len = 12;
+			gen_rand(tdata.aad.data, tdata.aad.len);
+
+			tdata.auth_tag.len = 16;
+
+			tdata.iv.len = 16;
+			gen_rand(tdata.iv.data, tdata.iv.len);
+
+			tcase = ut_params->test_datas[j];
+			ret = assemble_aead_buf(tcase, obj, j,
+					RTE_CRYPTO_AEAD_OP_ENCRYPT,
+					&tdata, sgl_option, 0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		/* warm up cache */
+		for (j = 0; j < CACHE_WARM_ITER; j++)
+			run_test(ts_params->ctx, ut_params->sess, obj,
+					MAX_NUM_OPS_INFLIGHT);
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("AES-GCM-%u(%4uB) Enc %03.3fMpps (%03.3fGbps) ",
+				key_sz * 8, test_data_szs[i], rate,
+				rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tcase = ut_params->test_datas[j];
+
+			switch_aead_enc_to_dec(&tdata, tcase, sgl_option);
+			ret = assemble_aead_buf(tcase, obj, j,
+					RTE_CRYPTO_AEAD_OP_DECRYPT,
+					&tdata, sgl_option, 0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		/* Use the TSC, matching hz = rte_get_tsc_hz() above */
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("AES-GCM-%u(%4uB) Dec %03.3fMpps (%03.3fGbps) ",
+				key_sz * 8, test_data_szs[i], rate,
+				rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+	}
+
+	return 0;
+}
+
+/* test-prefix/key-size/sgl-type */
+#define all_gcm_perf_test_cases(type)					\
+	TEST_EXPAND(_128, 16, type)					\
+	TEST_EXPAND(_192, 24, type)					\
+	TEST_EXPAND(_256, 32, type)
+
+#define TEST_EXPAND(a, b, c)						\
+static int								\
+cpu_crypto_gcm_perf##a##_##c(void)					\
+{									\
+	return cpu_crypto_test_aead_perf(c, b);				\
+}									\
+
+all_gcm_perf_test_cases(SGL_ONE_SEG)
+all_gcm_perf_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesgcm_perf_testsuite  = {
+		.suite_name = "Security CPU Crypto AESNI-GCM Perf Test Suite",
+		.setup = testsuite_setup,
+		.teardown = testsuite_teardown,
+		.unit_test_cases = {
+#define TEST_EXPAND(a, b, c)						\
+		TEST_CASE_ST(ut_setup, ut_teardown,			\
+				cpu_crypto_gcm_perf##a##_##c),		\
+
+		all_gcm_perf_test_cases(SGL_ONE_SEG)
+		all_gcm_perf_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+		TEST_CASES_END() /**< NULL terminate unit test array */
+		},
+};
+
+static int
+test_security_cpu_crypto_aesni_gcm_perf(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+
+	return unit_test_suite_runner(
+			&security_cpu_crypto_aesgcm_perf_testsuite);
+}
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
+
+REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
+		test_security_cpu_crypto_aesni_gcm_perf);
-- 
2.14.5
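
The figures printed by cpu_crypto_test_aead_perf() follow from one timed bulk call: cycles per buffer is the cycle delta divided by the burst size, Mpps is the counter frequency divided by cycles per buffer (scaled to millions), and Gbps multiplies Mpps by the payload size in bits. A minimal sketch of that arithmetic, assuming the elapsed cycles and the frequency come from the same counter (e.g. rte_rdtsc() with rte_get_tsc_hz()); `struct perf_result` and `compute_perf()` are hypothetical helpers, not part of the patch.

```c
#include <stdint.h>

/* Derived figures for one measured burst (hypothetical helper type). */
struct perf_result {
	double cycles_per_buf;
	double mpps;	/* million buffers per second */
	double gbps;	/* payload throughput in gigabits per second */
};

/*
 * elapsed_cycles: counter delta around one bulk call of nb_bufs buffers
 * hz:             frequency of that same counter
 * buf_size:       payload bytes per buffer
 */
static struct perf_result
compute_perf(uint64_t elapsed_cycles, uint32_t nb_bufs, uint64_t hz,
		uint32_t buf_size)
{
	struct perf_result r;

	r.cycles_per_buf = (double)elapsed_cycles / nb_bufs;
	r.mpps = ((double)hz / r.cycles_per_buf) / 1e6;
	/* Mpps * bytes * 8 = Mbit/s; divide by 1000 for Gbit/s */
	r.gbps = r.mpps * buf_size * 8 / 1000.0;

	return r;
}
```

For example, 32000 cycles for 32 buffers of 1024 bytes on a 2 GHz counter gives 1000 cycles per buffer, 2 Mpps and about 16.4 Gbps. Because each size is timed with a single call, averaging several runs would likely give steadier numbers; that is a suggestion, not something the patch does.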



* [dpdk-dev] [PATCH 05/10] crypto/aesni_mb: add rte_security handler
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (3 preceding siblings ...)
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 04/10] app/test: add security cpu crypto perftest Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-18 15:20     ` Ananyev, Konstantin
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 06/10] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
                     ` (6 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch adds rte_security support to the AESNI-MB PMD. The PMD now
initializes a security context instance, creates/deletes PMD-specific
security sessions, and processes crypto workloads in synchronous mode.
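
Processing "in synchronous mode" here means aesni_mb_sec_crypto_process_bulk() returns only after every job has been reported back: it submits jobs with IMB_GET_NEXT_JOB()/IMB_SUBMIT_JOB(), handles whatever completes along the way, and then calls IMB_FLUSH_JOB() until the processed count reaches num. The sketch below models that control flow with a hypothetical in-order job manager (`struct mgr`, `mgr_submit()`, `mgr_flush()`, `complete_job()` and `MGR_LANES` are invented for illustration; the real intel-ipsec-mb manager decides completion order itself and can hand back several completed jobs at once).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a multi-buffer job manager: jobs are only
 * returned once all lanes are busy (on submit) or when flushed. */
#define MGR_LANES 4

struct job {
	int *status;	/* caller's status slot for this job */
};

struct mgr {
	struct job jobs[MGR_LANES];
	uint32_t head;	/* index of the oldest in-flight job */
	uint32_t count;	/* number of jobs in flight */
};

/* Queue a job; when all lanes are busy, retire and return the oldest. */
static struct job *
mgr_submit(struct mgr *m, int *status)
{
	struct job *j = &m->jobs[(m->head + m->count) % MGR_LANES];

	j->status = status;
	if (++m->count < MGR_LANES)
		return NULL;

	j = &m->jobs[m->head];
	m->head = (m->head + 1) % MGR_LANES;
	m->count--;
	return j;
}

/* Force out the oldest in-flight job, or return NULL if none remain. */
static struct job *
mgr_flush(struct mgr *m)
{
	struct job *j;

	if (m->count == 0)
		return NULL;
	j = &m->jobs[m->head];
	m->head = (m->head + 1) % MGR_LANES;
	m->count--;
	return j;
}

/* Post-process one completed job: report success to the caller. */
static uint32_t
complete_job(struct job *j)
{
	*j->status = 0;
	return 1;
}

/* Synchronous bulk call: submit every job, then flush until all have
 * been reported, so the caller can read status[] on return. */
static void
process_bulk(struct mgr *m, int status[], uint32_t num)
{
	uint32_t i, processed = 0;
	struct job *j;

	for (i = 0; i < num; i++) {
		status[i] = -1;	/* not processed yet */
		j = mgr_submit(m, &status[i]);
		if (j != NULL)
			processed += complete_job(j);
	}

	while (processed < num) {
		j = mgr_flush(m);
		if (j == NULL)
			break;	/* nothing in flight: do not spin forever */
		processed += complete_job(j);
	}
}
```

The NULL check in the flush loop is a defensive choice in this sketch, so that a counting mismatch cannot make the loop spin forever.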

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         | 291 ++++++++++++++++++++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |  91 ++++++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |  21 +-
 3 files changed, 398 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
index b495a9679..68767c04e 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
@@ -8,6 +8,8 @@
 #include <rte_hexdump.h>
 #include <rte_cryptodev.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security.h>
+#include <rte_security_driver.h>
 #include <rte_bus_vdev.h>
 #include <rte_malloc.h>
 #include <rte_cpuflags.h>
@@ -789,6 +791,167 @@ auth_start_offset(struct rte_crypto_op *op, struct aesni_mb_session *session,
 			(UINT64_MAX - u_src + u_dst + 1);
 }
 
+union sec_userdata_field {
+	int status;
+	struct {
+		uint16_t is_gen_digest;
+		uint16_t digest_len;
+	};
+};
+
+struct sec_udata_digest_field {
+	uint32_t is_digest_gen;
+	uint32_t digest_len;
+};
+
+static inline int
+set_mb_job_params_sec(JOB_AES_HMAC *job, struct aesni_mb_sec_session *sec_sess,
+		void *buf, uint32_t buf_len, void *iv, void *aad, void *digest,
+		int *status, uint8_t *digest_idx)
+{
+	struct aesni_mb_session *session = &sec_sess->sess;
+	uint32_t cipher_offset = sec_sess->cipher_offset;
+	void *user_digest = NULL;
+	union sec_userdata_field udata;
+
+	if (unlikely(cipher_offset > buf_len))
+		return -EINVAL;
+
+	/* Set crypto operation */
+	job->chain_order = session->chain_order;
+
+	/* Set cipher parameters */
+	job->cipher_direction = session->cipher.direction;
+	job->cipher_mode = session->cipher.mode;
+
+	job->aes_key_len_in_bytes = session->cipher.key_length_in_bytes;
+
+	/* Set authentication parameters */
+	job->hash_alg = session->auth.algo;
+	job->iv = iv;
+
+	switch (job->hash_alg) {
+	case AES_XCBC:
+		job->u.XCBC._k1_expanded = session->auth.xcbc.k1_expanded;
+		job->u.XCBC._k2 = session->auth.xcbc.k2;
+		job->u.XCBC._k3 = session->auth.xcbc.k3;
+
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		break;
+
+	case AES_CCM:
+		job->u.CCM.aad = (uint8_t *)aad + 18;
+		job->u.CCM.aad_len_in_bytes = session->aead.aad_len;
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		job->iv++;
+		break;
+
+	case AES_CMAC:
+		job->u.CMAC._key_expanded = session->auth.cmac.expkey;
+		job->u.CMAC._skey1 = session->auth.cmac.skey1;
+		job->u.CMAC._skey2 = session->auth.cmac.skey2;
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		break;
+
+	case AES_GMAC:
+		if (session->cipher.mode == GCM) {
+			job->u.GCM.aad = aad;
+			job->u.GCM.aad_len_in_bytes = session->aead.aad_len;
+		} else {
+			/* For GMAC */
+			job->u.GCM.aad = aad;
+			job->u.GCM.aad_len_in_bytes = buf_len;
+			job->cipher_mode = GCM;
+		}
+		job->aes_enc_key_expanded = &session->cipher.gcm_key;
+		job->aes_dec_key_expanded = &session->cipher.gcm_key;
+		break;
+
+	default:
+		job->u.HMAC._hashed_auth_key_xor_ipad =
+				session->auth.pads.inner;
+		job->u.HMAC._hashed_auth_key_xor_opad =
+				session->auth.pads.outer;
+
+		if (job->cipher_mode == DES3) {
+			job->aes_enc_key_expanded =
+				session->cipher.exp_3des_keys.ks_ptr;
+			job->aes_dec_key_expanded =
+				session->cipher.exp_3des_keys.ks_ptr;
+		} else {
+			job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+			job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		}
+	}
+
+	/* Set digest output location */
+	if (job->hash_alg != NULL_HASH &&
+			session->auth.operation == RTE_CRYPTO_AUTH_OP_VERIFY) {
+		job->auth_tag_output = sec_sess->temp_digests[*digest_idx];
+		*digest_idx = (*digest_idx + 1) % MAX_JOBS;
+
+		udata.is_gen_digest = 0;
+		udata.digest_len = session->auth.req_digest_len;
+		user_digest = (void *)digest;
+	} else {
+		udata.is_gen_digest = 1;
+		udata.digest_len = session->auth.req_digest_len;
+
+		if (session->auth.req_digest_len !=
+				session->auth.gen_digest_len) {
+			job->auth_tag_output =
+					sec_sess->temp_digests[*digest_idx];
+			*digest_idx = (*digest_idx + 1) % MAX_JOBS;
+
+			user_digest = (void *)digest;
+		} else
+			job->auth_tag_output = digest;
+	}
+
+	/* A bit of a hack: the job structure only has two user data
+	 * fields, but four parameters must reach post-processing
+	 * (status, direction, the digest to verify, and the digest
+	 * length). The direction and digest length are therefore packed
+	 * into the caller's status slot until the job completes, which
+	 * avoids allocating a larger per-job buffer. This must be done
+	 * for both the verify and the generate paths, since
+	 * post_process_mb_sec_job() reads the packed fields back.
+	 */
+	*status = udata.status;
+	/*
+	 * The multi-buffer library currently only supports returning a
+	 * truncated digest length, as specified in the relevant IPsec RFCs
+	 */
+
+	/* Set digest length */
+	job->auth_tag_output_len_in_bytes = session->auth.gen_digest_len;
+
+	/* Set IV parameters */
+	job->iv_len_in_bytes = session->iv.length;
+
+	/* Data Parameters */
+	job->src = buf;
+	job->dst = buf;
+	job->cipher_start_src_offset_in_bytes = cipher_offset;
+	job->msg_len_to_cipher_in_bytes = buf_len - cipher_offset;
+	job->hash_start_src_offset_in_bytes = 0;
+	job->msg_len_to_hash_in_bytes = buf_len;
+
+	job->user_data = (void *)status;
+	job->user_data2 = user_digest;
+
+	return 0;
+}
+
 /**
  * Process a crypto operation and complete a JOB_AES_HMAC job structure for
  * submission to the multi buffer library for processing.
@@ -1081,6 +1244,37 @@ post_process_mb_job(struct aesni_mb_qp *qp, JOB_AES_HMAC *job)
 	return op;
 }
 
+static inline void
+post_process_mb_sec_job(JOB_AES_HMAC *job)
+{
+	void *user_digest = job->user_data2;
+	int *status = job->user_data;
+	union sec_userdata_field udata;
+
+	switch (job->status) {
+	case STS_COMPLETED:
+		if (user_digest) {
+			udata.status = *status;
+
+			if (udata.is_gen_digest) {
+				*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+				memcpy(user_digest, job->auth_tag_output,
+						udata.digest_len);
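+			} else {
+				/* verify_digest() writes a uint8_t status, so use
+				 * a local byte instead of aliasing the int slot,
+				 * whose other bytes still hold the packed udata.
+				 */
+				uint8_t verify_sts = RTE_CRYPTO_OP_STATUS_SUCCESS;
+
+				verify_digest(job, user_digest,
+					udata.digest_len, &verify_sts);
+
+				if (verify_sts == RTE_CRYPTO_OP_STATUS_AUTH_FAILED)
+					*status = -1;
+				else
+					*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+			}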
+		} else
+			*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		break;
+	default:
+		*status = RTE_CRYPTO_OP_STATUS_ERROR;
+	}
+}
+
 /**
  * Process a completed JOB_AES_HMAC job and keep processing jobs until
  * get_completed_job return NULL
@@ -1117,6 +1311,32 @@ handle_completed_jobs(struct aesni_mb_qp *qp, JOB_AES_HMAC *job,
 	return processed_jobs;
 }
 
+static inline uint32_t
+handle_completed_sec_jobs(JOB_AES_HMAC *job, MB_MGR *mb_mgr)
+{
+	uint32_t processed = 0;
+
+	while (job != NULL) {
+		post_process_mb_sec_job(job);
+		job = IMB_GET_COMPLETED_JOB(mb_mgr);
+		processed++;
+	}
+
+	return processed;
+}
+
+static inline uint32_t
+flush_mb_sec_mgr(MB_MGR *mb_mgr)
+{
+	JOB_AES_HMAC *job = IMB_FLUSH_JOB(mb_mgr);
+	uint32_t processed = 0;
+
+	if (job)
+		processed = handle_completed_sec_jobs(job, mb_mgr);
+
+	return processed;
+}
+
 static inline uint16_t
 flush_mb_mgr(struct aesni_mb_qp *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
@@ -1220,6 +1440,55 @@ aesni_mb_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
 	return processed_jobs;
 }
 
+void
+aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	struct aesni_mb_sec_session *sec_sess = sess->sess_private_data;
+	JOB_AES_HMAC *job;
+	uint8_t digest_idx = sec_sess->digest_idx;
+	uint32_t i, processed = 0;
+	int ret;
+
+	for (i = 0; i < num; i++) {
+		void *seg_buf = buf[i].vec[0].iov_base;
+		uint32_t buf_len = buf[i].vec[0].iov_len;
+
+		job = IMB_GET_NEXT_JOB(sec_sess->mb_mgr);
+		if (unlikely(job == NULL)) {
+			processed += flush_mb_sec_mgr(sec_sess->mb_mgr);
+
+			job = IMB_GET_NEXT_JOB(sec_sess->mb_mgr);
+			if (!job)
+				return;
+		}
+
+		ret = set_mb_job_params_sec(job, sec_sess, seg_buf, buf_len,
+				iv[i], aad[i], digest[i], &status[i],
+				&digest_idx);
+		if (ret) {
+			processed++;
+			status[i] = ret;
+			continue;
+		}
+
+		/* Submit job to multi-buffer for processing */
+#ifdef RTE_LIBRTE_PMD_AESNI_MB_DEBUG
+		job = IMB_SUBMIT_JOB(sec_sess->mb_mgr);
+#else
+		job = IMB_SUBMIT_JOB_NOCHECK(sec_sess->mb_mgr);
+#endif
+
+		if (job)
+			processed += handle_completed_sec_jobs(job,
+					sec_sess->mb_mgr);
+	}
+
+	while (processed < num)
+		processed += flush_mb_sec_mgr(sec_sess->mb_mgr);
+}
+
 static int cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev);
 
 static int
@@ -1229,8 +1498,10 @@ cryptodev_aesni_mb_create(const char *name,
 {
 	struct rte_cryptodev *dev;
 	struct aesni_mb_private *internals;
+	struct rte_security_ctx *sec_ctx = NULL;
 	enum aesni_mb_vector_mode vector_mode;
 	MB_MGR *mb_mgr;
+	char sec_name[RTE_DEV_NAME_MAX_LEN];
 
 	/* Check CPU for support for AES instruction set */
 	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
@@ -1264,7 +1535,8 @@ cryptodev_aesni_mb_create(const char *name,
 	dev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO |
 			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
 			RTE_CRYPTODEV_FF_CPU_AESNI |
-			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
+			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
+			RTE_CRYPTODEV_FF_SECURITY;
 
 
 	mb_mgr = alloc_mb_mgr(0);
@@ -1303,11 +1575,28 @@ cryptodev_aesni_mb_create(const char *name,
 	AESNI_MB_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
 			imb_get_version_str());
 
+	/* setup security operations */
+	snprintf(sec_name, sizeof(sec_name) - 1, "aes_mb_sec_%u",
+			dev->driver_id);
+	sec_ctx = rte_zmalloc_socket(sec_name,
+			sizeof(struct rte_security_ctx),
+			RTE_CACHE_LINE_SIZE, init_params->socket_id);
+	if (sec_ctx == NULL) {
+		AESNI_MB_LOG(ERR, "memory allocation failed\n");
+		goto error_exit;
+	}
+
+	sec_ctx->device = (void *)dev;
+	sec_ctx->ops = rte_aesni_mb_pmd_security_ops;
+	dev->security_ctx = sec_ctx;
+
 	return 0;
 
 error_exit:
 	if (mb_mgr)
 		free_mb_mgr(mb_mgr);
+	if (sec_ctx)
+		rte_free(sec_ctx);
 
 	rte_cryptodev_pmd_destroy(dev);
 
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
index 8d15b99d4..ca6cea775 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
@@ -8,6 +8,7 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 
 #include "rte_aesni_mb_pmd_private.h"
 
@@ -732,7 +733,8 @@ aesni_mb_pmd_qp_count(struct rte_cryptodev *dev)
 static unsigned
 aesni_mb_pmd_sym_session_get_size(struct rte_cryptodev *dev __rte_unused)
 {
-	return sizeof(struct aesni_mb_session);
+	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_session),
+			RTE_CACHE_LINE_SIZE);
 }
 
 /** Configure a aesni multi-buffer session from a crypto xform chain */
@@ -810,4 +812,91 @@ struct rte_cryptodev_ops aesni_mb_pmd_ops = {
 		.sym_session_clear	= aesni_mb_pmd_sym_session_clear
 };
 
+/** Create an AES-NI MB security session for CPU crypto processing */
+static int
+aesni_mb_security_session_create(void *dev,
+		struct rte_security_session_conf *conf,
+		struct rte_security_session *sess,
+		struct rte_mempool *mempool)
+{
+	struct rte_cryptodev *cdev = dev;
+	struct aesni_mb_private *internals = cdev->data->dev_private;
+	struct aesni_mb_sec_session *sess_priv;
+	int ret;
+
+	if (!conf->crypto_xform) {
+		AESNI_MB_LOG(ERR, "Invalid security session conf");
+		return -EINVAL;
+	}
+
+	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
+		AESNI_MB_LOG(ERR,
+				"Couldn't get object from session mempool");
+		return -ENOMEM;
+	}
+
+	sess_priv->mb_mgr = internals->mb_mgr;
+	if (sess_priv->mb_mgr == NULL) {
+		rte_mempool_put(mempool, sess_priv);
+		return -ENOMEM;
+	}
+
+	sess_priv->cipher_offset = conf->cpucrypto.cipher_offset;
+	sess_priv->digest_idx = 0;
+
+	ret = aesni_mb_set_session_parameters(sess_priv->mb_mgr,
+			&sess_priv->sess, conf->crypto_xform);
+	if (ret != 0) {
+		AESNI_MB_LOG(ERR, "failed configure session parameters");
+
+		/* Do not publish an object already returned to the pool */
+		rte_mempool_put(mempool, sess_priv);
+		return ret;
+	}
+
+	sess->sess_private_data = (void *)sess_priv;
+
+	return 0;
+}
+
+static int
+aesni_mb_security_session_destroy(void *dev __rte_unused,
+		struct rte_security_session *sess)
+{
+	struct aesni_mb_sec_session *sess_priv =
+			get_sec_session_private_data(sess);
+
+	if (sess_priv) {
+		struct rte_mempool *sess_mp = rte_mempool_from_obj(
+				(void *)sess_priv);
+
+		/* Wipe the private session (keys), not the generic handle */
+		memset(sess_priv, 0, sizeof(struct aesni_mb_sec_session));
+		set_sec_session_private_data(sess, NULL);
+
+		if (sess_mp == NULL) {
+			AESNI_MB_LOG(ERR, "failed fetch session mempool");
+			return -EINVAL;
+		}
+
+		rte_mempool_put(sess_mp, sess_priv);
+	}
+
+	return 0;
+}
+
+static unsigned int
+aesni_mb_sec_session_get_size(__rte_unused void *device)
+{
+	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_sec_session),
+			RTE_CACHE_LINE_SIZE);
+}
+
+static struct rte_security_ops aesni_mb_security_ops = {
+		.session_create = aesni_mb_security_session_create,
+		.session_get_size = aesni_mb_sec_session_get_size,
+		.session_update = NULL,
+		.session_stats_get = NULL,
+		.session_destroy = aesni_mb_security_session_destroy,
+		.set_pkt_metadata = NULL,
+		.capabilities_get = NULL,
+		.process_cpu_crypto_bulk = aesni_mb_sec_crypto_process_bulk,
+};
+
 struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops = &aesni_mb_pmd_ops;
+struct rte_security_ops *rte_aesni_mb_pmd_security_ops = &aesni_mb_security_ops;
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
index b794d4bc1..d1cf416ab 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
@@ -176,7 +176,6 @@ struct aesni_mb_qp {
 	 */
 } __rte_cache_aligned;
 
-/** AES-NI multi-buffer private session structure */
 struct aesni_mb_session {
 	JOB_CHAIN_ORDER chain_order;
 	struct {
@@ -265,16 +264,32 @@ struct aesni_mb_session {
 		/** AAD data length */
 		uint16_t aad_len;
 	} aead;
-} __rte_cache_aligned;
+};
+
+/** AES-NI multi-buffer private security session structure */
+struct aesni_mb_sec_session {
+	/** Underlying AES-NI multi-buffer crypto session */
+	struct aesni_mb_session sess;
+	uint8_t temp_digests[MAX_JOBS][DIGEST_LENGTH_MAX];
+	uint16_t digest_idx;
+	uint32_t cipher_offset;
+	MB_MGR *mb_mgr;
+};
 
 extern int
 aesni_mb_set_session_parameters(const MB_MGR *mb_mgr,
 		struct aesni_mb_session *sess,
 		const struct rte_crypto_sym_xform *xform);
 
+extern void
+aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 /** device specific operations function pointer structure */
 extern struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops;
 
-
+/** device specific operations function pointer structure for rte_security */
+extern struct rte_security_ops *rte_aesni_mb_pmd_security_ops;
 
 #endif /* _RTE_AESNI_MB_PMD_PRIVATE_H_ */
-- 
2.14.5

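The aesni_mb_security_ops table above follows the usual DPDK ops-table pattern: the driver fills only the callbacks it implements and leaves the rest NULL (session_update, session_stats_get, set_pkt_metadata and capabilities_get here). A minimal, self-contained sketch of that pattern follows. demo_sec_ops, demo_session_update() and the -ENOTSUP return code are hypothetical and are not the rte_security API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ops table; like aesni_mb_security_ops,
 * unsupported callbacks are simply left NULL. */
struct demo_sec_ops {
	int (*session_create)(void *dev, int *sess);
	int (*session_update)(void *dev, int *sess);
	int (*session_destroy)(void *dev, int *sess);
};

static int
demo_create(void *dev, int *sess)
{
	(void)dev;
	assert(sess != NULL);
	*sess = 1;	/* mark the session as initialised */
	return 0;
}

static int
demo_destroy(void *dev, int *sess)
{
	(void)dev;
	assert(sess != NULL);
	*sess = 0;	/* wipe the private data before releasing it */
	return 0;
}

static const struct demo_sec_ops demo_ops = {
	.session_create = demo_create,
	.session_update = NULL,		/* not implemented by this driver */
	.session_destroy = demo_destroy,
};

/* Generic wrapper: refuse calls whose callback the driver left NULL. */
static int
demo_session_update(const struct demo_sec_ops *ops, void *dev, int *sess)
{
	if (ops == NULL || ops->session_update == NULL)
		return -ENOTSUP;
	return ops->session_update(dev, sess);
}
```

Keeping the NULL check in one generic wrapper means each driver only has to provide the operations it actually supports.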

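aesni_mb_sec_session_get_size() reports the private session size rounded up to a whole number of cache lines with RTE_ALIGN_CEIL(). The sketch below reproduces that rounding in plain C, assuming a power-of-two alignment and a 64-byte cache line. demo_align_ceil() and struct demo_sec_session are illustrative stand-ins, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64	/* assumption: 64-byte cache lines */

/* Round v up to the next multiple of align (a power of two); this is the
 * same arithmetic as flooring v + align - 1 to a multiple of align. */
static inline size_t
demo_align_ceil(size_t v, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/* Hypothetical layout, loosely modelled on aesni_mb_sec_session. */
struct demo_sec_session {
	uint8_t temp_digests[4][20];
	uint16_t digest_idx;
	uint32_t cipher_offset;
	void *mgr;
};

/* Size a mempool element must have so every session starts on its own
 * cache line and never shares one with a neighbouring element. */
static size_t
demo_sec_session_get_size(void)
{
	return demo_align_ceil(sizeof(struct demo_sec_session),
			DEMO_CACHE_LINE_SIZE);
}
```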
^ permalink raw reply	[flat|nested] 87+ messages in thread

* [dpdk-dev] [PATCH 06/10] app/test: add aesni_mb security cpu crypto autotest
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (4 preceding siblings ...)
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 07/10] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch adds CPU crypto unit tests for the AESNI_MB PMD.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 367 ++++++++++++++++++++++++++++++++++++
 1 file changed, 367 insertions(+)
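The unit tests in this patch are generated with an X-macro: all_blockcipher_test_cases lists each (test vector, operation) pair once, and TEST_EXPAND is redefined to emit first one wrapper function per pair and then one TEST_CASE_ST() entry per pair. A reduced sketch of the same technique follows, with hypothetical names (DEMO_ALL_CASES, DEMO_EXPAND, demo_run) standing in for the test framework.

```c
#include <assert.h>

/* Hypothetical operation codes standing in for TOP_ENC / TOP_DEC. */
enum demo_op { DEMO_ENC = 1, DEMO_DEC = 2 };

/* Hypothetical test vectors. */
static const int vec_a = 10, vec_b = 20;

/* Shared runner every generated case calls into. */
static int
demo_run(const int *vec, enum demo_op op)
{
	return *vec + (int)op;	/* placeholder for a real crypto check */
}

/* 1. List every (vector, op) pair exactly once. */
#define DEMO_ALL_CASES \
	DEMO_EXPAND(vec_a, DEMO_ENC) \
	DEMO_EXPAND(vec_a, DEMO_DEC) \
	DEMO_EXPAND(vec_b, DEMO_ENC)

/* 2. First expansion: one named function per pair. */
#define DEMO_EXPAND(t, o) \
static int demo_test_##t##_##o(void) { return demo_run(&t, o); }
DEMO_ALL_CASES
#undef DEMO_EXPAND

/* 3. Second expansion: a table of those functions, playing the role of
 *    the TEST_CASE_ST(...) entries in unit_test_cases. */
#define DEMO_EXPAND(t, o) demo_test_##t##_##o,
static int (*const demo_cases[])(void) = { DEMO_ALL_CASES };
#undef DEMO_EXPAND
```

Adding a vector then means adding one line to the list; the function and its table entry follow automatically. Because ## pastes the unexpanded argument tokens, an operation macro such as TOP_ENC appears verbatim in the generated function name.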

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index ca9a8dae6..0ea406390 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -19,12 +19,23 @@
 
 #include "test.h"
 #include "test_cryptodev.h"
+#include "test_cryptodev_blockcipher.h"
+#include "test_cryptodev_aes_test_vectors.h"
 #include "test_cryptodev_aead_test_vectors.h"
+#include "test_cryptodev_des_test_vectors.h"
+#include "test_cryptodev_hash_test_vectors.h"
 
 #define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
 #define MAX_NB_SIGMENTS			4
 #define CACHE_WARM_ITER			2048
 
+#define TOP_ENC		BLOCKCIPHER_TEST_OP_ENCRYPT
+#define TOP_DEC		BLOCKCIPHER_TEST_OP_DECRYPT
+#define TOP_AUTH_GEN	BLOCKCIPHER_TEST_OP_AUTH_GEN
+#define TOP_AUTH_VER	BLOCKCIPHER_TEST_OP_AUTH_VERIFY
+#define TOP_ENC_AUTH	BLOCKCIPHER_TEST_OP_ENC_AUTH_GEN
+#define TOP_AUTH_DEC	BLOCKCIPHER_TEST_OP_AUTH_VERIFY_DEC
+
 enum buffer_assemble_option {
 	SGL_MAX_SEG,
 	SGL_ONE_SEG,
@@ -516,6 +527,11 @@ cpu_crypto_test_aead(const struct aead_test_data *tdata,
 	TEST_EXPAND(gcm_test_case_256_6, type)	\
 	TEST_EXPAND(gcm_test_case_256_7, type)
 
+/* test-vector/sgl-option */
+#define all_ccm_unit_test_cases \
+	TEST_EXPAND(ccm_test_case_128_1, SGL_ONE_SEG) \
+	TEST_EXPAND(ccm_test_case_128_2, SGL_ONE_SEG) \
+	TEST_EXPAND(ccm_test_case_128_3, SGL_ONE_SEG)
 
 #define TEST_EXPAND(t, o)						\
 static int								\
@@ -531,6 +547,7 @@ cpu_crypto_aead_dec_test_##t##_##o(void)				\
 
 all_gcm_unit_test_cases(SGL_ONE_SEG)
 all_gcm_unit_test_cases(SGL_MAX_SEG)
+all_ccm_unit_test_cases
 #undef TEST_EXPAND
 
 static struct unit_test_suite security_cpu_crypto_aesgcm_testsuite  = {
@@ -758,8 +775,358 @@ test_security_cpu_crypto_aesni_gcm_perf(void)
 			&security_cpu_crypto_aesgcm_perf_testsuite);
 }
 
+static struct rte_security_session *
+create_blockcipher_session(struct rte_security_ctx *ctx,
+		struct rte_mempool *sess_mp,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	struct rte_security_session_conf sess_conf = {0};
+	struct rte_crypto_sym_xform xforms[2] = { {0} };
+	struct rte_crypto_sym_xform *cipher_xform = NULL;
+	struct rte_crypto_sym_xform *auth_xform = NULL;
+	struct rte_crypto_sym_xform *xform;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER) {
+		cipher_xform = &xforms[0];
+		cipher_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+
+		if (op_mask & TOP_ENC)
+			cipher_xform->cipher.op =
+				RTE_CRYPTO_CIPHER_OP_ENCRYPT;
+		else
+			cipher_xform->cipher.op =
+				RTE_CRYPTO_CIPHER_OP_DECRYPT;
+
+		cipher_xform->cipher.algo = test_data->crypto_algo;
+		cipher_xform->cipher.key.data = test_data->cipher_key.data;
+		cipher_xform->cipher.key.length = test_data->cipher_key.len;
+		cipher_xform->cipher.iv.offset = 0;
+		cipher_xform->cipher.iv.length = test_data->iv.len;
+
+		if (is_unit_test)
+			debug_hexdump(stdout, "cipher key:",
+					test_data->cipher_key.data,
+					test_data->cipher_key.len);
+	}
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_AUTH) {
+		auth_xform = &xforms[1];
+		auth_xform->type = RTE_CRYPTO_SYM_XFORM_AUTH;
+
+		if (op_mask & TOP_AUTH_GEN)
+			auth_xform->auth.op = RTE_CRYPTO_AUTH_OP_GENERATE;
+		else
+			auth_xform->auth.op = RTE_CRYPTO_AUTH_OP_VERIFY;
+
+		auth_xform->auth.algo = test_data->auth_algo;
+		auth_xform->auth.key.length = test_data->auth_key.len;
+		auth_xform->auth.key.data = test_data->auth_key.data;
+		auth_xform->auth.digest_length = test_data->digest.len;
+
+		if (is_unit_test)
+			debug_hexdump(stdout, "auth key:",
+					test_data->auth_key.data,
+					test_data->auth_key.len);
+	}
+
+	if (op_mask == TOP_ENC ||
+			op_mask == TOP_DEC)
+		xform = cipher_xform;
+	else if (op_mask == TOP_AUTH_GEN ||
+			op_mask == TOP_AUTH_VER)
+		xform = auth_xform;
+	else if (op_mask == TOP_ENC_AUTH) {
+		xform = cipher_xform;
+		xform->next = auth_xform;
+	} else if (op_mask == TOP_AUTH_DEC) {
+		xform = auth_xform;
+		xform->next = cipher_xform;
+	} else
+		return NULL;
+
+	if (test_data->cipher_offset < test_data->auth_offset)
+		return NULL;
+
+	sess_conf.action_type = RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
+	sess_conf.crypto_xform = xform;
+	sess_conf.cpucrypto.cipher_offset = test_data->cipher_offset -
+			test_data->auth_offset;
+
+	return rte_security_session_create(ctx, &sess_conf, sess_mp);
+}
+
+static inline int
+assemble_blockcipher_buf(struct cpu_crypto_test_case *data,
+		struct cpu_crypto_test_obj *obj,
+		uint32_t obj_idx,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	const uint8_t *src;
+	uint32_t src_len;
+	uint32_t offset;
+
+	if (op_mask == TOP_ENC_AUTH ||
+			op_mask == TOP_AUTH_GEN ||
+			op_mask == TOP_AUTH_VER)
+		offset = test_data->auth_offset;
+	else
+		offset = test_data->cipher_offset;
+
+	if (op_mask & TOP_ENC_AUTH) {
+		src = test_data->plaintext.data;
+		src_len = test_data->plaintext.len;
+		if (is_unit_test)
+			debug_hexdump(stdout, "plaintext:", src, src_len);
+	} else {
+		src = test_data->ciphertext.data;
+		src_len = test_data->ciphertext.len;
+		memcpy(data->digest, test_data->digest.data,
+				test_data->digest.len);
+		if (is_unit_test) {
+			debug_hexdump(stdout, "ciphertext:", src, src_len);
+			debug_hexdump(stdout, "digest:", test_data->digest.data,
+					test_data->digest.len);
+		}
+	}
+
+	if (src_len > MBUF_DATAPAYLOAD_SIZE)
+		return -ENOMEM;
+
+	memcpy(data->seg_buf[0].seg, src, src_len);
+	data->seg_buf[0].seg_len = src_len;
+	obj->vec[obj_idx][0].iov_base =
+			(void *)(data->seg_buf[0].seg + offset);
+	obj->vec[obj_idx][0].iov_len = src_len - offset;
+
+	obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+	obj->sec_buf[obj_idx].num = 1;
+
+	memcpy(data->iv, test_data->iv.data, test_data->iv.len);
+	if (is_unit_test)
+		debug_hexdump(stdout, "iv:", test_data->iv.data,
+				test_data->iv.len);
+
+	obj->iv[obj_idx] = (void *)data->iv;
+	obj->digest[obj_idx] = (void *)data->digest;
+
+	return 0;
+}
+
+static int
+check_blockcipher_result(struct cpu_crypto_test_case *tcase,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data)
+{
+	int ret;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER) {
+		const char *err_msg1, *err_msg2;
+		const uint8_t *src_pt_ct;
+		uint32_t src_len;
+
+		if (op_mask & TOP_ENC) {
+			src_pt_ct = test_data->ciphertext.data;
+			src_len = test_data->ciphertext.len;
+			err_msg1 = CPU_CRYPTO_ERR_EXP_CT;
+			err_msg2 = CPU_CRYPTO_ERR_GEN_CT;
+		} else {
+			src_pt_ct = test_data->plaintext.data;
+			src_len = test_data->plaintext.len;
+			err_msg1 = CPU_CRYPTO_ERR_EXP_PT;
+			err_msg2 = CPU_CRYPTO_ERR_GEN_PT;
+		}
+
+		ret = memcmp(tcase->seg_buf[0].seg, src_pt_ct, src_len);
+		if (ret != 0) {
+			debug_hexdump(stdout, err_msg1, src_pt_ct, src_len);
+			debug_hexdump(stdout, err_msg2,
+					tcase->seg_buf[0].seg,
+					src_len);
+			return -1;
+		}
+	}
+
+	if (op_mask & TOP_AUTH_GEN) {
+		ret = memcmp(tcase->digest, test_data->digest.data,
+				test_data->digest.len);
+		if (ret != 0) {
+			debug_hexdump(stdout, "expect digest:",
+					test_data->digest.data,
+					test_data->digest.len);
+			debug_hexdump(stdout, "gen digest:",
+					tcase->digest,
+					test_data->digest.len);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int
+cpu_crypto_test_blockcipher(const struct blockcipher_test_data *tdata,
+		uint32_t op_mask)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	int ret;
+
+	ut_params->sess = create_blockcipher_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			op_mask,
+			tdata,
+			1);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(1);
+	if (ret)
+		return ret;
+
+	tcase = ut_params->test_datas[0];
+	ret = assemble_blockcipher_buf(tcase, obj, 0, op_mask, tdata, 1);
+	if (ret < 0) {
+		printf("Test is not supported by the driver\n");
+		return ret;
+	}
+
+	run_test(ts_params->ctx, ut_params->sess, obj, 1);
+
+	ret = check_status(obj, 1);
+	if (ret < 0)
+		return ret;
+
+	ret = check_blockcipher_result(tcase, op_mask, tdata);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+/* Macro to save code for defining BlockCipher test cases */
+/* test-vector-name/op */
+#define all_blockcipher_test_cases \
+	TEST_EXPAND(aes_test_data_1, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_1, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_1, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_1, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_2, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_2, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_2, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_2, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_3, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_3, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_3, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_3, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_4, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_4, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_4, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_4, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_5, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_5, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_5, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_5, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_6, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_6, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_6, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_6, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_7, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_7, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_7, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_7, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_8, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_8, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_8, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_8, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_9, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_9, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_9, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_9, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_10, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_10, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_11, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_11, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_12, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_12, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_12, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_12, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_13, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_13, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_13, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_13, TOP_AUTH_DEC) \
+	TEST_EXPAND(des_test_data_1, TOP_ENC) \
+	TEST_EXPAND(des_test_data_1, TOP_DEC) \
+	TEST_EXPAND(des_test_data_2, TOP_ENC) \
+	TEST_EXPAND(des_test_data_2, TOP_DEC) \
+	TEST_EXPAND(des_test_data_3, TOP_ENC) \
+	TEST_EXPAND(des_test_data_3, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_ENC_AUTH) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_AUTH_DEC) \
+	TEST_EXPAND(triple_des64cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des64cbc_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des128cbc_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des192cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des192cbc_test_vector, TOP_DEC) \
+
+#define TEST_EXPAND(t, o)						\
+static int								\
+cpu_crypto_blockcipher_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_blockcipher(&t, o);			\
+}
+
+all_blockcipher_test_cases
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesni_mb_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-MB Unit Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_enc_test_##t##_##o),		\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_dec_test_##t##_##o),		\
+
+	all_gcm_unit_test_cases(SGL_ONE_SEG)
+	all_ccm_unit_test_cases
+#undef TEST_EXPAND
+
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_blockcipher_test_##t##_##o),		\
+
+	all_blockcipher_test_cases
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_mb(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+
+	return unit_test_suite_runner(&security_cpu_crypto_aesni_mb_testsuite);
+}
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
 
 REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
 		test_security_cpu_crypto_aesni_gcm_perf);
+
+REGISTER_TEST_COMMAND(security_aesni_mb_autotest,
+		test_security_cpu_crypto_aesni_mb);
-- 
2.14.5

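create_blockcipher_session() chooses the xform chain order from op_mask: cipher then auth for TOP_ENC_AUTH, auth then cipher for TOP_AUTH_DEC, and a single xform for the cipher-only or auth-only masks. The sketch below isolates that decision. The one-bit-per-operation OP_* values and struct xf are assumptions standing in for the BLOCKCIPHER_TEST_OP_* macros and struct rte_crypto_sym_xform.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed one-bit-per-operation encoding, mirroring the TOP_* masks. */
#define OP_ENC		0x01u
#define OP_DEC		0x02u
#define OP_AUTH_GEN	0x04u
#define OP_AUTH_VER	0x08u
#define OP_ENC_AUTH	(OP_ENC | OP_AUTH_GEN)
#define OP_AUTH_DEC	(OP_AUTH_VER | OP_DEC)

enum xf_type { XF_CIPHER, XF_AUTH };

/* Minimal stand-in for struct rte_crypto_sym_xform. */
struct xf {
	enum xf_type type;
	struct xf *next;
};

/* Return the head of the chain for op_mask, or NULL if unsupported.
 * Encrypt-then-authenticate puts cipher first; verify-then-decrypt puts
 * auth first. */
static struct xf *
build_chain(unsigned int op_mask, struct xf *cipher, struct xf *auth)
{
	cipher->type = XF_CIPHER;
	auth->type = XF_AUTH;
	cipher->next = NULL;	/* reset links so the structs can be reused */
	auth->next = NULL;

	switch (op_mask) {
	case OP_ENC:
	case OP_DEC:
		return cipher;
	case OP_AUTH_GEN:
	case OP_AUTH_VER:
		return auth;
	case OP_ENC_AUTH:
		cipher->next = auth;
		return cipher;
	case OP_AUTH_DEC:
		auth->next = cipher;
		return auth;
	default:
		return NULL;
	}
}
```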

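For chained operations the engine receives one buffer that starts at the authentication offset, and the session carries the cipher start relative to that point (cpucrypto.cipher_offset = cipher_offset - auth_offset), which is why create_blockcipher_session() rejects vectors with cipher_offset < auth_offset. A small sketch of that layout arithmetic follows, using a hypothetical struct demo_vec and demo_layout() in place of struct iovec and the test helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one contiguous region handed to the engine. */
struct demo_vec {
	const uint8_t *base;
	size_t len;
};

/* The engine sees the bytes from auth_offset to the end of the packet;
 * the cipher region starts cipher_offset - auth_offset bytes into that.
 * Returns -1 when the cipher region would start before the auth region
 * or the offset lies outside the packet. */
static int
demo_layout(const uint8_t *pkt, size_t pkt_len, uint32_t auth_offset,
		uint32_t cipher_offset, struct demo_vec *vec,
		uint32_t *rel_cipher_offset)
{
	if (cipher_offset < auth_offset || auth_offset > pkt_len)
		return -1;

	vec->base = pkt + auth_offset;
	vec->len = pkt_len - auth_offset;
	*rel_cipher_offset = cipher_offset - auth_offset;
	return 0;
}
```

In ESP, for instance, the ICV covers the ESP header and IV while encryption starts after the IV, so the relative cipher offset equals the header-plus-IV length.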

* [dpdk-dev] [PATCH 07/10] app/test: add aesni_mb security cpu crypto perftest
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (5 preceding siblings ...)
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 06/10] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

Since the crypto perf application does not support rte_security, this patch
adds a simple AES-CBC-SHA1-HMAC CPU crypto performance test to the crypto
unit test application. The test covers different key and data sizes using
single-segment buffers, and displays throughput as well as cycle-count
performance information.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 194 ++++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)
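The perf test turns one timed burst into the numbers it prints: cycles per buffer is the elapsed count divided by MAX_NUM_OPS_INFLIGHT, Mpps is hz / cycles_per_buf / 10^6, and Gbps multiplies Mpps by the buffer size in bits and divides by 1000. The sketch below packages that arithmetic. demo_perf_calc() is hypothetical, and it assumes the elapsed count and hz come from the same clock (the TSC, whose rate rte_get_tsc_hz() reports).

```c
#include <assert.h>
#include <stdint.h>

struct demo_perf {
	double cycles_per_buf;
	double cycles_per_byte;
	double mpps;
	double gbps;
};

/* elapsed: cycles spent on nb_bufs buffers of buf_sz bytes each;
 * hz: cycles per second of the same clock. */
static struct demo_perf
demo_perf_calc(uint64_t elapsed, uint32_t nb_bufs, uint32_t buf_sz,
		uint64_t hz)
{
	struct demo_perf p;

	p.cycles_per_buf = (double)elapsed / nb_bufs;
	p.cycles_per_byte = p.cycles_per_buf / buf_sz;
	p.mpps = ((double)hz / p.cycles_per_buf) / 1e6;
	/* Mpps * bits per buffer = Mbit/s; divide by 1000 for Gbit/s. */
	p.gbps = p.mpps * buf_sz * 8 / 1000;
	return p;
}
```

For example, 32 buffers of 1024 bytes taking 32000 cycles on a 2 GHz TSC give 1000 cycles per buffer, 2 Mpps and about 16.4 Gbps.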

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index 0ea406390..6e012672e 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -1122,6 +1122,197 @@ test_security_cpu_crypto_aesni_mb(void)
 	return unit_test_suite_runner(&security_cpu_crypto_aesni_mb_testsuite);
 }
 
+static inline void
+switch_blockcipher_enc_to_dec(struct blockcipher_test_data *tdata,
+		struct cpu_crypto_test_case *tcase, uint8_t *dst)
+{
+	memcpy(dst, tcase->seg_buf[0].seg, tcase->seg_buf[0].seg_len);
+	tdata->ciphertext.len = tcase->seg_buf[0].seg_len;
+	memcpy(tdata->digest.data, tcase->digest, tdata->digest.len);
+}
+
+static int
+cpu_crypto_test_blockcipher_perf(
+		const enum rte_crypto_cipher_algorithm cipher_algo,
+		uint32_t cipher_key_sz,
+		const enum rte_crypto_auth_algorithm auth_algo,
+		uint32_t auth_key_sz, uint32_t digest_sz,
+		uint32_t op_mask)
+{
+	struct blockcipher_test_data tdata = {0};
+	uint8_t plaintext[3000], ciphertext[3000];
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	uint64_t hz = rte_get_tsc_hz(), time_start, time_now;
+	double rate, cycles_per_buf;
+	uint32_t test_data_szs[] = {64, 128, 256, 512, 1024, 2048};
+	uint32_t i, j;
+	uint32_t op_mask_opp = 0;
+	int ret;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER)
+		op_mask_opp |= (~op_mask & BLOCKCIPHER_TEST_OP_CIPHER);
+	if (op_mask & BLOCKCIPHER_TEST_OP_AUTH)
+		op_mask_opp |= (~op_mask & BLOCKCIPHER_TEST_OP_AUTH);
+
+	tdata.plaintext.data = plaintext;
+	tdata.ciphertext.data = ciphertext;
+
+	tdata.cipher_key.len = cipher_key_sz;
+	tdata.auth_key.len = auth_key_sz;
+
+	gen_rand(tdata.cipher_key.data, cipher_key_sz);
+	gen_rand(tdata.auth_key.data, auth_key_sz);
+
+	tdata.crypto_algo = cipher_algo;
+	tdata.auth_algo = auth_algo;
+
+	tdata.digest.len = digest_sz;
+
+	ut_params->sess = create_blockcipher_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			op_mask,
+			&tdata,
+			0);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(MAX_NUM_OPS_INFLIGHT);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < RTE_DIM(test_data_szs); i++) {
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tdata.plaintext.len = test_data_szs[i];
+			gen_rand(plaintext, tdata.plaintext.len);
+
+			tdata.iv.len = 16;
+			gen_rand(tdata.iv.data, tdata.iv.len);
+
+			tcase = ut_params->test_datas[j];
+			ret = assemble_blockcipher_buf(tcase, obj, j,
+					op_mask,
+					&tdata,
+					0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		/* warm up cache */
+		for (j = 0; j < CACHE_WARM_ITER; j++)
+			run_test(ts_params->ctx, ut_params->sess, obj,
+					MAX_NUM_OPS_INFLIGHT);
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("%s-%u-%s(%4uB) Enc %03.3fMpps (%03.3fGbps) ",
+			rte_crypto_cipher_algorithm_strings[cipher_algo],
+			cipher_key_sz * 8,
+			rte_crypto_auth_algorithm_strings[auth_algo],
+			test_data_szs[i],
+			rate, rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+			cycles_per_buf, cycles_per_buf / test_data_szs[i]);
+
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tcase = ut_params->test_datas[j];
+
+			switch_blockcipher_enc_to_dec(&tdata, tcase,
+					ciphertext);
+			ret = assemble_blockcipher_buf(tcase, obj, j,
+					op_mask_opp,
+					&tdata,
+					0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("%s-%u-%s(%4uB) Dec %03.3fMpps (%03.3fGbps) ",
+			rte_crypto_cipher_algorithm_strings[cipher_algo],
+			cipher_key_sz * 8,
+			rte_crypto_auth_algorithm_strings[auth_algo],
+			test_data_szs[i],
+			rate, rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+	}
+
+	return 0;
+}
+
+/* cipher-algo/cipher-key-len/auth-algo/auth-key-len/digest-len/op */
+#define all_block_cipher_perf_test_cases				\
+	TEST_EXPAND(_AES_CBC, 128, _NULL, 0, 0, TOP_ENC)		\
+	TEST_EXPAND(_NULL, 0, _SHA1_HMAC, 160, 20, TOP_AUTH_GEN)	\
+	TEST_EXPAND(_AES_CBC, 128, _SHA1_HMAC, 160, 20, TOP_ENC_AUTH)
+
+#define TEST_EXPAND(a, b, c, d, e, f)					\
+static int								\
+cpu_crypto_blockcipher_perf##a##_##b##c##_##f(void)			\
+{									\
+	return cpu_crypto_test_blockcipher_perf(RTE_CRYPTO_CIPHER##a,	\
+			b / 8, RTE_CRYPTO_AUTH##c, d / 8, e, f);	\
+}									\
+
+all_block_cipher_perf_test_cases
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesni_mb_perf_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-MB Perf Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(a, b, c, d, e, f)					\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+		cpu_crypto_blockcipher_perf##a##_##b##c##_##f),	\
+
+	all_block_cipher_perf_test_cases
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_mb_perf(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+
+	return unit_test_suite_runner(
+			&security_cpu_crypto_aesni_mb_perf_testsuite);
+}
+
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
 
@@ -1130,3 +1321,6 @@ REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
 
 REGISTER_TEST_COMMAND(security_aesni_mb_autotest,
 		test_security_cpu_crypto_aesni_mb);
+
+REGISTER_TEST_COMMAND(security_aesni_mb_perftest,
+		test_security_cpu_crypto_aesni_mb_perf);
-- 
2.14.5

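To reuse the encrypted output as decryption input, the perf test derives op_mask_opp by flipping each direction present in op_mask (encrypt to decrypt, generate to verify). The same bit manipulation in isolation follows, assuming the one-bit-per-operation encoding of the BLOCKCIPHER_TEST_OP_* masks. The OP_* values and demo_opposite_ops() are hypothetical.

```c
#include <assert.h>

/* Assumed encoding, mirroring BLOCKCIPHER_TEST_OP_*. */
#define OP_ENC		0x01u
#define OP_DEC		0x02u
#define OP_AUTH_GEN	0x04u
#define OP_AUTH_VER	0x08u
#define OP_CIPHER	(OP_ENC | OP_DEC)
#define OP_AUTH		(OP_AUTH_GEN | OP_AUTH_VER)

/* Flip each operation class that is present (encrypt <-> decrypt,
 * generate <-> verify); classes absent from op_mask stay absent. */
static unsigned int
demo_opposite_ops(unsigned int op_mask)
{
	unsigned int opp = 0;

	assert((op_mask & ~(OP_CIPHER | OP_AUTH)) == 0);	/* no unknown bits */

	if (op_mask & OP_CIPHER)
		opp |= ~op_mask & OP_CIPHER;
	if (op_mask & OP_AUTH)
		opp |= ~op_mask & OP_AUTH;
	return opp;
}
```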


* [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (6 preceding siblings ...)
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 07/10] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-26 23:20     ` Ananyev, Konstantin
  2019-09-27 10:38     ` Ananyev, Konstantin
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 09/10] examples/ipsec-secgw: add security " Fan Zhang
                     ` (3 subsequent siblings)
  11 siblings, 2 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch updates the ipsec library to handle the newly introduced
RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 lib/librte_ipsec/esp_inb.c  | 174 +++++++++++++++++++++++++-
 lib/librte_ipsec/esp_outb.c | 290 +++++++++++++++++++++++++++++++++++++++++++-
 lib/librte_ipsec/sa.c       |  53 ++++++--
 lib/librte_ipsec/sa.h       |  29 +++++
 lib/librte_ipsec/ses.c      |   4 +-
 5 files changed, 539 insertions(+), 11 deletions(-)
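inb_sync_crypto_proc_prepare() and outb_sync_crypto_proc_prepare() in this patch describe the region to process as an array of struct iovec: after locating the segment that holds the start offset, each following segment contributes one entry, and the call fails if data remains once the chain or the RTE_LIBRTE_IP_FRAG_MAX_FRAG entry budget runs out. A standalone sketch of that walk follows, with a hypothetical struct demo_seg standing in for a chained rte_mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for one segment of a chained mbuf. */
struct demo_seg {
	uint8_t *data;
	uint32_t data_len;
	struct demo_seg *next;
};

/* Describe len bytes starting at byte off of the chain with at most
 * max_vec iovecs. Returns the number of iovecs used, or -1 if the chain
 * is too short or would need more than max_vec entries. */
static int
demo_chain_to_iov(struct demo_seg *seg, uint32_t off, uint32_t len,
		struct iovec *vec, uint32_t max_vec)
{
	uint32_t n = 0;

	/* Skip whole segments until off falls inside one. */
	while (seg != NULL && off >= seg->data_len) {
		off -= seg->data_len;
		seg = seg->next;
	}

	while (len != 0 && seg != NULL && n < max_vec) {
		uint32_t chunk = seg->data_len - off;

		if (chunk > len)
			chunk = len;
		vec[n].iov_base = seg->data + off;
		vec[n].iov_len = chunk;
		len -= chunk;
		n++;
		seg = seg->next;
		off = 0;	/* later segments are consumed from the start */
	}

	return len == 0 ? (int)n : -1;
}
```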

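Packets that fail either the prepare step or the synchronous crypto call are handled with a death-row array: their indices are collected in dr[] in ascending order, and move_bad_mbufs() shifts them behind the good packets, leaving the good ones at the front of mb[]. A sketch of both steps over plain integers follows, assuming dr[] is sorted and a fixed burst bound. DEMO_MAX_BURST, demo_move_bad() and demo_collect_failed() are hypothetical. Note that the status scan has to cover every one of the k submitted packets.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_BURST 64	/* assumption: upper bound on burst size */

/* Stable partition: items whose indices appear in bad_idx[] (sorted
 * ascending) move to the tail; good items keep their relative order. */
static void
demo_move_bad(int item[], const uint32_t bad_idx[], uint32_t nb_item,
		uint32_t nb_bad)
{
	int tmp[DEMO_MAX_BURST];
	uint32_t i, j = 0, k = 0;

	assert(nb_bad <= nb_item && nb_item <= DEMO_MAX_BURST);

	for (i = 0; i != nb_item; i++) {
		if (j != nb_bad && i == bad_idx[j])
			tmp[j++] = item[i];	/* park the bad item */
		else
			item[k++] = item[i];	/* compact the good item */
	}
	for (i = 0; i != nb_bad; i++)
		item[k + i] = tmp[i];
}

/* Collect indices of failed items from a per-item status array (non-zero
 * means failure); returns the number of failures. */
static uint32_t
demo_collect_failed(const int status[], uint32_t num, uint32_t dr[])
{
	uint32_t i, n = 0;

	for (i = 0; i != num; i++)
		if (status[i] != 0)
			dr[n++] = i;
	return n;
}
```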
diff --git a/lib/librte_ipsec/esp_inb.c b/lib/librte_ipsec/esp_inb.c
index 8e3ecbc64..6077dcb1e 100644
--- a/lib/librte_ipsec/esp_inb.c
+++ b/lib/librte_ipsec/esp_inb.c
@@ -105,6 +105,73 @@ inb_cop_prepare(struct rte_crypto_op *cop,
 	}
 }
 
+static inline int
+inb_sync_crypto_proc_prepare(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
+	const union sym_op_data *icv, uint32_t pofs, uint32_t plen,
+	struct rte_security_vec *buf, struct iovec *cur_vec,
+	void *iv, void **aad, void **digest)
+{
+	struct rte_mbuf *ms;
+	struct iovec *vec = cur_vec;
+	struct aead_gcm_iv *gcm;
+	struct aesctr_cnt_blk *ctr;
+	uint64_t *ivp;
+	uint32_t algo, left, off = 0, n_seg = 0;
+
+	ivp = rte_pktmbuf_mtod_offset(mb, uint64_t *,
+		pofs + sizeof(struct rte_esp_hdr));
+	algo = sa->algo_type;
+
+	switch (algo) {
+	case ALGO_TYPE_AES_GCM:
+		gcm = (struct aead_gcm_iv *)iv;
+		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
+		*aad = icv->va + sa->icv_len;
+		off = sa->ctp.cipher.offset + pofs;
+		break;
+	case ALGO_TYPE_AES_CBC:
+	case ALGO_TYPE_3DES_CBC:
+		off = sa->ctp.auth.offset + pofs;
+		break;
+	case ALGO_TYPE_AES_CTR:
+		off = sa->ctp.auth.offset + pofs;
+		ctr = (struct aesctr_cnt_blk *)iv;
+		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
+		break;
+	case ALGO_TYPE_NULL:
+		break;
+	}
+
+	*digest = icv->va;
+
+	left = plen - sa->ctp.cipher.length;
+
+	ms = mbuf_get_seg_ofs(mb, &off);
+	if (!ms)
+		return -1;
+
+	while (n_seg < RTE_LIBRTE_IP_FRAG_MAX_FRAG && left && ms) {
+		uint32_t len = RTE_MIN(left, ms->data_len - off);
+
+		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
+		vec->iov_len = len;
+
+		left -= len;
+		vec++;
+		n_seg++;
+		ms = ms->next;
+		off = 0;
+	}
+
+	if (left)
+		return -1;
+
+	buf->vec = cur_vec;
+	buf->num = n_seg;
+
+	return n_seg;
+}
+
 /*
  * Helper function for prepare() to deal with situation when
  * ICV is spread by two segments. Tries to move ICV completely into the
@@ -512,7 +579,6 @@ tun_process(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
 	return k;
 }
 
-
 /*
  * *process* function for tunnel packets
  */
@@ -625,6 +691,112 @@ esp_inb_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
 	return n;
 }
 
+/*
+ * process packets using sync crypto engine
+ */
+static uint16_t
+esp_inb_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num, uint8_t sqh_len,
+		esp_inb_process_t process)
+{
+	int32_t rc;
+	uint32_t i, k, hl, n, p;
+	struct rte_ipsec_sa *sa;
+	struct replay_sqn *rsn;
+	union sym_op_data icv;
+	uint32_t sqn[num];
+	uint32_t dr[num];
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
+	void *iv[num];
+	void *aad[num];
+	void *digest[num];
+	int status[num];
+
+	sa = ss->sa;
+	rsn = rsn_acquire(sa);
+
+	k = 0;
+	for (i = 0; i != num; i++) {
+		hl = mb[i]->l2_len + mb[i]->l3_len;
+		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, &icv);
+		if (rc >= 0) {
+			iv[k] = (void *)ivs[k];
+			rc = inb_sync_crypto_proc_prepare(sa, mb[i], &icv, hl,
+					rc, &buf[k], &vec[vec_idx], iv[k],
+					&aad[k], &digest[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		} else
+			dr[i - k] = i;
+	}
+
+	/* copy not prepared mbufs beyond good ones */
+	if (k != num) {
+		rte_errno = EBADMSG;
+
+		if (unlikely(k == 0))
+			return 0;
+
+		move_bad_mbufs(mb, dr, num, num - k);
+	}
+
+	/* process the packets */
+	n = 0;
+	rte_security_process_cpu_crypto_bulk(ss->security.ctx,
+			ss->security.ses, buf, iv, aad, digest, status,
+			k);
+	/* record the indices of packets that failed processing */
+	for (i = 0; i < k; i++) {
+		if (status[i]) {
+			dr[n++] = i;
+			rte_errno = EBADMSG;
+		}
+	}
+
+	/* move bad packets to the back */
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	/* process packets */
+	p = process(sa, mb, sqn, dr, k - n, sqh_len);
+
+	if (p != k - n && p != 0)
+		move_bad_mbufs(mb, dr, k - n, k - n - p);
+
+	if (p != num)
+		rte_errno = EBADMSG;
+
+	return p;
+}
+
+uint16_t
+esp_inb_tun_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	struct rte_ipsec_sa *sa = ss->sa;
+
+	return esp_inb_sync_crypto_pkt_process(ss, mb, num, sa->sqh_len,
+			tun_process);
+}
+
+uint16_t
+esp_inb_trs_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	struct rte_ipsec_sa *sa = ss->sa;
+
+	return esp_inb_sync_crypto_pkt_process(ss, mb, num, sa->sqh_len,
+			trs_process);
+}
+
 /*
  * process group of ESP inbound tunnel packets.
  */
diff --git a/lib/librte_ipsec/esp_outb.c b/lib/librte_ipsec/esp_outb.c
index 55799a867..097cb663f 100644
--- a/lib/librte_ipsec/esp_outb.c
+++ b/lib/librte_ipsec/esp_outb.c
@@ -403,6 +403,292 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 	return k;
 }
 
+
+static inline int
+outb_sync_crypto_proc_prepare(struct rte_mbuf *m, const struct rte_ipsec_sa *sa,
+		const uint64_t ivp[IPSEC_MAX_IV_QWORD],
+		const union sym_op_data *icv, uint32_t hlen, uint32_t plen,
+		struct rte_security_vec *buf, struct iovec *cur_vec, void *iv,
+		void **aad, void **digest)
+{
+	struct rte_mbuf *ms;
+	struct aead_gcm_iv *gcm;
+	struct aesctr_cnt_blk *ctr;
+	struct iovec *vec = cur_vec;
+	uint32_t left, off = 0, n_seg = 0;
+	uint32_t algo;
+
+	algo = sa->algo_type;
+
+	switch (algo) {
+	case ALGO_TYPE_AES_GCM:
+		gcm = iv;
+		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
+		*aad = (void *)(icv->va + sa->icv_len);
+		off = sa->ctp.cipher.offset + hlen;
+		break;
+	case ALGO_TYPE_AES_CBC:
+	case ALGO_TYPE_3DES_CBC:
+		off = sa->ctp.auth.offset + hlen;
+		break;
+	case ALGO_TYPE_AES_CTR:
+		off = sa->ctp.auth.offset + hlen;
+		ctr = iv;
+		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
+		break;
+	case ALGO_TYPE_NULL:
+		break;
+	}
+
+	*digest = (void *)icv->va;
+
+	left = sa->ctp.cipher.length + plen;
+
+	ms = mbuf_get_seg_ofs(m, &off);
+	if (!ms)
+		return -1;
+
+	while (n_seg < RTE_LIBRTE_IP_FRAG_MAX_FRAG && left && ms) {
+		uint32_t len = RTE_MIN(left, ms->data_len - off);
+
+		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
+		vec->iov_len = len;
+
+		left -= len;
+		vec++;
+		n_seg++;
+		ms = ms->next;
+		off = 0;
+	}
+
+	if (left)
+		return -1;
+
+	buf->vec = cur_vec;
+	buf->num = n_seg;
+
+	return n_seg;
+}
+
+/**
+ * Local post-process function prototype, identical to the prototype of
+ * rte_ipsec_sa_pkt_func's process().
+ */
+typedef uint16_t (*sync_crypto_post_process)(const struct rte_ipsec_session *ss,
+				struct rte_mbuf *mb[],
+				uint16_t num);
+static uint16_t
+esp_outb_tun_sync_crypto_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num,
+		sync_crypto_post_process post_process)
+{
+	uint64_t sqn;
+	rte_be64_t sqc;
+	struct rte_ipsec_sa *sa;
+	struct rte_security_ctx *ctx;
+	struct rte_security_session *rss;
+	union sym_op_data icv;
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	void *aad[num];
+	void *digest[num];
+	void *iv[num];
+	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
+	uint64_t ivp[IPSEC_MAX_IV_QWORD];
+	int status[num];
+	uint32_t dr[num];
+	uint32_t i, n, k;
+	int32_t rc;
+
+	sa = ss->sa;
+	ctx = ss->security.ctx;
+	rss = ss->security.ses;
+
+	k = 0;
+	n = num;
+	sqn = esn_outb_update_sqn(sa, &n);
+	if (n != num)
+		rte_errno = EOVERFLOW;
+
+	for (i = 0; i != n; i++) {
+		sqc = rte_cpu_to_be_64(sqn + i);
+		gen_iv(ivp, sqc);
+
+		/* try to update the packet itself */
+		rc = outb_tun_pkt_prepare(sa, sqc, ivp, mb[i], &icv,
+				sa->sqh_len);
+
+		/* success, setup crypto op */
+		if (rc >= 0) {
+			outb_pkt_xprepare(sa, sqc, &icv);
+
+			iv[k] = (void *)ivs[k];
+			rc = outb_sync_crypto_proc_prepare(mb[i], sa, ivp, &icv,
+					0, rc, &buf[k], &vec[vec_idx], iv[k],
+					&aad[k], &digest[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				rte_errno = -rc;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		/* failure, put packet into the death-row */
+		} else {
+			dr[i - k] = i;
+			rte_errno = -rc;
+		}
+	}
+
+	/* copy not prepared mbufs beyond good ones */
+	if (k != n && k != 0)
+		move_bad_mbufs(mb, dr, n, n - k);
+
+	if (unlikely(k == 0)) {
+		rte_errno = EBADMSG;
+		return 0;
+	}
+
+	/* process the packets */
+	n = 0;
+	rte_security_process_cpu_crypto_bulk(ctx, rss, buf, iv, aad, digest,
+			status, k);
+	/* move packets that failed crypto processing to dr */
+	for (i = 0; i < k; i++) {
+		if (status[i])
+			dr[n++] = i;
+	}
+
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	return post_process(ss, mb, k - n);
+}
+
+static uint16_t
+esp_outb_trs_sync_crypto_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num,
+		sync_crypto_post_process post_process)
+
+{
+	uint64_t sqn;
+	rte_be64_t sqc;
+	struct rte_ipsec_sa *sa;
+	struct rte_security_ctx *ctx;
+	struct rte_security_session *rss;
+	union sym_op_data icv;
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	void *aad[num];
+	void *digest[num];
+	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
+	void *iv[num];
+	int status[num];
+	uint64_t ivp[IPSEC_MAX_IV_QWORD];
+	uint32_t dr[num];
+	uint32_t i, n, k;
+	uint32_t l2, l3;
+	int32_t rc;
+
+	sa = ss->sa;
+	ctx = ss->security.ctx;
+	rss = ss->security.ses;
+
+	k = 0;
+	n = num;
+	sqn = esn_outb_update_sqn(sa, &n);
+	if (n != num)
+		rte_errno = EOVERFLOW;
+
+	for (i = 0; i != n; i++) {
+		l2 = mb[i]->l2_len;
+		l3 = mb[i]->l3_len;
+
+		sqc = rte_cpu_to_be_64(sqn + i);
+		gen_iv(ivp, sqc);
+
+		/* try to update the packet itself */
+		rc = outb_trs_pkt_prepare(sa, sqc, ivp, mb[i], l2, l3, &icv,
+				sa->sqh_len);
+
+		/* success, setup crypto op */
+		if (rc >= 0) {
+			outb_pkt_xprepare(sa, sqc, &icv);
+
+			iv[k] = (void *)ivs[k];
+
+			rc = outb_sync_crypto_proc_prepare(mb[i], sa, ivp, &icv,
+					l2 + l3, rc, &buf[k], &vec[vec_idx],
+					iv[k], &aad[k], &digest[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				rte_errno = -rc;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		/* failure, put packet into the death-row */
+		} else {
+			dr[i - k] = i;
+			rte_errno = -rc;
+		}
+	}
+
+	/* move mbufs that failed preparation beyond the good ones */
+	if (k != n && k != 0)
+		move_bad_mbufs(mb, dr, n, n - k);
+
+	/* process the packets */
+	n = 0;
+	rte_security_process_cpu_crypto_bulk(ctx, rss, buf, iv, aad, digest,
+			status, k);
+	/* move packets that failed crypto processing to dr */
+	for (i = 0; i < k; i++) {
+		if (status[i])
+			dr[n++] = i;
+	}
+
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	return post_process(ss, mb, k - n);
+}
+
+uint16_t
+esp_outb_tun_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_tun_sync_crypto_process(ss, mb, num,
+			esp_outb_sqh_process);
+}
+
+uint16_t
+esp_outb_tun_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_tun_sync_crypto_process(ss, mb, num,
+			esp_outb_pkt_flag_process);
+}
+
+uint16_t
+esp_outb_trs_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_trs_sync_crypto_process(ss, mb, num,
+			esp_outb_sqh_process);
+}
+
+uint16_t
+esp_outb_trs_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_trs_sync_crypto_process(ss, mb, num,
+			esp_outb_pkt_flag_process);
+}
+
 /*
  * process outbound packets for SA with ESN support,
  * for algorithms that require SQN.hibits to be implictly included
@@ -410,8 +696,8 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
  * In that case we have to move ICV bytes back to their proper place.
  */
 uint16_t
-esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
-	uint16_t num)
+esp_outb_sqh_process(const struct rte_ipsec_session *ss,
+	struct rte_mbuf *mb[], uint16_t num)
 {
 	uint32_t i, k, icv_len, *icv;
 	struct rte_mbuf *ml;
diff --git a/lib/librte_ipsec/sa.c b/lib/librte_ipsec/sa.c
index 23d394b46..31ffbce2c 100644
--- a/lib/librte_ipsec/sa.c
+++ b/lib/librte_ipsec/sa.c
@@ -544,9 +544,9 @@ lksd_proto_prepare(const struct rte_ipsec_session *ss,
  * - inbound/outbound for RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
  * - outbound for RTE_SECURITY_ACTION_TYPE_NONE when ESN is disabled
  */
-static uint16_t
-pkt_flag_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
-	uint16_t num)
+uint16_t
+esp_outb_pkt_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
 {
 	uint32_t i, k;
 	uint32_t dr[num];
@@ -599,12 +599,48 @@ lksd_none_pkt_func_select(const struct rte_ipsec_sa *sa,
 	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
 		pf->prepare = esp_outb_tun_prepare;
 		pf->process = (sa->sqh_len != 0) ?
-			esp_outb_sqh_process : pkt_flag_process;
+			esp_outb_sqh_process : esp_outb_pkt_flag_process;
 		break;
 	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TRANS):
 		pf->prepare = esp_outb_trs_prepare;
 		pf->process = (sa->sqh_len != 0) ?
-			esp_outb_sqh_process : pkt_flag_process;
+			esp_outb_sqh_process : esp_outb_pkt_flag_process;
+		break;
+	default:
+		rc = -ENOTSUP;
+	}
+
+	return rc;
+}
+
+static int
+lksd_sync_crypto_pkt_func_select(const struct rte_ipsec_sa *sa,
+		struct rte_ipsec_sa_pkt_func *pf)
+{
+	int32_t rc;
+
+	static const uint64_t msk = RTE_IPSEC_SATP_DIR_MASK |
+			RTE_IPSEC_SATP_MODE_MASK;
+
+	rc = 0;
+	switch (sa->type & msk) {
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		pf->process = esp_inb_tun_sync_crypto_pkt_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TRANS):
+		pf->process = esp_inb_trs_sync_crypto_pkt_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		pf->process = (sa->sqh_len != 0) ?
+			esp_outb_tun_sync_crpyto_sqh_process :
+			esp_outb_tun_sync_crpyto_flag_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TRANS):
+		pf->process = (sa->sqh_len != 0) ?
+			esp_outb_trs_sync_crpyto_sqh_process :
+			esp_outb_trs_sync_crpyto_flag_process;
 		break;
 	default:
 		rc = -ENOTSUP;
@@ -672,13 +708,16 @@ ipsec_sa_pkt_func_select(const struct rte_ipsec_session *ss,
 	case RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL:
 		if ((sa->type & RTE_IPSEC_SATP_DIR_MASK) ==
 				RTE_IPSEC_SATP_DIR_IB)
-			pf->process = pkt_flag_process;
+			pf->process = esp_outb_pkt_flag_process;
 		else
 			pf->process = inline_proto_outb_pkt_process;
 		break;
 	case RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL:
 		pf->prepare = lksd_proto_prepare;
-		pf->process = pkt_flag_process;
+		pf->process = esp_outb_pkt_flag_process;
+		break;
+	case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
+		rc = lksd_sync_crypto_pkt_func_select(sa, pf);
 		break;
 	default:
 		rc = -ENOTSUP;
diff --git a/lib/librte_ipsec/sa.h b/lib/librte_ipsec/sa.h
index 51e69ad05..02c7abc60 100644
--- a/lib/librte_ipsec/sa.h
+++ b/lib/librte_ipsec/sa.h
@@ -156,6 +156,14 @@ uint16_t
 inline_inb_trs_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
 
+uint16_t
+esp_inb_tun_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_inb_trs_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
 /* outbound processing */
 
 uint16_t
@@ -170,6 +178,10 @@ uint16_t
 esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 	uint16_t num);
 
+uint16_t
+esp_outb_pkt_flag_process(const struct rte_ipsec_session *ss,
+	struct rte_mbuf *mb[], uint16_t num);
+
 uint16_t
 inline_outb_tun_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
@@ -182,4 +194,21 @@ uint16_t
 inline_proto_outb_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
 
+uint16_t
+esp_outb_tun_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_tun_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_trs_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_trs_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+
 #endif /* _SA_H_ */
diff --git a/lib/librte_ipsec/ses.c b/lib/librte_ipsec/ses.c
index 82c765a33..eaa8c17b7 100644
--- a/lib/librte_ipsec/ses.c
+++ b/lib/librte_ipsec/ses.c
@@ -19,7 +19,9 @@ session_check(struct rte_ipsec_session *ss)
 			return -EINVAL;
 		if ((ss->type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
 				ss->type ==
-				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL) &&
+				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
+				ss->type ==
+				RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) &&
 				ss->security.ctx == NULL)
 			return -EINVAL;
 	}
-- 
2.14.5



* [dpdk-dev] [PATCH 09/10] examples/ipsec-secgw: add security cpu_crypto action support
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (7 preceding siblings ...)
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 10/10] doc: update security cpu process description Fan Zhang
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

Since the ipsec library now supports the cpu_crypto security action type,
this patch updates the ipsec-secgw sample application with the new action
type "cpu-crypto". The patch also adds a number of test scripts to verify
the correctness of the implementation.
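The type-token handling that this patch adds to parse_sa_tokens() in sa.c
can be modelled in isolation. This is a minimal, self-contained sketch, not
the application's code: the toy_* enum and functions are hypothetical, and
the "inline-crypto-offload" and "inline-protocol-offload" strings are assumed
from the existing parser. The point it shows is that "cpu-crypto" is the only
offload type that is kept when no portid is given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the RTE_SECURITY_ACTION_TYPE_* values. */
enum toy_action {
	TOY_ACT_NONE,
	TOY_ACT_INLINE_CRYPTO,
	TOY_ACT_INLINE_PROTOCOL,
	TOY_ACT_LOOKASIDE_PROTOCOL,
	TOY_ACT_CPU_CRYPTO,
	TOY_ACT_INVALID
};

/* Map a "type" token to an action; TOY_ACT_INVALID if unknown. */
static enum toy_action
toy_parse_type(const char *tok)
{
	static const struct {
		const char *name;
		enum toy_action act;
	} map[] = {
		{ "inline-crypto-offload", TOY_ACT_INLINE_CRYPTO },
		{ "inline-protocol-offload", TOY_ACT_INLINE_PROTOCOL },
		{ "lookaside-protocol-offload", TOY_ACT_LOOKASIDE_PROTOCOL },
		{ "no-offload", TOY_ACT_NONE },
		{ "cpu-crypto", TOY_ACT_CPU_CRYPTO },
	};

	assert(tok != NULL);
	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (strcmp(tok, map[i].name) == 0)
			return map[i].act;
	return TOY_ACT_INVALID;
}

/* Fallback rule: offload types other than cpu-crypto need a port id;
 * without a type, or without a required port id, use no-offload. */
static enum toy_action
toy_resolve_type(const char *tok, int has_portid)
{
	int has_type = (tok != NULL);
	enum toy_action act = has_type ? toy_parse_type(tok) : TOY_ACT_NONE;

	if (act == TOY_ACT_INVALID)
		return act;	/* the real parser reports an error here */
	if (act != TOY_ACT_NONE && act != TOY_ACT_CPU_CRYPTO && !has_portid)
		printf("Missing portid option, falling back to non-offload\n");
	if (!has_type || (!has_portid && act != TOY_ACT_CPU_CRYPTO))
		act = TOY_ACT_NONE;
	return act;
}
```

For example, toy_resolve_type("cpu-crypto", 0) keeps the CPU crypto action,
while toy_resolve_type("inline-crypto-offload", 0) falls back to no-offload,
since no NIC port takes part in CPU crypto processing.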

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 examples/ipsec-secgw/ipsec.c                       | 22 ++++++++++++++++++++++
 examples/ipsec-secgw/ipsec_process.c               |  7 ++++---
 examples/ipsec-secgw/sa.c                          | 13 +++++++++++--
 examples/ipsec-secgw/test/run_test.sh              | 10 ++++++++++
 .../test/trs_3descbc_sha1_cpu_crypto_defs.sh       |  5 +++++
 .../test/trs_aescbc_sha1_cpu_crypto_defs.sh        |  5 +++++
 .../test/trs_aesctr_sha1_cpu_crypto_defs.sh        |  5 +++++
 .../ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh |  5 +++++
 .../test/trs_aesgcm_mb_cpu_crypto_defs.sh          |  7 +++++++
 .../test/tun_3descbc_sha1_cpu_crypto_defs.sh       |  5 +++++
 .../test/tun_aescbc_sha1_cpu_crypto_defs.sh        |  5 +++++
 .../test/tun_aesctr_sha1_cpu_crypto_defs.sh        |  5 +++++
 .../ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh |  5 +++++
 .../test/tun_aesgcm_mb_cpu_crypto_defs.sh          |  7 +++++++
 14 files changed, 101 insertions(+), 5 deletions(-)
 create mode 100644 examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh

diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index dc85adfe5..4c39a7de6 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -10,6 +10,7 @@
 #include <rte_crypto.h>
 #include <rte_security.h>
 #include <rte_cryptodev.h>
+#include <rte_ipsec.h>
 #include <rte_ethdev.h>
 #include <rte_mbuf.h>
 #include <rte_hash.h>
@@ -105,6 +106,26 @@ create_lookaside_session(struct ipsec_ctx *ipsec_ctx, struct ipsec_sa *sa)
 				"SEC Session init failed: err: %d\n", ret);
 				return -1;
 			}
+		} else if (sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
+			struct rte_security_ctx *ctx =
+				(struct rte_security_ctx *)
+				rte_cryptodev_get_sec_ctx(
+					ipsec_ctx->tbl[cdev_id_qp].id);
+			int32_t offset = sizeof(struct rte_esp_hdr) +
+					sa->iv_len;
+
+			/* Set IPsec parameters in conf */
+			sess_conf.cpucrypto.cipher_offset = offset;
+
+			set_ipsec_conf(sa, &(sess_conf.ipsec));
+			sa->security_ctx = ctx;
+			sa->sec_session = rte_security_session_create(ctx,
+				&sess_conf, ipsec_ctx->session_priv_pool);
+			if (sa->sec_session == NULL) {
+				RTE_LOG(ERR, IPSEC,
+				"SEC Session init failed: err: %d\n", ret);
+				return -1;
+			}
 		} else {
 			RTE_LOG(ERR, IPSEC, "Inline not supported\n");
 			return -1;
@@ -473,6 +494,7 @@ ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx *ipsec_ctx,
 						sa->sec_session, pkts[i], NULL);
 			continue;
 		case RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO:
+		case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
 			RTE_ASSERT(sa->sec_session != NULL);
 			priv->cop.type = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
 			priv->cop.status = RTE_CRYPTO_OP_STATUS_NOT_PROCESSED;
diff --git a/examples/ipsec-secgw/ipsec_process.c b/examples/ipsec-secgw/ipsec_process.c
index 868f1a28d..1932b631f 100644
--- a/examples/ipsec-secgw/ipsec_process.c
+++ b/examples/ipsec-secgw/ipsec_process.c
@@ -101,7 +101,8 @@ fill_ipsec_session(struct rte_ipsec_session *ss, struct ipsec_ctx *ctx,
 		}
 		ss->crypto.ses = sa->crypto_session;
 	/* setup session action type */
-	} else if (sa->type == RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL) {
+	} else if (sa->type == RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 		if (sa->sec_session == NULL) {
 			rc = create_lookaside_session(ctx, sa);
 			if (rc != 0)
@@ -227,8 +228,8 @@ ipsec_process(struct ipsec_ctx *ctx, struct ipsec_traffic *trf)
 
 		/* process packets inline */
 		else if (sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
-				sa->type ==
-				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL) {
+			sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 
 			satp = rte_ipsec_sa_type(ips->sa);
 
diff --git a/examples/ipsec-secgw/sa.c b/examples/ipsec-secgw/sa.c
index c3cf3bd1f..ba773346f 100644
--- a/examples/ipsec-secgw/sa.c
+++ b/examples/ipsec-secgw/sa.c
@@ -570,6 +570,9 @@ parse_sa_tokens(char **tokens, uint32_t n_tokens,
 				RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL;
 			else if (strcmp(tokens[ti], "no-offload") == 0)
 				rule->type = RTE_SECURITY_ACTION_TYPE_NONE;
+			else if (strcmp(tokens[ti], "cpu-crypto") == 0)
+				rule->type =
+					RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
 			else {
 				APP_CHECK(0, status, "Invalid input \"%s\"",
 						tokens[ti]);
@@ -624,10 +627,13 @@ parse_sa_tokens(char **tokens, uint32_t n_tokens,
 	if (status->status < 0)
 		return;
 
-	if ((rule->type != RTE_SECURITY_ACTION_TYPE_NONE) && (portid_p == 0))
+	if ((rule->type != RTE_SECURITY_ACTION_TYPE_NONE && rule->type !=
+			RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) &&
+			(portid_p == 0))
 		printf("Missing portid option, falling back to non-offload\n");
 
-	if (!type_p || !portid_p) {
+	if (!type_p || (!portid_p && rule->type !=
+			RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO)) {
 		rule->type = RTE_SECURITY_ACTION_TYPE_NONE;
 		rule->portid = -1;
 	}
@@ -709,6 +715,9 @@ print_one_sa_rule(const struct ipsec_sa *sa, int inbound)
 	case RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL:
 		printf("lookaside-protocol-offload ");
 		break;
+	case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
+		printf("cpu-crypto-accelerated");
+		break;
 	}
 	printf("\n");
 }
diff --git a/examples/ipsec-secgw/test/run_test.sh b/examples/ipsec-secgw/test/run_test.sh
index 8055a4c04..f322aa785 100755
--- a/examples/ipsec-secgw/test/run_test.sh
+++ b/examples/ipsec-secgw/test/run_test.sh
@@ -32,15 +32,21 @@ usage()
 }
 
 LINUX_TEST="tun_aescbc_sha1 \
+tun_aescbc_sha1_cpu_crypto \
 tun_aescbc_sha1_esn \
 tun_aescbc_sha1_esn_atom \
 tun_aesgcm \
+tun_aesgcm_cpu_crypto \
+tun_aesgcm_mb_cpu_crypto \
 tun_aesgcm_esn \
 tun_aesgcm_esn_atom \
 trs_aescbc_sha1 \
+trs_aescbc_sha1_cpu_crypto \
 trs_aescbc_sha1_esn \
 trs_aescbc_sha1_esn_atom \
 trs_aesgcm \
+trs_aesgcm_cpu_crypto \
+trs_aesgcm_mb_cpu_crypto \
 trs_aesgcm_esn \
 trs_aesgcm_esn_atom \
 tun_aescbc_sha1_old \
@@ -49,17 +55,21 @@ trs_aescbc_sha1_old \
 trs_aesgcm_old \
 tun_aesctr_sha1 \
 tun_aesctr_sha1_old \
+tun_aesctr_sha1_cpu_crypto \
 tun_aesctr_sha1_esn \
 tun_aesctr_sha1_esn_atom \
 trs_aesctr_sha1 \
+trs_aesctr_sha1_cpu_crypto \
 trs_aesctr_sha1_old \
 trs_aesctr_sha1_esn \
 trs_aesctr_sha1_esn_atom \
 tun_3descbc_sha1 \
+tun_3descbc_sha1_cpu_crypto \
 tun_3descbc_sha1_old \
 tun_3descbc_sha1_esn \
 tun_3descbc_sha1_esn_atom \
 trs_3descbc_sha1 \
+trs_3descbc_sha1_cpu_crypto \
 trs_3descbc_sha1_old \
 trs_3descbc_sha1_esn \
 trs_3descbc_sha1_esn_atom"
diff --git a/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..a864a8886
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_3descbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..b515cd9f8
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aescbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..745a2a02b
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesctr_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
new file mode 100644
index 000000000..8917122da
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesgcm_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
new file mode 100644
index 000000000..26943321f
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
@@ -0,0 +1,7 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesgcm_defs.sh
+
+CRYPTO_DEV=${CRYPTO_DEV:-'--vdev="crypto_aesni_mb0"'}
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..747141f62
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_3descbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..56076fa50
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aescbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..3af680533
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesctr_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
new file mode 100644
index 000000000..5bf1c0ae5
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesgcm_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh
new file mode 100644
index 000000000..039b8095e
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh
@@ -0,0 +1,7 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesgcm_defs.sh
+
+CRYPTO_DEV=${CRYPTO_DEV:-'--vdev="crypto_aesni_mb0"'}
+
+SGW_CFG_XPRM='type cpu-crypto'
-- 
2.14.5



* [dpdk-dev] [PATCH 10/10] doc: update security cpu process description
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (8 preceding siblings ...)
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 09/10] examples/ipsec-secgw: add security " Fan Zhang
@ 2019-09-06 13:13   ` Fan Zhang
  2019-09-09 12:43   ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Aaron Conole
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
  11 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-09-06 13:13 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch updates the programmer's guide and the release notes to
describe the newly added security CPU crypto process.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 doc/guides/cryptodevs/aesni_gcm.rst    |   6 ++
 doc/guides/cryptodevs/aesni_mb.rst     |   7 +++
 doc/guides/prog_guide/rte_security.rst | 112 ++++++++++++++++++++++++++++++++-
 doc/guides/rel_notes/release_19_11.rst |   7 +++
 4 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/doc/guides/cryptodevs/aesni_gcm.rst b/doc/guides/cryptodevs/aesni_gcm.rst
index 9a8bc9323..31297fabd 100644
--- a/doc/guides/cryptodevs/aesni_gcm.rst
+++ b/doc/guides/cryptodevs/aesni_gcm.rst
@@ -9,6 +9,12 @@ The AES-NI GCM PMD (**librte_pmd_aesni_gcm**) provides poll mode crypto driver
 support for utilizing Intel multi buffer library (see AES-NI Multi-buffer PMD documentation
 to learn more about it, including installation).
 
+The AES-NI GCM PMD also supports rte_security: the user can create a security
+session and call ``rte_security_process_cpu_crypto_bulk`` to process
+symmetric crypto synchronously with all algorithms specified below. In this
+mode scatter-gather buffers are supported (``num`` in ``rte_security_vec``
+can be greater than ``1``). See the ``rte_security`` programmer's guide for details.
+
 Features
 --------
 
diff --git a/doc/guides/cryptodevs/aesni_mb.rst b/doc/guides/cryptodevs/aesni_mb.rst
index 1eff2b073..1a3ddd850 100644
--- a/doc/guides/cryptodevs/aesni_mb.rst
+++ b/doc/guides/cryptodevs/aesni_mb.rst
@@ -12,6 +12,13 @@ support for utilizing Intel multi buffer library, see the white paper
 
 The AES-NI MB PMD has current only been tested on Fedora 21 64-bit with gcc.
 
+The AES-NI MB PMD also supports rte_security: the user can create a security
+session and call ``rte_security_process_cpu_crypto_bulk`` to process
+symmetric crypto synchronously with all algorithms specified below. However,
+it does not support scatter-gather buffers, so the ``num`` value in
+``rte_security_vec`` must be ``1``. Please refer to the ``rte_security``
+programmer's guide for more detail.
+
 Features
 --------
 
diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
index 7d0734a37..861619202 100644
--- a/doc/guides/prog_guide/rte_security.rst
+++ b/doc/guides/prog_guide/rte_security.rst
@@ -296,6 +296,56 @@ Just like IPsec, in case of PDCP also header addition/deletion, cipher/
 de-cipher, integrity protection/verification is done based on the action
 type chosen.
 
+
+Synchronous CPU Crypto
+~~~~~~~~~~~~~~~~~~~~~~
+
+RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
+This action type allows a burst of symmetric crypto workload that shares the
+same algorithm, key, and direction to be processed synchronously by the CPU.
+
+The buffers are passed to the crypto device for symmetric crypto
+processing. The device encrypts or decrypts each buffer based on the key(s)
+and algorithm(s) specified and preprocessed in the security session. Unlike
+the inline or lookaside modes, when the function returns, each buffer has
+either been processed successfully or has an error number assigned to the
+corresponding index of the status array.
+
+For example, in the case of IPsec, the application uses CPU cycles to process
+both the stack and the crypto workload synchronously.
+
+.. code-block:: console
+
+         Egress Data Path
+                 |
+        +--------|--------+
+        |  egress IPsec   |
+        |        |        |
+        | +------V------+ |
+        | | SADB lookup | |
+        | +------|------+ |
+        | +------V------+ |
+        | |   Desc      | |
+        | +------|------+ |
+        +--------V--------+
+                 |
+        +--------V--------+
+        |    L2 Stack     |
+        +-----------------+
+        |                 |
+        |   Synchronous   |   <------ Using CPU instructions
+        |  Crypto Process |
+        |                 |
+        +--------V--------+
+        |  L2 Stack Post  |   <------ Add tunnel, ESP header, etc.
+        +--------|--------+
+                 |
+        +--------|--------+
+        |       NIC       |
+        +--------|--------+
+                 V
+
+
 Device Features and Capabilities
 ---------------------------------
 
@@ -491,6 +541,7 @@ Security Session configuration structure is defined as ``rte_security_session_co
                 struct rte_security_ipsec_xform ipsec;
                 struct rte_security_macsec_xform macsec;
                 struct rte_security_pdcp_xform pdcp;
+                struct rte_security_cpu_crypto_xform cpucrypto;
         };
         /**< Configuration parameters for security session */
         struct rte_crypto_sym_xform *crypto_xform;
@@ -515,9 +566,12 @@ Offload.
         RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL,
         /**< All security protocol processing is performed inline during
          * transmission */
-        RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
+        RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
         /**< All security protocol processing including crypto is performed
          * on a lookaside accelerator */
+        RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
+        /**< Crypto processing for security protocol is performed by CPU
+         * synchronously */
     };
 
 The ``rte_security_session_protocol`` is defined as
@@ -587,6 +641,10 @@ PDCP related configuration parameters are defined in ``rte_security_pdcp_xform``
         uint32_t hfn_threshold;
     };
 
+For the CPU crypto action type, the application should attach an initialized
+``crypto_xform`` to the security session configuration to specify the algorithm,
+key, direction, and other fields required to perform the crypto operation.
+
 
 Security API
 ~~~~~~~~~~~~
@@ -650,3 +708,55 @@ it is only valid to have a single flow to map to that security session.
         +-------+            +--------+    +-----+
         |  Eth  | ->  ... -> |   ESP  | -> | END |
         +-------+            +--------+    +-----+
+
+
+Process bulk crypto workload using CPU instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The inline and lookaside modes depend on external HW to complete the
+workload. As an alternative, the user can use rte_security to process
+symmetric crypto synchronously with CPU instructions.
+
+When creating the security session, the user needs to fill the
+``rte_security_session_conf`` parameter, setting the ``action_type`` field to
+``RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO`` and pointing ``crypto_xform`` to a
+properly initialized cryptodev xform. The user then passes the
+``rte_security_session_conf`` instance to ``rte_security_session_create()``
+along with the security context pointer belonging to a SW crypto device.
+The crypto device may not support this action type, or the algorithm or
+key sizes specified in ``crypto_xform``; if everything is supported, the
+function returns the created security session.
+
+The user can then use this session to process the crypto workload synchronously.
+Instead of using mbuf ``next`` pointers, synchronous CPU crypto processing uses
+a special structure ``rte_security_vec`` to describe scatter-gather buffers.
+
+.. code-block:: c
+
+    struct rte_security_vec {
+        struct iovec *vec;
+        uint32_t num;
+    };
+
+The structure ``rte_security_vec`` describes one scatter-gather buffer:
+``vec`` points to an array of ``struct iovec`` segments and ``num`` is the
+number of segments in that array.
+
+Please note that not all crypto devices support scatter-gather buffer
+processing; please check the ``cryptodev`` guide for more details.
+
+The synchronous CPU crypto processing API is:
+
+.. code-block:: c
+
+    void
+    rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+            struct rte_security_session *sess,
+            struct rte_security_vec buf[], void *iv[], void *aad[],
+            void *digest[], int status[], uint32_t num);
+
+This function processes ``num`` ``rte_security_vec`` buffers using the
+content stored in the ``iv`` and ``aad`` arrays. The API only supports
+in-place operation, so ``buf`` is overwritten with the encrypted or
+decrypted data when processing succeeds. Otherwise, an error number is
+written to the corresponding index of the ``status`` array.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 8490d897c..6cd21704f 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added a new synchronous CPU crypto burst API to rte_security.**
+
+  A new API ``rte_security_process_cpu_crypto_bulk`` is introduced in the
+  security library to process crypto workload in bulk using CPU instructions.
+  The AESNI_MB and AESNI_GCM PMDs, as well as the unit tests and the
+  ipsec-secgw sample application, are updated to support this feature.
+
 
 Removed Items
 -------------
-- 
2.14.5



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-06  9:01       ` Akhil Goyal
  2019-09-06 13:12         ` Zhang, Roy Fan
@ 2019-09-06 13:27         ` Ananyev, Konstantin
  2019-09-10 10:44           ` Akhil Goyal
  1 sibling, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-06 13:27 UTC (permalink / raw)
  To: Akhil Goyal, dev; +Cc: Zhang, Roy Fan, Doherty, Declan, De Lara Guarch, Pablo

Hi Akhil,

> > This action type allows a burst of symmetric crypto workload that uses the same
> > algorithm, key, and direction to be processed synchronously by CPU cycles.
> > This flexible action type does not require external hardware involvement,
> > having the crypto workload processed synchronously, and is more performant
> > than Cryptodev SW PMD due to the cycles saved by removing the "async mode
> > simulation" as well as the 3 cacheline accesses of the crypto ops.
> 
> Does that mean application will not call the cryptodev_enqueue_burst and corresponding dequeue burst.

Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)

> It would be a new API something like process_packets and it will have the crypto processed packets while returning from the API?

Yes, though the plan is that the API will operate on raw data buffers, not mbufs.
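To make the raw-buffer, synchronous contract concrete, here is a toy,
self-contained model. struct toy_sec_vec, toy_process_bulk() and
toy_move_bad() are hypothetical names, not rte_security API; XOR with a
one-byte key stands in for real crypto, and the compaction helper only
mirrors the "death-row" idea used in librte_ipsec rather than copying it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define TOY_MAX_BUFS 64

/* Hypothetical stand-in for one scatter-gather buffer: an array of
 * iovec segments and the number of segments in that array. */
struct toy_sec_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Toy synchronous bulk call: every buffer is transformed in place
 * (XOR with a one-byte key stands in for real crypto). On return,
 * status[i] is 0 if buf[i] was processed, or a negative errno if not. */
static void
toy_process_bulk(struct toy_sec_vec buf[], uint8_t key, int status[],
		 uint32_t num)
{
	for (uint32_t i = 0; i != num; i++) {
		if (buf[i].vec == NULL || buf[i].num == 0) {
			status[i] = -EINVAL;
			continue;
		}
		for (uint32_t s = 0; s != buf[i].num; s++) {
			uint8_t *p = buf[i].vec[s].iov_base;

			for (size_t b = 0; b != buf[i].vec[s].iov_len; b++)
				p[b] ^= key;
		}
		status[i] = 0;
	}
}

/* Move the items whose indices are listed (ascending) in bad_idx[]
 * behind the good ones, keeping relative order. Returns the number
 * of good items, which then occupy item[0 .. ret-1]. */
static uint32_t
toy_move_bad(void *item[], const uint32_t bad_idx[], uint32_t n,
	     uint32_t nb_bad)
{
	void *bad[TOY_MAX_BUFS];
	uint32_t i, j = 0, k = 0;

	assert(n <= TOY_MAX_BUFS && nb_bad <= n);
	for (i = 0; i != n; i++) {
		if (j != nb_bad && i == bad_idx[j])
			bad[j++] = item[i];
		else
			item[k++] = item[i];
	}
	for (i = 0; i != nb_bad; i++)
		item[k + i] = bad[i];
	return k;
}
```

A caller scans status[0 .. num-1] for non-zero entries, collects those
indices in ascending order and passes them to toy_move_bad(); the scan has
to run over all num entries, not over the running failure count.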

> 
> I still do not understand why we cannot do with the conventional crypto lib only.
> As far as I can understand, you are not doing any protocol processing or any value add
> To the crypto processing. IMO, you just need a synchronous crypto processing API which
> Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> Security session in the driver just to do a synchronous processing.

I suppose your question is why not have rte_crypto_process_cpu_crypto_bulk(...) instead?
The main reason is that it would require disruptive changes to the existing cryptodev API
(it would cause ABI/API breakage).
A session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra information
that the normal crypto_sym_xform doesn't contain
(the cipher offset from the start of the buffer, and possibly something extra in the future).
Also, right now there is no way to add a new type of crypto_sym_session without
either breaking the existing cryptodev ABI/API or introducing a new structure
(rte_crypto_sym_cpu_session or so) for it.
rte_security, on the other hand, is designed so that we can add new session types and
related parameters without causing API/ABI breakage.

BTW, what is your concern with the proposed approach (via rte_security)?
From my perspective it is a lightweight change, and it is entirely optional
for crypto PMDs to support it or not.
Konstantin

> >
> > AESNI-GCM and AESNI-MB PMDs are updated with this support. There is a small
> > performance test app under app/test/security_aesni_gcm(mb)_perftest to
> > prove.
> >
> > For the new API
> > The packet is sent to the crypto device for symmetric crypto
> > processing. The device will encrypt or decrypt the buffer based on the session
> > data specified and preprocessed in the security session. Different
> > than the inline or lookaside modes, when the function exits, the user will
> > expect the buffers are either processed successfully, or having the error number
> > assigned to the appropriate index of the status array.
> >
> > Will update the program's guide in the v1 patch.
> >
> > Regards,
> > Fan
> >
> > > -----Original Message-----
> > > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > > Sent: Wednesday, September 4, 2019 11:33 AM
> > > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty, Declan
> > > <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch@intel.com>
> > > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and
> > > API
> > >
> > > Hi Fan,
> > >
> > > >
> > > > This patch introduce new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > > action
> > > > type to security library. The type represents performing crypto
> > > > operation with CPU cycles. The patch also includes a new API to
> > > > process crypto operations in bulk and the function pointers for PMDs.
> > > >
> > > I am not able to get the flow of execution for this action type. Could you
> > > please elaborate the flow in the documentation. If not in documentation
> > > right now, then please elaborate the flow in cover letter.
> > > Also I see that there are new APIs for processing crypto operations in bulk.
> > > What does that mean. How are they different from the existing APIs which
> > > are also handling bulk crypto ops depending on the budget.
> > >
> > >
> > > -Akhil



* Re: [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (9 preceding siblings ...)
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 10/10] doc: update security cpu process description Fan Zhang
@ 2019-09-09 12:43   ` Aaron Conole
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
  11 siblings, 0 replies; 87+ messages in thread
From: Aaron Conole @ 2019-09-09 12:43 UTC (permalink / raw)
  To: Fan Zhang; +Cc: dev, konstantin.ananyev, declan.doherty, akhil.goyal

Fan Zhang <roy.fan.zhang@intel.com> writes:

> This RFC patch adds a way to rte_security to process symmetric crypto
> workload in bulk synchronously for SW crypto devices.
>
> Originally both SW and HW crypto PMDs works under rte_cryptodev to
> process the crypto workload asynchronously. This way provides uniformity
> to both PMD types but also introduce unnecessary performance penalty to
> SW PMDs such as extra SW ring enqueue/dequeue steps to "simulate"
> asynchronous working manner and unnecessary HW addresses computation.
>
> We introduce a new way for SW crypto devices that perform crypto operation
> synchronously with only fields required for the computation as input.
>
> In rte_security, a new action type "RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO"
> is introduced. This action type allows the burst of symmetric crypto
> workload using the same algorithm, key, and direction being processed by
> CPU cycles synchronously. This flexible action type does not require
> external hardware involvement.
>
> This patch also includes the announcement of a new API
> "rte_security_process_cpu_crypto_bulk". With this API the packet is sent to
> the crypto device for symmetric crypto processing. The device will encrypt
> or decrypt the buffer based on the session data specified and preprocessed
> in the security session. Different than the inline or lookaside modes, when
> the function exits, the user will expect the buffers are either processed
> successfully, or having the error number assigned to the appropriate index
> of the status array.
>
> The proof-of-concept AESNI-GCM and AESNI-MB SW PMDs are updated with the
> support of this new method. To demonstrate the performance gain with
> this method 2 simple performance evaluation apps under unit-test are added
> "app/test: security_aesni_gcm_perftest/security_aesni_mb_perftest". The
> users can freely compare their results against crypto perf application
> results.
>
> In the end, the ipsec library and ipsec-secgw sample application are also
> updated to support this feature. Several test scripts are added to the
> ipsec-secgw test-suite to prove the correctness of the implementation.
>
> Fan Zhang (10):
>   security: introduce CPU Crypto action type and API
>   crypto/aesni_gcm: add rte_security handler
>   app/test: add security cpu crypto autotest
>   app/test: add security cpu crypto perftest
>   crypto/aesni_mb: add rte_security handler
>   app/test: add aesni_mb security cpu crypto autotest
>   app/test: add aesni_mb security cpu crypto perftest
>   ipsec: add rte_security cpu_crypto action support
>   examples/ipsec-secgw: add security cpu_crypto action support
>   doc: update security cpu process description
>

Hi Fan,

This series has a problem on aarch64:

   ../app/test/test_security_cpu_crypto.c:626:16: error: implicit declaration of function ‘rte_get_tsc_hz’ [-Werror=implicit-function-declaration]
     uint64_t hz = rte_get_tsc_hz(), time_start, time_now;
                   ^
   ../app/test/test_security_cpu_crypto.c:679:16: error: implicit declaration of function ‘rte_rdtsc’ [-Werror=implicit-function-declaration]
      time_start = rte_rdtsc();
                   ^
   ../app/test/test_security_cpu_crypto.c:711:16: error: implicit declaration of function ‘rte_get_timer_cycles’ [-Werror=implicit-function-declaration]
      time_start = rte_get_timer_cycles();
                   ^

I'm not sure of the best way to address this in the test; maybe there's a
better API to use for getting the cycles?
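For reference, rte_rdtsc(), rte_get_tsc_hz() and rte_get_timer_cycles() are declared via DPDK's rte_cycles.h, so the implicit-declaration errors suggest that header is not being pulled in on this build; an explicit `#include <rte_cycles.h>` in the test may be all that is needed. If a DPDK-independent timer were preferred in the perftest, a minimal sketch using POSIX `clock_gettime(CLOCK_MONOTONIC)` could look like this (the toy_* helper names are hypothetical):

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds; works the same on x86 and aarch64. */
static uint64_t
toy_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Example workload used for timing: increments a 64-bit counter. */
static void
toy_count(void *arg)
{
	(*(uint64_t *)arg)++;
}

/* Run fn(arg) iters times and return the elapsed time in nanoseconds. */
static uint64_t
toy_time_loop(void (*fn)(void *), void *arg, uint32_t iters)
{
	uint64_t start;
	uint32_t i;

	assert(fn != NULL);
	start = toy_now_ns();
	for (i = 0; i < iters; i++)
		fn(arg);
	return toy_now_ns() - start;
}
```

The trade-off is resolution and overhead: a clock_gettime() call costs more than reading a cycle counter, so it is better suited to timing whole bursts than individual operations.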


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-06 13:27         ` Ananyev, Konstantin
@ 2019-09-10 10:44           ` Akhil Goyal
  2019-09-11 12:29             ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-09-10 10:44 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: Zhang, Roy Fan, Doherty, Declan, De Lara Guarch, Pablo


Hi Konstantin,
> 
> Hi Akhil,
> 
> > > This action type allows the burst of symmetric crypto workload using the same
> > > algorithm, key, and direction being processed by CPU cycles synchronously.
> > > This flexible action type does not require external hardware involvement,
> > > having the crypto workload processed synchronously, and is more performant
> > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > simulation" as well as 3 cacheline access of the crypto ops.
> >
> > Does that mean application will not call the cryptodev_enqueue_burst and corresponding dequeue burst.
> 
> Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> 
> > It would be a new API something like process_packets and it will have the crypto processed packets while returning from the API?
> 
> Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> 
> >
> > I still do not understand why we cannot do with the conventional crypto lib only.
> > As far as I can understand, you are not doing any protocol processing or any value add
> > To the crypto processing. IMO, you just need a synchronous crypto processing API which
> > Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > Security session in the driver just to do a synchronous processing.
> 
> I suppose your question is why not to have rte_crypot_process_cpu_crypto_bulk(...) instead?
> The main reason is that would require disruptive changes in existing cryptodev API
> (would cause ABI/API breakage).
> Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra information
> that normal crypto_sym_xform doesn't contain
> (cipher offset from the start of the buffer, might be something extra in future).

The cipher offset is part of rte_crypto_op. If you intend not to use rte_crypto_op,
you can pass it as an argument to the new cryptodev API.
Adding something extra would cause ABI breakage in security as well,
so it would be the same.

> Also right now there is no way to add new type of crypto_sym_session without
> either breaking existing crypto-dev ABI/API or introducing new structure
> (rte_crypto_sym_cpu_session or so) for that.

What extra info is required in rte_cryptodev_sym_session to get the rte_crypto_sym_cpu_session?
I don't think there is any.
I believe the same crypto session will be able to work synchronously as well. We would only need
a new API to perform synchronous actions. That would significantly reduce the code duplication
in the driver needed to support two different kinds of APIs with similar code inside.
Please correct me in case I am missing something.


> While rte_security is designed in a way that we can add new session types and
> related parameters without causing API/ABI breakage.

Yes, the intent is to add new sessions based on the various protocols that can be supported by the driver.
We should not treat it as an alternative to cryptodev and use it just because it will not cause
ABI/API breakage. IMO the code should be placed where its intent is.

> 
> BTW, what is your concern with proposed approach (via rte_security)?
> From my perspective it is a lightweight change and it is totally optional
> for the crypto PMDs to support it or not.
> Konstantin



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-06 13:12         ` Zhang, Roy Fan
@ 2019-09-10 11:25           ` Akhil Goyal
  2019-09-11 13:01             ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-09-10 11:25 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev
  Cc: Ananyev, Konstantin, Doherty, Declan, De Lara Guarch, Pablo

Hi Fan,
> 
> Hi Akhil,
> 
> You are right, the new API will process the crypto workload, no heavy enqueue
> Dequeue operations required.
> 
> Cryptodev tends to support multiple crypto devices, including HW and SW.
> The 3-cache line access, iova address computation and assignment, simulation
> of async enqueue/dequeue operations, allocate and free crypto ops, even the
> mbuf linked-list for scatter-gather buffers are too heavy for SW crypto PMDs.

Why can't we have a synchronous cryptodev API which works on plain buffers like your suggested
API, and uses the same crypto sym_session creation logic as before? It would perform
the same as it does in this series.

> 
> To create this new synchronous API in cryptodev cannot avoid the problem
> listed above:  first the API shall not serve only to part of the crypto (SW) PMDs -
> as you know, it is Cryptodev. The users can expect some PMD only support part
> of the overall algorithms, but not the workload processing API.

Why can't we have an optional data path in cryptodev for synchronous behavior if the
underlying PMD supports it? It is up to the PMD to decide whether it supports it or not.
Only a feature flag would be needed to indicate that.
One more option could be a PMD-specific API which the application can call directly if the
mode is supported by only very few PMDs. This could be a fallback if a deprecation
notice etc. is required.

> 
> Another reason is, there is assumption made, first when creating a crypto op
> we have to allocate the memory to hold crypto op + sym op + iv, - we cannot
> simply declare an array of crypto ops in the run-time and discard it when
> processing is done. Also we need to fill aad and digest HW address, which is
> not required for SW at all.

We are defining a new API which may have its own parameters and requirements which
need to be fulfilled. If it were an rte_security API, you would also be defining a new way
of packet execution and new API params. So it would be the same.
You can reduce the cache line accesses as needed in the new API.
The session logic need not be changed from crypto session to security session.
Only the data path needs to be altered as per the new API.

> 
> Bottom line: using crypto op will still have 3 cache-line access performance
> problem.
> 
> So if we to create the new API in Cryptodev instead of rte_security, we need to
> create new crypto op structure only for the SW PMDs, carefully document them
> to not confuse with existing cryptodev APIs, make new device feature flags to
> indicate the API is not supported by some PMDs, and again carefully document
> them of these device feature flags.

The new API will need to be explained if it is a security API too. On top of that, you would need
to add more explanation for the session, which is already there in cryptodev.

> 
> So, to push these changes to rte_security instead the above problem can be
> resolved, and the performance improvement because of this change is big for
> smaller packets - I attached a performance test app in the patchset.

I believe there won't be any perf gap if the optimized new cryptodev API is used.

> 
> For rte_security, we already have inline-crypto type that works quite close to
> the this new API, the only difference is that it is processed by the CPU cycles.
> As you may have already seen the ipsec-library has wrapped these changes, and
> ipsec-secgw has only minimum updates to adopt this change too. So to the end
> user, if they use IPSec this patchset can seamlessly enabled with just
> commandline update when creating an SA.

In the IPSec application I do not see the changes wrt the new execution API,
so the data path is not being handled there. It looks incomplete. The user experience
of using the new API will definitely change.

So I believe this patchset is not required in rte_security; we can have it in cryptodev, unless
I have missed something.

> 
> Regards,
> Fan



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-10 10:44           ` Akhil Goyal
@ 2019-09-11 12:29             ` Ananyev, Konstantin
  2019-09-12 14:12               ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-11 12:29 UTC (permalink / raw)
  To: Akhil Goyal, dev; +Cc: Zhang, Roy Fan, Doherty, Declan, De Lara Guarch, Pablo



Hi Akhil,
> >
> > > > This action type allows the burst of symmetric crypto workload using the same
> > > > algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > This flexible action type does not require external hardware involvement,
> > > > having the crypto workload processed synchronously, and is more performant
> > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > simulation" as well as 3 cacheline access of the crypto ops.
> > >
> > > Does that mean application will not call the cryptodev_enqueue_burst and corresponding dequeue burst.
> >
> > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> >
> > > It would be a new API something like process_packets and it will have the crypto processed packets while returning from the API?
> >
> > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> >
> > >
> > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > As far as I can understand, you are not doing any protocol processing or any value add
> > > To the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > Security session in the driver just to do a synchronous processing.
> >
> > I suppose your question is why not to have rte_crypot_process_cpu_crypto_bulk(...) instead?
> > The main reason is that would require disruptive changes in existing cryptodev API
> > (would cause ABI/API breakage).
> > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra information
> > that normal crypto_sym_xform doesn't contain
> > (cipher offset from the start of the buffer, might be something extra in future).
> 
> Cipher offset will be part of rte_crypto_op.

Filling/reading the crypto op (plus alloc/free) is one of the main things that slows down the current crypto-op approach.
That's why the general idea is to put all data that doesn't change from packet to packet
into the session and set it up once at session_init().
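That idea can be sketched with a toy model (all names hypothetical, and XOR used as a placeholder instead of a real cipher): the cipher offset is captured once when the session is initialised, so the per-packet call carries only the buffer itself and never re-reads or re-fills per-op metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-SA parameters known at session creation time. */
struct toy_session_conf {
	uint8_t key;
	uint32_t cipher_offset; /* e.g. bytes of header left in clear */
};

struct toy_session {
	uint8_t key;
	uint32_t cipher_offset;
};

/* Done once per SA, not once per packet. */
static void
toy_session_init(struct toy_session *s, const struct toy_session_conf *c)
{
	s->key = c->key;
	s->cipher_offset = c->cipher_offset;
}

/*
 * Per-packet path: only the buffer is passed in; the offset comes from
 * the session. Returns 0 on success, -1 if the buffer is too short.
 */
static int
toy_cipher_buf(const struct toy_session *s, uint8_t *data, uint32_t len)
{
	uint32_t i;

	assert(s != NULL && data != NULL);
	if (len < s->cipher_offset)
		return -1;
	for (i = s->cipher_offset; i < len; i++)
		data[i] ^= s->key; /* placeholder, not real crypto */
	return 0;
}
```

Because the offset is fixed for the session's lifetime, an implementation is also free to specialise on it at init time, which is harder when the value can change on every call.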

> If you intend not to use rte_crypto_op
> You can pass this as an argument in the new cryptodev API.

You mean an extra parameter in rte_security_process_cpu_crypto_bulk()?
In theory it could be, but that solution looks a bit ugly:
	why pass on each call something that is constant per session?
	Again, having that value constant per session might allow some extra optimisations
	that would be hard to achieve in the dynamic case.
It is also not extendable:
suppose tomorrow we need to add something extra (support for some new algorithm or so).
With what you are proposing, we would need to add a new parameter to the function,
which means API breakage.

> Something extra will also cause ABI breakage in security as well.
> So it will be same.

I don't think it would.
AFAIK, right now this patch doesn't introduce any API/ABI breakage.
Inside struct rte_security_session_conf we have a union of xforms
depending on the session type.
So as long as cpu_crypto_xform doesn't exceed the size of the other xforms,
I believe no ABI breakage will appear.
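The size argument can be checked at compile time. The sketch below uses hypothetical stand-in structs (not the real rte_security_session_conf members) to show that adding a union member no larger than the existing ones leaves the union's size and alignment, and therefore the layout seen by already-compiled binaries, unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for existing per-action xforms. */
struct toy_ipsec_xform {
	uint32_t spi;
	uint32_t flags;
	uint64_t reserved[7]; /* 64 bytes in total */
};

struct toy_other_xform {
	uint32_t dummy;
	void *ptr;
};

/* Hypothetical new xform for the CPU crypto action type. */
struct toy_cpu_crypto_xform {
	uint32_t cipher_offset;
	uint32_t reserved;
};

/* Layout before the change. */
union toy_conf_old {
	struct toy_ipsec_xform ipsec;
	struct toy_other_xform other;
};

/* Layout after adding the new member. */
union toy_conf_new {
	struct toy_ipsec_xform ipsec;
	struct toy_other_xform other;
	struct toy_cpu_crypto_xform cpu_crypto;
};

/* Compile-time guards: the new member must not grow or re-align the union. */
_Static_assert(sizeof(union toy_conf_new) == sizeof(union toy_conf_old),
	       "new xform grew the union: ABI break");
_Static_assert(_Alignof(union toy_conf_new) == _Alignof(union toy_conf_old),
	       "new xform changed the union alignment");
```

A guard like this only covers the union itself; whether a given change counts as an ABI break under DPDK's policy may also depend on other aspects, such as how new enum values are added.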


> 
> > Also right now there is no way to add new type of crypto_sym_session without
> > either breaking existing crypto-dev ABI/API or introducing new structure
> > (rte_crypto_sym_cpu_session or so) for that.
> 
> What extra info is required in rte_cryptodev_sym_session to get the rte_crypto_sym_cpu_session.

Right now, just cipher_offset (see above).
What else in the future (if anything), I don't know.

> I don't think there is any.
> I believe the same crypto session will be able to work synchronously as well.

Exactly the same session would be problematic; see above.

> We would only need  a new API to perform synchronous actions.
> That will reduce the duplication code significantly
> in the driver to support 2 different kind of APIs with similar code inside.
> Please correct me in case I am missing something.

Adding a new API to cryptodev would also require changes in the PMDs;
it wouldn't come totally free, and I believe it would require roughly the same amount of changes.

> 
> 
> > While rte_security is designed in a way that we can add new session types and
> > related parameters without causing API/ABI breakage.
> 
> Yes the intent is to add new sessions based on various protocols that can be supported by the driver.

Various protocols and different types of sessions (and the devices they belong to).
Let's say right now we have INLINE_CRYPTO, INLINE_PROTO, LOOKASIDE_PROTO, etc.
Here we introduce a new type of session.

> It is not that we should find it as an alternative to cryptodev and using it just because it will not cause
> ABI/API breakage.

I am not considering this new API as an alternative to the existing ones, but as an extension.
The existing crypto-op API has its own advantages (it is generic), and I think we should keep it supported by all crypto-devs.
On the other side, rte_security is an extendable framework that suits the purpose:
it allows us to easily (and yes, without ABI breakage) introduce a new API for a special type of crypto-dev (SW based).

> IMO the code should be placed where its intent is.
> 
> >
> > BTW, what is your concern with proposed approach (via rte_security)?
> > From my perspective it is a lightweight change and it is totally optional
> > for the crypto PMDs to support it or not.
> > Konstantin



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-10 11:25           ` Akhil Goyal
@ 2019-09-11 13:01             ` Ananyev, Konstantin
  0 siblings, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-11 13:01 UTC (permalink / raw)
  To: Akhil Goyal, Zhang, Roy Fan, dev; +Cc: Doherty, Declan, De Lara Guarch, Pablo


Hi lads,
> >
> > You are right, the new API will process the crypto workload, no heavy enqueue
> > Dequeue operations required.
> >
> > Cryptodev tends to support multiple crypto devices, including HW and SW.
> > The 3-cache line access, iova address computation and assignment, simulation
> > of async enqueue/dequeue operations, allocate and free crypto ops, even the
> > mbuf linked-list for scatter-gather buffers are too heavy for SW crypto PMDs.
> 
> Why cant we have a cryptodev synchronous API which work on plain bufs as your suggested
> API and use the same crypto sym_session creation logic as it was before? It will perform
> same as it is doing in this series.

I tried to summarize our reasons in another mail in that thread.

> 
> >
> > To create this new synchronous API in cryptodev cannot avoid the problem
> > listed above:  first the API shall not serve only to part of the crypto (SW) PMDs -
> > as you know, it is Cryptodev. The users can expect some PMD only support part
> > of the overall algorithms, but not the workload processing API.
> 
> Why cant we have an optional data path in cryptodev for synchronous behavior if the
> underlying PMD support it. It depends on the PMD to decide whether it can have it supported or not.
> Only a feature flag will be needed to decide that.
> One more option could be a PMD API which the application can directly call if the
> mode is only supported in very few PMDs. This could be a backup if there is a
> requirement of deprecation notice etc.
> 
> >
> > Another reason is, there is assumption made, first when creating a crypto op
> > we have to allocate the memory to hold crypto op + sym op + iv, - we cannot
> > simply declare an array of crypto ops in the run-time and discard it when
> > processing is done. Also we need to fill aad and digest HW address, which is
> > not required for SW at all.
> 
> We are defining a new API which may have its own parameters and requirements which
> need to be fulfilled. If it were an rte_security API, you would also be defining a new way
> of packet execution and new API params. So it would be the same.
> You can reduce the cache line accesses as you need in the new API.
> The session logic need not be changed from crypto session to security session.
> Only the data path needs to be altered as per the new API.
> 
> >
> > Bottom line: using crypto op will still have 3 cache-line access performance
> > problem.
> >
> > So if we were to create the new API in Cryptodev instead of rte_security, we would need to
> > create a new crypto op structure only for the SW PMDs, carefully document it
> > so it is not confused with the existing cryptodev APIs, make new device feature flags to
> > indicate the API is not supported by some PMDs, and again carefully document
> > these device feature flags.
> 
> The new API will need to be explained in the security case too. In addition, you need
> to explain the session, which already exists in cryptodev.
> 
> >
> > So, by pushing these changes to rte_security instead, the above problems can be resolved,
> > and the performance improvement from this change is big for smaller packets
> > - I attached a performance test app in the patchset.
> 
> I believe there won't be any perf gap if the optimized new cryptodev API is used.
> 
> >
> > For rte_security, we already have the inline-crypto type, which works quite close to
> > this new API; the only difference is that the work is done with CPU cycles. As you may
> > have already seen, the ipsec library has wrapped these changes, and ipsec-secgw
> > needs only minimal updates to adopt this change too. So for end users of IPsec,
> > this patchset can be seamlessly enabled with just a command-line update when
> > creating an SA.
> 
> In the IPSec application I do not see the changes wrt the new execution API.
> So the data path is not getting handled there. It looks incomplete. The user experience
> to use the new API will definitely be changed.

I believe we do support it in librte_ipsec mode.
librte_ipsec hides all processing complexity inside and
calls rte_security_process_cpu_crypto_bulk() internally.
That's why for librte_ipsec it is literally a 2-line change:
--- a/examples/ipsec-secgw/ipsec_process.c
+++ b/examples/ipsec-secgw/ipsec_process.c
@@ -101,7 +101,8 @@  fill_ipsec_session(struct rte_ipsec_session *ss, struct ipsec_ctx *ctx,
 		}
 		ss->crypto.ses = sa->crypto_session;
 	/* setup session action type */
-	} else if (sa->type == RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL) {
+	} else if (sa->type == RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 		if (sa->sec_session == NULL) {
 			rc = create_lookaside_session(ctx, sa);
 			if (rc != 0)
@@ -227,8 +228,8 @@  ipsec_process(struct ipsec_ctx *ctx, struct ipsec_traffic *trf)
 
 		/* process packets inline */
 		else if (sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
-				sa->type ==
-				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL) {
+			sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 
 			satp = rte_ipsec_sa_type(ips->sa);





^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-11 12:29             ` Ananyev, Konstantin
@ 2019-09-12 14:12               ` Akhil Goyal
  2019-09-16 14:53                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-09-12 14:12 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev, De Lara Guarch, Pablo, Thomas Monjalon
  Cc: Zhang, Roy Fan, Doherty, Declan, Anoob Joseph

Hi Konstantin,

> Hi Akhil,
> > >
> > > > > This action type allows the burst of symmetric crypto workload using the same
> > > > > algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > > This flexible action type does not require external hardware involvement,
> > > > > having the crypto workload processed synchronously, and is more performant
> > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > >
> > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > corresponding dequeue burst.
> > >
> > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > >
> > > > It would be a new API something like process_packets and it will have the
> > > > crypto processed packets while returning from the API?
> > >
> > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > >
> > > >
> > > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > > As far as I can understand, you are not doing any protocol processing or any value add
> > > > To the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > > Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > > Security session in the driver just to do a synchronous processing.
> > >
> > > I suppose your question is why not to have
> > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > The main reason is that would require disruptive changes in existing cryptodev API
> > > (would cause ABI/API breakage).
> > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra information
> > > that normal crypto_sym_xform doesn't contain
> > > (cipher offset from the start of the buffer, might be something extra in future).
> >
> > Cipher offset will be part of rte_crypto_op.
> 
> fill/read (+ alloc/free) is one of the main things that slow down the current crypto-op approach.
> That's why the general idea is to have all data that doesn't change from packet to packet
> included in the session and set it up once at session_init().

I agree that you cannot use crypto-op.
You can have the new API in cryptodev.
As per the current patch, you only need cipher_offset, which you can pass as a parameter until
you get it approved in the crypto xform. I believe it will be beneficial for other crypto use cases as well.
We can have the cipher offset in both places (crypto-op and cipher_xform). That gives the user the flexibility to
override it.
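A minimal sketch of the scheme discussed here: the offset is fixed once per session at init time, and a per-call value may optionally override it. All names (struct sync_sess, sync_sess_init, sync_sess_cipher_offset, CIPHER_OFFSET_USE_SESSION) are hypothetical and not part of the DPDK cryptodev or rte_security API; using UINT32_MAX as the "no override" sentinel is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no per-call override: use the session value". */
#define CIPHER_OFFSET_USE_SESSION UINT32_MAX

/* Hypothetical sync session: values that are constant for the whole SA
 * (here only the cipher offset) are filled in once at session init. */
struct sync_sess {
	uint32_t cipher_offset; /* bytes from buffer start to start of cipher */
};

static void
sync_sess_init(struct sync_sess *s, uint32_t cipher_offset)
{
	assert(s != NULL);
	s->cipher_offset = cipher_offset;
}

/* Pick the offset for one call: the caller may override the session value;
 * otherwise the pre-computed per-session value is used. */
static uint32_t
sync_sess_cipher_offset(const struct sync_sess *s, uint32_t call_offset)
{
	return (call_offset == CIPHER_OFFSET_USE_SESSION) ?
		s->cipher_offset : call_offset;
}
```

With this shape, the common case (no override) costs a single compare per call, while the value itself is still set up only once per session.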


> 
> > If you intend not to use rte_crypto_op
> > You can pass this as an argument in the new cryptodev API.
> 
> You mean an extra parameter in rte_security_process_cpu_crypto_bulk()?
> In theory it could be, but that solution looks a bit ugly:
> 	why pass with each call something that is constant per session?
> 	Again, having that value constant per session might allow some extra
> 	optimisations that would be hard to achieve for the dynamic case.
> and not extendable:
> Suppose tomorrow we need to add something extra (some new algorithm
> support or so).
> With what you are proposing we would need to add a new parameter to the function,
> which means API breakage.
> 
> > Something extra will also cause ABI breakage in security as well.
> > So it will be same.
> 
> I don't think it would.
> AFAIK, right now this patch doesn't introduce any API/ABI breakage.
> Inside struct rte_security_session_conf we have a union of xforms
> depending on session type.
> So as long as cpu_crypto_xform wouldn't exceed sizes of other xform -
> I believe no ABI breakage will appear.
Agreed, it will not break the ABI in the security case as long as we do not exceed the current size.
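A sketch of the ABI argument: adding a member to a union leaves the enclosing structure's size and field offsets unchanged as long as the new member is no larger than the largest existing member and no more strictly aligned than the most strictly aligned one. The structures below are simplified, hypothetical stand-ins loosely modelled on struct rte_security_session_conf, not the real DPDK definitions; the compile-time checks encode that condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for existing per-protocol xforms;
 * the real rte_security structures have different fields and sizes. */
struct ipsec_xform_stub {
	uint32_t spi;
	uint32_t salt;
	uint64_t options;
};

struct macsec_xform_stub {
	uint32_t dummy;
};

/* Hypothetical new xform for the CPU crypto session type. */
struct cpu_crypto_xform_stub {
	uint32_t cipher_offset;
};

/* Session configuration before the change. */
struct session_conf_old {
	int action_type;
	int protocol;
	union {
		struct ipsec_xform_stub ipsec;
		struct macsec_xform_stub macsec;
	};
	void *crypto_xform;
	void *userdata;
};

/* The same configuration with the new union member added. */
struct session_conf_new {
	int action_type;
	int protocol;
	union {
		struct ipsec_xform_stub ipsec;
		struct macsec_xform_stub macsec;
		struct cpu_crypto_xform_stub cpu_crypto;
	};
	void *crypto_xform;
	void *userdata;
};

/* The new member fits inside the existing union footprint ... */
static_assert(sizeof(struct cpu_crypto_xform_stub) <=
	      sizeof(struct ipsec_xform_stub), "new xform too large");
static_assert(_Alignof(struct cpu_crypto_xform_stub) <=
	      _Alignof(struct ipsec_xform_stub), "new xform over-aligned");

/* ... so the size and field offsets of the enclosing struct stay the same. */
static_assert(sizeof(struct session_conf_old) ==
	      sizeof(struct session_conf_new), "size changed");
static_assert(offsetof(struct session_conf_old, crypto_xform) ==
	      offsetof(struct session_conf_new, crypto_xform), "layout changed");
```

If a future cpu_crypto xform grew beyond the largest existing member, the size check would fail to compile, which is exactly the point at which the ABI would break.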

Which matters more: avoiding an ABI/API breakage, or placing the code in the correct place?
We need to find a trade-off. Others can comment on this.
@Thomas Monjalon, @De Lara Guarch, Pablo: any comments?

> 
> 
> >
> > > Also right now there is no way to add new type of crypto_sym_session without
> > > either breaking existing crypto-dev ABI/API or introducing new structure
> > > (rte_crypto_sym_cpu_session or so) for that.
> >
> > What extra info is required in rte_cryptodev_sym_session to get the
> > rte_crypto_sym_cpu_session.
> 
> Right now - just cipher_offset (see above).
> What else in future (if any) - don't know.
> 
> > I don't think there is any.
> > I believe the same crypto session will be able to work synchronously as well.
> 
> Exactly the same - problematically, see above.
> 
> > We would only need  a new API to perform synchronous actions.
> > That will reduce the duplication code significantly
> > in the driver to support 2 different kind of APIs with similar code inside.
> > Please correct me in case I am missing something.
> 
> To add new API into crypto-dev would also require changes in the PMD,
> it wouldn't come totally free and I believe would require roughly the same
> amount of changes.

It will be required only in the PMDs that support it, and the changes would be minimal:
you would need a feature flag and support for that synchronous API. The session information is
already there in the session. The cipher_offset changes would need to be added,
with a default value to indicate whether the offset is overridden or not.
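One way an application could pick the data path from a capability bit, as suggested here. FF_SYNC_CPU_CRYPTO, struct dev_info_stub and select_datapath() are hypothetical; in real cryptodev code the mask would come from the feature_flags field of struct rte_cryptodev_info, but the flag name and bit position used below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit for the synchronous CPU crypto path;
 * the name and bit position are illustrative only. */
#define FF_SYNC_CPU_CRYPTO (UINT64_C(1) << 40)

/* Hypothetical, cut-down device info carrying a feature-flag mask
 * (real cryptodev code would read rte_cryptodev_info.feature_flags). */
struct dev_info_stub {
	uint64_t feature_flags;
};

enum datapath {
	DATAPATH_ASYNC_ENQ_DEQ, /* existing enqueue/dequeue burst API */
	DATAPATH_SYNC_CPU,      /* new synchronous processing call */
};

/* Decide once, at setup time, which data path to use: PMDs that do not
 * advertise the flag keep working through the existing async API. */
static enum datapath
select_datapath(const struct dev_info_stub *info)
{
	assert(info != NULL);
	return (info->feature_flags & FF_SYNC_CPU_CRYPTO) != 0 ?
		DATAPATH_SYNC_CPU : DATAPATH_ASYNC_ENQ_DEQ;
}
```

Because the choice is made once per device rather than per packet, HW PMDs that never set the bit are unaffected.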

> 
> >
> >
> > > While rte_security is designed in a way that we can add new session types and
> > > related parameters without causing API/ABI breakage.
> >
> > Yes the intent is to add new sessions based on various protocols that can be
> > supported by the driver.
> 
> Various protocols and different types of sessions (and devices they belong to).
> Let say right now we have INLINE_CRYPTO, INLINE_PROTO, LOOKASIDE_PROTO,
> etc.
> Here we introduce new type of session.

What is the added value over the existing sessions? The changes we are making
here are just to avoid an API/ABI breakage. Synchronous processing can happen on both
crypto and security sessions. This means only the processing API needs to be defined;
everything else should already be in the sessions.
All the other action types add something: INLINE - the eth device had no format to perform a crypto op;
LOOKASIDE_PROTO - adds protocol-specific sessions, which are not available in cryptodev.

> 
> > It is not that we should find it as an alternative to cryptodev and using it just
> > because it will not cause ABI/API breakage.
> 
> I am not considering this new API as an alternative to existing ones, but as an
> extension.
> Existing crypto-op API has its own advantages (generic), and I think we should
> keep it supported by all crypto-devs.
> From other side rte_security is an extendable framework that suits the purpose:
> allows easily (and yes without ABI breakage) introduce new API for special type
> of crypto-dev (SW based).
> 
> 

Adding a synchronous processing API is understandable, and it can be added in both
cryptodev and security, but a new action type for it is not required.
Whether supporting it causes an ABI/API breakage is a different issue,
and we may have to deal with that if there is no other option.

> 
> 
> 
> > IMO the code should be placed where its intent is.
> >
> > >
> > > BTW, what is your concern with proposed approach (via rte_security)?
> > > From my perspective it is a lightweight change and it is totally optional
> > > for the crypto PMDs to support it or not.
> > > Konstantin
> > >
> > > > >
> > > > > AESNI-GCM and AESNI-MB PMDs are updated with this support. There is a small
> > > > > performance test app under app/test/security_aesni_gcm(mb)_perftest to
> > > > > prove.
> > > > >
> > > > > For the new API
> > > > > The packet is sent to the crypto device for symmetric crypto
> > > > > processing. The device will encrypt or decrypt the buffer based on the session
> > > > > data specified and preprocessed in the security session. Different
> > > > > than the inline or lookaside modes, when the function exits, the user will
> > > > > expect the buffers are either processed successfully, or having the error number
> > > > > assigned to the appropriate index of the status array.
> > > > >
> > > > > Will update the program's guide in the v1 patch.
> > > > >
> > > > > Regards,
> > > > > Fan
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > > > > > Sent: Wednesday, September 4, 2019 11:33 AM
> > > > > > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty, Declan
> > > > > > <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > > > > > <pablo.de.lara.guarch@intel.com>
> > > > > > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and
> > > > > > API
> > > > > >
> > > > > > Hi Fan,
> > > > > >
> > > > > > >
> > > > > > > This patch introduce new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action
> > > > > > > type to security library. The type represents performing crypto
> > > > > > > operation with CPU cycles. The patch also includes a new API to
> > > > > > > process crypto operations in bulk and the function pointers for PMDs.
> > > > > > >
> > > > > > I am not able to get the flow of execution for this action type. Could you
> > > > > > please elaborate the flow in the documentation. If not in documentation
> > > > > > right now, then please elaborate the flow in cover letter.
> > > > > > Also I see that there are new APIs for processing crypto operations in bulk.
> > > > > > What does that mean. How are they different from the existing APIs which
> > > > > > are also handling bulk crypto ops depending on the budget.
> > > > > >
> > > > > >
> > > > > > -Akhil
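A sketch of the completion semantics described in the quoted RFC text: when the synchronous bulk call returns, each buffer has either been processed or carries an error code in the matching slot of a status array. toy_process_bulk(), struct raw_buf and struct toy_sess are hypothetical and do not reproduce the proposed rte_security_process_cpu_crypto_bulk() signature; the XOR loop is a placeholder standing in for the real AES work and provides no security.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw-buffer descriptor: the sync call works on plain
 * buffers rather than on mbufs or rte_crypto_op. */
struct raw_buf {
	uint8_t *data;
	uint32_t len;
};

/* Hypothetical session: everything constant for the SA is set up once. */
struct toy_sess {
	uint8_t key;            /* placeholder "key" for the toy transform */
	uint32_t cipher_offset; /* bytes left untouched at the buffer start */
};

/*
 * Synchronous bulk call: on return every buffer has either been processed
 * (status[i] == 0) or status[i] holds a negative errno value. Returns the
 * number of buffers processed successfully. The XOR is a placeholder for
 * the real cipher and is NOT cryptography.
 */
static uint32_t
toy_process_bulk(const struct toy_sess *s, struct raw_buf *bufs,
		 int32_t *status, uint32_t num)
{
	uint32_t i, j, ok = 0;

	assert(s != NULL && bufs != NULL && status != NULL);
	for (i = 0; i < num; i++) {
		if (bufs[i].data == NULL || bufs[i].len < s->cipher_offset) {
			status[i] = -EINVAL;
			continue;
		}
		for (j = s->cipher_offset; j < bufs[i].len; j++)
			bufs[i].data[j] ^= s->key;
		status[i] = 0;
		ok++;
	}
	return ok;
}
```

Note that no crypto op is allocated, filled or freed, and nothing is queued: the caller owns the buffers and the status array, and the results are final when the function returns.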


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-12 14:12               ` Akhil Goyal
@ 2019-09-16 14:53                 ` Ananyev, Konstantin
  2019-09-16 15:08                   ` Ananyev, Konstantin
  2019-09-17  6:02                   ` Akhil Goyal
  0 siblings, 2 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-16 14:53 UTC (permalink / raw)
  To: Akhil Goyal, dev, De Lara Guarch, Pablo, Thomas Monjalon
  Cc: Zhang, Roy Fan, Doherty, Declan, Anoob Joseph

Hi Akhil,

> I agree that you cannot use crypto-op.
> You can have the new API in crypto.
> As per the current patch, you only need cipher_offset which you can have it as a parameter until
> You get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> We can have cipher offset at both places(crypto-op and cipher_xform). It will give flexibility to the user to
> override it.

After giving your proposal another thought:
perhaps we can introduce new rte_crypto_sym_xform_types for the CPU-related stuff here?
Let's say we have:
 enum rte_crypto_sym_xform_type {
         RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0, /**< No xform specified */
         RTE_CRYPTO_SYM_XFORM_AUTH,              /**< Authentication xform */
         RTE_CRYPTO_SYM_XFORM_CIPHER,            /**< Cipher xform  */
-        RTE_CRYPTO_SYM_XFORM_AEAD               /**< AEAD xform  */
+        RTE_CRYPTO_SYM_XFORM_AEAD,              /**< AEAD xform  */
+        RTE_CRYPTO_SYM_XFORM_CPU = INT32_MIN,
+        RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_AEAD),
+        /* same for auth and cipher xforms */
 };

Then we can either re-define some values in struct rte_crypto_aead_xform (via unions),
or even have a new struct rte_crypto_cpu_aead_xform (same for cipher and auth xforms).
Then, if a PMD wants to support the new sync API, it would need to recognize the new xform types,
and internally it might end up with a different session structure (one for sync, another for async mode).
I think that should allow us to introduce cpu_crypto as part of the crypto-dev API without ABI breakage.
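A sketch of how a PMD might recognise the proposed CPU xform types, assuming the high-bit encoding above with CPU_AEAD = CPU | AEAD. The enum is re-declared with shortened, hypothetical names so the example stands alone; none of the CPU_* values exist in the DPDK API, and pmd_classify() is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shortened, hypothetical re-declaration of the proposed enum; the CPU
 * variants are a proposal from this thread, not existing DPDK API. */
enum sym_xform_type {
	XFORM_NOT_SPECIFIED = 0,
	XFORM_AUTH,
	XFORM_CIPHER,
	XFORM_AEAD,
	XFORM_CPU = INT32_MIN,                     /* top bit marks "sync CPU" */
	XFORM_CPU_AEAD = (XFORM_CPU | XFORM_AEAD),
};

static_assert(XFORM_CPU_AEAD != XFORM_AEAD, "CPU variant must be distinct");

/* Session flavours a hypothetical PMD could build internally. */
enum sess_kind { SESS_UNSUPPORTED, SESS_ASYNC_AEAD, SESS_SYNC_AEAD };

/* True if the top ("CPU") bit is set. */
static bool
xform_is_cpu(int32_t type)
{
	return (type & XFORM_CPU) != 0;
}

/* Clear the CPU bit to recover the plain AUTH/CIPHER/AEAD type. */
static int32_t
xform_base_type(int32_t type)
{
	return type & ~(int32_t)XFORM_CPU;
}

/* Pick the internal session layout from the xform type: sync and async
 * AEAD sessions may use different structures, as suggested above. */
static enum sess_kind
pmd_classify(int32_t type)
{
	if (xform_base_type(type) != XFORM_AEAD)
		return SESS_UNSUPPORTED;
	return xform_is_cpu(type) ? SESS_SYNC_AEAD : SESS_ASYNC_AEAD;
}
```

A PMD that knows nothing about the CPU bit would simply see an unknown xform type and reject the session, which is what keeps existing drivers unaffected.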
What do you think?
Konstantin 
 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-16 14:53                 ` Ananyev, Konstantin
@ 2019-09-16 15:08                   ` Ananyev, Konstantin
  2019-09-17  6:02                   ` Akhil Goyal
  1 sibling, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-16 15:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, Akhil Goyal, dev, De Lara Guarch, Pablo,
	Thomas Monjalon
  Cc: Zhang, Roy Fan, Doherty, Declan, Anoob Joseph


> Hi Akhil,
> 
> After having another thought on your proposal:
> Probably we can introduce new rte_crypto_sym_xform_types for CPU related stuff here?
> Let say we can have :
> num rte_crypto_sym_xform_type {
>         RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0, /**< No xform specified */
>         RTE_CRYPTO_SYM_XFORM_AUTH,              /**< Authentication xform */
>         RTE_CRYPTO_SYM_XFORM_CIPHER,            /**< Cipher xform  */
>         RTE_CRYPTO_SYM_XFORM_AEAD               /**< AEAD xform  */
> +     RTE_CRYPTO_SYM_XFORM_CPU = INT32_MIN,
> +    RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_CPU),
I meant:
RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_AEAD),
of course.

>       /* same for auth and crypto xforms */
> };
> 
> Then we either can re-define some values in struct rte_crypto_aead_xform (via unions),
> or even have new  struct rte_crypto_cpu_aead_xform (same for crypto and auth xforms).
> Then if PMD wants to support new sync API it would need to recognize new xform types
> and internally  it might end up with different session structure (one for sync, another for async mode).
> That I think should allow us to introduce cpu_crypto as part of crypto-dev API without ABI breakage.
> What do you think?
> Konstantin
> 
> >
> > >
> > > > If you intend not to use rte_crypto_op
> > > > You can pass this as an argument in the new cryptodev API.
> > >
> > > You mean extra parameter in rte_security_process_cpu_crypto_bulk()?
> > > It can be in theory, but that solution looks a bit ugly:
> > > 	why to pass for each call something that would be constant per session?
> > > 	Again having that value constant per session might allow some extra
> > > optimisations
> > > 	That would be hard to achieve for dynamic case.
> > > and not extendable:
> > > Suppose tomorrow will need to add something extra (some new algorithm
> > > support or so).
> > > With what you proposing will need to new parameter to the function,
> > > which means API breakage.
> > >
> > > > Something extra will also cause ABI breakage in security as well.
> > > > So it will be same.
> > >
> > > I don't think it would.
> > > AFAIK, right now this patch doesn't introduce any API/ABI breakage.
> > > Iinside struct rte_security_session_conf we have a union of xforms
> > > depending on session type.
> > > So as long as cpu_crypto_xform wouldn't exceed sizes of other xform -
> > > I believe no ABI breakage will appear.
> > Agreed, it will not break ABI in case of security till we do not exceed current size.
> >
> > Saving an ABI/API breakage is more important or placing the code at the correct place.
> > We need to find a tradeoff. Others can comment on this.
> > @Thomas Monjalon, @De Lara Guarch, Pablo Any comments?
> >
> > >
> > >
> > > >
> > > > > Also right now there is no way to add new type of crypto_sym_session
> > > without
> > > > > either breaking existing crypto-dev ABI/API or introducing new structure
> > > > > (rte_crypto_sym_cpu_session or so) for that.
> > > >
> > > > What extra info is required in rte_cryptodev_sym_session to get the
> > > rte_crypto_sym_cpu_session.
> > >
> > > Right now - just cipher_offset (see above).
> > > What else in future (if any) - don't know.
> > >
> > > > I don't think there is any.
> > > > I believe the same crypto session will be able to work synchronously as well.
> > >
> > > Exactly the same - problematically, see above.
> > >
> > > > We would only need  a new API to perform synchronous actions.
> > > > That will reduce the duplication code significantly
> > > > in the driver to support 2 different kind of APIs with similar code inside.
> > > > Please correct me in case I am missing something.
> > >
> > > To add new API into crypto-dev would also require changes in the PMD,
> > > it wouldn't come totally free and I believe would require roughly the same
> > > amount of changes.
> >
> > It will be required only in the PMDs which support it and would be minimal.
> > You would need a feature flag, support  for that synchronous API. Session information will
> > already be there in the session. The changes wrt cipher_offset need to be added
> > but with some default value to identify override will be done or not.
> >
> > >
> > > >
> > > >
> > > > > While rte_security is designed in a way that we can add new session types
> > > and
> > > > > related parameters without causing API/ABI breakage.
> > > >
> > > > Yes the intent is to add new sessions based on various protocols that can be
> > > supported by the driver.
> > >
> > > Various protocols and different types of sessions (and devices they belong to).
> > > Let say right now we have INLINE_CRYPTO, INLINE_PROTO, LOOKASIDE_PROTO,
> > > etc.
> > > Here we introduce new type of session.
> >
> > What is the new value add to the existing sessions. The changes that we are doing
> > here is just to avoid an API/ABI breakage. The synchronous processing can happen on both
> > crypto and security session. This would mean, only the processing API should be defined,
> > rest all should be already there in the sessions.
> > In All other cases, INLINE - eth device was not having any format to perform crypto op
> > LOOKASIDE - PROTO - add protocol specific sessions which is not available in crypto.
> >
> > >
> > > > It is not that we should find it as an alternative to cryptodev and using it just
> > > because it will not cause
> > > > ABI/API breakage.
> > >
> > > I am considering this new API as an alternative to existing ones, but as an
> > > extension.
> > > Existing crypto-op API has its own advantages (generic), and I think we should
> > > keep it supported by all crypto-devs.
> > > From other side rte_security is an extendable framework that suits the purpose:
> > > allows easily (and yes without ABI breakage) introduce new API for special type
> > > of crypto-dev (SW based).
> > >
> > >
> >
> > Adding a synchronous processing API is understandable and can be added in both
> > Crypto as well as Security, but a new action type for it is not required.
> > Now whether to support that, we have ABI/API breakage, that is a different issue.
> > And we may have to deal with it if no other option is there.
> >
> > >
> > >
> > >
> > > > IMO the code should be placed where its intent is.
> > > >
> > > > >
> > > > > BTW, what is your concern with proposed approach (via rte_security)?
> > > > > From my perspective it is a lightweight change and it is totally optional
> > > > > for the crypto PMDs to support it or not.
> > > > > Konstantin
> > > > >
> > > > > > >
> > > > > > > AESNI-GCM and AESNI-MB PMDs are updated with this support. There is
> > > a
> > > > > small
> > > > > > > performance test app under app/test/security_aesni_gcm(mb)_perftest
> > > to
> > > > > > > prove.
> > > > > > >
> > > > > > > For the new API
> > > > > > > The packet is sent to the crypto device for symmetric crypto
> > > > > > > processing. The device will encrypt or decrypt the buffer based on the
> > > > > session
> > > > > > > data specified and preprocessed in the security session. Different
> > > > > > > than the inline or lookaside modes, when the function exits, the user will
> > > > > > > expect the buffers are either processed successfully, or having the error
> > > > > number
> > > > > > > assigned to the appropriate index of the status array.
> > > > > > >
> > > > > > > Will update the program's guide in the v1 patch.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Fan
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > > > > > > > Sent: Wednesday, September 4, 2019 11:33 AM
> > > > > > > > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > > > > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty,
> > > > > Declan
> > > > > > > > <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > > > > > > > <pablo.de.lara.guarch@intel.com>
> > > > > > > > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action
> > > type
> > > > > and
> > > > > > > > API
> > > > > > > >
> > > > > > > > Hi Fan,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > This patch introduce new
> > > RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > > > > > > > action
> > > > > > > > > type to security library. The type represents performing crypto
> > > > > > > > > operation with CPU cycles. The patch also includes a new API to
> > > > > > > > > process crypto operations in bulk and the function pointers for PMDs.
> > > > > > > > >
> > > > > > > > I am not able to get the flow of execution for this action type. Could
> > > you
> > > > > > > > please elaborate the flow in the documentation. If not in
> > > documentation
> > > > > > > > right now, then please elaborate the flow in cover letter.
> > > > > > > > Also I see that there are new APIs for processing crypto operations in
> > > bulk.
> > > > > > > > What does that mean. How are they different from the existing APIs
> > > which
> > > > > > > > are also handling bulk crypto ops depending on the budget.
> > > > > > > >
> > > > > > > >
> > > > > > > > -Akhil



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-16 14:53                 ` Ananyev, Konstantin
  2019-09-16 15:08                   ` Ananyev, Konstantin
@ 2019-09-17  6:02                   ` Akhil Goyal
  2019-09-18  7:44                     ` Ananyev, Konstantin
  1 sibling, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-09-17  6:02 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev, De Lara Guarch, Pablo, Thomas Monjalon
  Cc: Zhang, Roy Fan, Doherty, Declan, Anoob Joseph


Hi Konstantin,
> 
> Hi Akhil,
> 
> > > > > > > This action type allows the burst of symmetric crypto workload using
> the
> > > > > same
> > > > > > > algorithm, key, and direction being processed by CPU cycles
> > > synchronously.
> > > > > > > This flexible action type does not require external hardware
> involvement,
> > > > > > > having the crypto workload processed synchronously, and is more
> > > > > performant
> > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async
> > > mode
> > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > >
> > > > > > Does that mean application will not call the cryptodev_enqueue_burst
> and
> > > > > corresponding dequeue burst.
> > > > >
> > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > >
> > > > > > It would be a new API something like process_packets and it will have
> the
> > > > > crypto processed packets while returning from the API?
> > > > >
> > > > > Yes, though the plan is that API will operate on raw data buffers, not
> mbufs.
> > > > >
> > > > > >
> > > > > > I still do not understand why we cannot do with the conventional
> crypto lib
> > > > > only.
> > > > > > As far as I can understand, you are not doing any protocol processing
> or
> > > any
> > > > > value add
> > > > > > To the crypto processing. IMO, you just need a synchronous crypto
> > > processing
> > > > > API which
> > > > > > Can be defined in cryptodev, you don't need to re-create a crypto
> session
> > > in
> > > > > the name of
> > > > > > Security session in the driver just to do a synchronous processing.
> > > > >
> > > > > I suppose your question is why not to have
> > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > The main reason is that would require disruptive changes in existing
> > > cryptodev
> > > > > API
> > > > > (would cause ABI/API breakage).
> > > > > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some
> extra
> > > > > information
> > > > > that normal crypto_sym_xform doesn't contain
> > > > > (cipher offset from the start of the buffer, might be something extra in
> > > future).
> > > >
> > > > Cipher offset will be part of rte_crypto_op.
> > >
> > > fill/read (+ alloc/free) is one of the main things that slowdown current
> crypto-op
> > > approach.
> > > That's why the general idea - have all data that wouldn't change from packet
> to
> > > packet
> > > included into the session and setup it once at session_init().
> >
> > I agree that you cannot use crypto-op.
> > You can have the new API in crypto.
> > As per the current patch, you only need cipher_offset which you can have it as
> a parameter until
> > You get it approved in the crypto xform. I believe it will be beneficial in case of
> other crypto cases as well.
> > We can have cipher offset at both places(crypto-op and cipher_xform). It will
> give flexibility to the user to
> > override it.
> 
> After having another thought on your proposal:
> Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> stuff here?

I also thought of adding new xforms, but that may not serve the purpose in all cases.
You would need all the information currently available in the existing xforms.
So if you add new fields in the new xform, its size will be larger than that of the union of xforms,
and the ABI breakage would still be there.

If you think a valid compression of the AEAD xform can be done, then the same can be done for each of the
xforms and we can have a solution to this issue.
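The size argument above can be illustrated with a small, self-contained C sketch. The structures below are hypothetical, simplified stand-ins (the `ex_` names are not DPDK definitions and do not match the real `rte_crypto_aead_xform` layout); the point is only that a union's size is set by its largest member, so a new member that carries every AEAD field plus an extra one must grow the union, while reusing the existing layout does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified AEAD xform; the real DPDK struct differs. */
struct ex_aead_xform {
	int op;
	int algo;
	struct {
		const uint8_t *data;
		uint16_t length;
	} key;
	struct {
		uint16_t offset;
		uint16_t length;
	} iv;
	uint16_t digest_length;
	uint16_t aad_length;
};

/* Hypothetical, simplified cipher xform. */
struct ex_cipher_xform {
	int op;
	int algo;
	struct {
		const uint8_t *data;
		uint16_t length;
	} key;
	struct {
		uint16_t offset;
		uint16_t length;
	} iv;
};

/* Union exposed inside a public struct: its size is part of the ABI. */
union ex_xform_union {
	struct ex_aead_xform aead;
	struct ex_cipher_xform cipher;
};

/* A "sync" AEAD xform that keeps all AEAD fields and appends one more. */
struct ex_sync_aead_xform {
	struct ex_aead_xform base;
	uint32_t cipher_offset;
};

/* Adding a member changes the union's size (an ABI break) only if that
 * member is larger than the union as it exists today. */
static int ex_would_grow_union(size_t new_member_size)
{
	return new_member_size > sizeof(union ex_xform_union);
}

/* Reusing the existing AEAD layout as-is never grows the union. */
static_assert(sizeof(struct ex_aead_xform) <= sizeof(union ex_xform_union),
	      "AEAD xform must fit in the union");
```

In this sketch `ex_would_grow_union(sizeof(struct ex_sync_aead_xform))` is true, which is the breakage case described above; keeping the extra value inside an existing field avoids it.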

> Let's say we can have:
> enum rte_crypto_sym_xform_type {
>         RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0, /**< No xform specified */
>         RTE_CRYPTO_SYM_XFORM_AUTH,              /**< Authentication xform */
>         RTE_CRYPTO_SYM_XFORM_CIPHER,            /**< Cipher xform  */
>         RTE_CRYPTO_SYM_XFORM_AEAD,              /**< AEAD xform  */
> +     RTE_CRYPTO_SYM_XFORM_CPU = INT32_MIN,
> +     RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_AEAD),

Instead of CPU I believe SYNC would be better.
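A minimal sketch of the flag-bit enum idea, using the SYNC naming suggested here. All names are hypothetical (`ex_` prefix) and are not DPDK definitions; the sketch assumes each SYNC variant is formed by OR-ing the flag with its base AUTH/CIPHER/AEAD value, so a PMD can test for the flag and then strip it to recover the underlying type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical xform types: the sign bit marks a synchronous (CPU) variant. */
enum ex_xform_type {
	EX_XFORM_NOT_SPECIFIED = 0,
	EX_XFORM_AUTH,
	EX_XFORM_CIPHER,
	EX_XFORM_AEAD,
	EX_XFORM_SYNC = INT32_MIN,
	EX_XFORM_SYNC_AUTH = EX_XFORM_SYNC | EX_XFORM_AUTH,
	EX_XFORM_SYNC_CIPHER = EX_XFORM_SYNC | EX_XFORM_CIPHER,
	EX_XFORM_SYNC_AEAD = EX_XFORM_SYNC | EX_XFORM_AEAD,
};

/* True if a PMD should route this xform to its synchronous path. */
static bool ex_xform_is_sync(enum ex_xform_type t)
{
	return (t & EX_XFORM_SYNC) != 0;
}

/* Strip the SYNC flag to recover the underlying AUTH/CIPHER/AEAD type. */
static enum ex_xform_type ex_xform_base(enum ex_xform_type t)
{
	return (enum ex_xform_type)(t & ~EX_XFORM_SYNC);
}

static_assert((EX_XFORM_SYNC_AEAD & ~EX_XFORM_SYNC) == EX_XFORM_AEAD,
	      "SYNC flag must not clobber the base type");
```

In this sketch the existing values keep their numbering, so code that never uses the SYNC variants sees no difference.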

>       /* same for auth and crypto xforms */
> };
> 
> Then we either can re-define some values in struct rte_crypto_aead_xform (via
> unions),
> or even have new  struct rte_crypto_cpu_aead_xform (same for crypto and auth
> xforms).
> Then if PMD wants to support new sync API it would need to recognize new
> xform types
> and internally  it might end up with different session structure (one for sync,
> another for async mode).
> That I think should allow us to introduce cpu_crypto as part of crypto-dev API
> without ABI breakage.
> What do you think?
> Konstantin
> 
> >
> > >
> > > > If you intend not to use rte_crypto_op
> > > > You can pass this as an argument in the new cryptodev API.
> > >
> > > You mean extra parameter in rte_security_process_cpu_crypto_bulk()?
> > > It can be in theory, but that solution looks a bit ugly:
> > > 	why to pass for each call something that would be constant per session?
> > > 	Again having that value constant per session might allow some extra
> > > optimisations
> > > 	That would be hard to achieve for dynamic case.
> > > and not extendable:
> > > Suppose tomorrow will need to add something extra (some new algorithm
> > > support or so).
> > > With what you are proposing, we will need to add a new parameter to the function,
> > > which means API breakage.
> > >
> > > > Something extra will also cause ABI breakage in security as well.
> > > > So it will be same.
> > >
> > > I don't think it would.
> > > AFAIK, right now this patch doesn't introduce any API/ABI breakage.
> > > Inside struct rte_security_session_conf we have a union of xforms
> > > depending on session type.
> > > So as long as cpu_crypto_xform wouldn't exceed sizes of other xform -
> > > I believe no ABI breakage will appear.
> > Agreed, it will not break ABI in case of security till we do not exceed current
> size.
> >
> > Saving an ABI/API breakage is more important or placing the code at the
> correct place.
> > We need to find a tradeoff. Others can comment on this.
> > @Thomas Monjalon, @De Lara Guarch, Pablo Any comments?
> >
> > >
> > >
> > > >
> > > > > Also right now there is no way to add new type of crypto_sym_session
> > > without
> > > > > either breaking existing crypto-dev ABI/API or introducing new structure
> > > > > (rte_crypto_sym_cpu_session or so) for that.
> > > >
> > > > What extra info is required in rte_cryptodev_sym_session to get the
> > > rte_crypto_sym_cpu_session.
> > >
> > > Right now - just cipher_offset (see above).
> > > What else in future (if any) - don't know.
> > >
> > > > I don't think there is any.
> > > > I believe the same crypto session will be able to work synchronously as well.
> > >
> > > Exactly the same - problematically, see above.
> > >
> > > > We would only need  a new API to perform synchronous actions.
> > > > That will reduce the duplication code significantly
> > > > in the driver to support 2 different kind of APIs with similar code inside.
> > > > Please correct me in case I am missing something.
> > >
> > > To add new API into crypto-dev would also require changes in the PMD,
> > > it wouldn't come totally free and I believe would require roughly the same
> > > amount of changes.
> >
> > It will be required only in the PMDs which support it and would be minimal.
> > You would need a feature flag, support  for that synchronous API. Session
> information will
> > already be there in the session. The changes wrt cipher_offset need to be
> added
> > but with some default value to identify override will be done or not.
> >
> > >
> > > >
> > > >
> > > > > While rte_security is designed in a way that we can add new session
> types
> > > and
> > > > > related parameters without causing API/ABI breakage.
> > > >
> > > > Yes the intent is to add new sessions based on various protocols that can
> be
> > > supported by the driver.
> > >
> > > Various protocols and different types of sessions (and devices they belong
> to).
> > > Let say right now we have INLINE_CRYPTO, INLINE_PROTO,
> LOOKASIDE_PROTO,
> > > etc.
> > > Here we introduce new type of session.
> >
> > What is the new value add to the existing sessions. The changes that we are
> doing
> > here is just to avoid an API/ABI breakage. The synchronous processing can
> happen on both
> > crypto and security session. This would mean, only the processing API should
> be defined,
> > rest all should be already there in the sessions.
> > In All other cases, INLINE - eth device was not having any format to perform
> crypto op
> > LOOKASIDE - PROTO - add protocol specific sessions which is not available in
> crypto.
> >
> > >
> > > > It is not that we should find it as an alternative to cryptodev and using it
> just
> > > because it will not cause
> > > > ABI/API breakage.
> > >
> > > I am considering this new API as an alternative to existing ones, but as an
> > > extension.
> > > Existing crypto-op API has its own advantages (generic), and I think we
> should
> > > keep it supported by all crypto-devs.
> > > From other side rte_security is an extendable framework that suits the
> purpose:
> > > allows easily (and yes without ABI breakage) introduce new API for special
> type
> > > of crypto-dev (SW based).
> > >
> > >
> >
> > Adding a synchronous processing API is understandable and can be added in
> both
> > Crypto as well as Security, but a new action type for it is not required.
> > Now whether to support that, we have ABI/API breakage, that is a different
> issue.
> > And we may have to deal with it if no other option is there.
> >
> > >
> > >
> > >
> > > > IMO the code should be placed where its intent is.
> > > >
> > > > >
> > > > > BTW, what is your concern with proposed approach (via rte_security)?
> > > > > From my perspective it is a lightweight change and it is totally optional
> > > > > for the crypto PMDs to support it or not.
> > > > > Konstantin
> > > > >
> > > > > > >
> > > > > > > AESNI-GCM and AESNI-MB PMDs are updated with this support.
> There is
> > > a
> > > > > small
> > > > > > > performance test app under
> app/test/security_aesni_gcm(mb)_perftest
> > > to
> > > > > > > prove.
> > > > > > >
> > > > > > > For the new API
> > > > > > > The packet is sent to the crypto device for symmetric crypto
> > > > > > > processing. The device will encrypt or decrypt the buffer based on the
> > > > > session
> > > > > > > data specified and preprocessed in the security session. Different
> > > > > > > than the inline or lookaside modes, when the function exits, the user
> will
> > > > > > > expect the buffers are either processed successfully, or having the
> error
> > > > > number
> > > > > > > assigned to the appropriate index of the status array.
> > > > > > >
> > > > > > > Will update the program's guide in the v1 patch.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Fan
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > > > > > > > Sent: Wednesday, September 4, 2019 11:33 AM
> > > > > > > > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > > > > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty,
> > > > > Declan
> > > > > > > > <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > > > > > > > <pablo.de.lara.guarch@intel.com>
> > > > > > > > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action
> > > type
> > > > > and
> > > > > > > > API
> > > > > > > >
> > > > > > > > Hi Fan,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > This patch introduce new
> > > RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > > > > > > > action
> > > > > > > > > type to security library. The type represents performing crypto
> > > > > > > > > operation with CPU cycles. The patch also includes a new API to
> > > > > > > > > process crypto operations in bulk and the function pointers for
> PMDs.
> > > > > > > > >
> > > > > > > > I am not able to get the flow of execution for this action type.
> Could
> > > you
> > > > > > > > please elaborate the flow in the documentation. If not in
> > > documentation
> > > > > > > > right now, then please elaborate the flow in cover letter.
> > > > > > > > Also I see that there are new APIs for processing crypto operations
> in
> > > bulk.
> > > > > > > > What does that mean. How are they different from the existing APIs
> > > which
> > > > > > > > are also handling bulk crypto ops depending on the budget.
> > > > > > > >
> > > > > > > >
> > > > > > > > -Akhil



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-17  6:02                   ` Akhil Goyal
@ 2019-09-18  7:44                     ` Ananyev, Konstantin
  2019-09-25 18:24                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-18  7:44 UTC (permalink / raw)
  To: Akhil Goyal, dev, De Lara Guarch, Pablo, Thomas Monjalon
  Cc: Zhang, Roy Fan, Doherty, Declan, Anoob Joseph


Hi Akhil,

> > > > > > > > This action type allows the burst of symmetric crypto workload using
> > the
> > > > > > same
> > > > > > > > algorithm, key, and direction being processed by CPU cycles
> > > > synchronously.
> > > > > > > > This flexible action type does not require external hardware
> > involvement,
> > > > > > > > having the crypto workload processed synchronously, and is more
> > > > > > performant
> > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async
> > > > mode
> > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > >
> > > > > > > Does that mean application will not call the cryptodev_enqueue_burst
> > and
> > > > > > corresponding dequeue burst.
> > > > > >
> > > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > > >
> > > > > > > It would be a new API something like process_packets and it will have
> > the
> > > > > > crypto processed packets while returning from the API?
> > > > > >
> > > > > > Yes, though the plan is that API will operate on raw data buffers, not
> > mbufs.
> > > > > >
> > > > > > >
> > > > > > > I still do not understand why we cannot do with the conventional
> > crypto lib
> > > > > > only.
> > > > > > > As far as I can understand, you are not doing any protocol processing
> > or
> > > > any
> > > > > > value add
> > > > > > > To the crypto processing. IMO, you just need a synchronous crypto
> > > > processing
> > > > > > API which
> > > > > > > Can be defined in cryptodev, you don't need to re-create a crypto
> > session
> > > > in
> > > > > > the name of
> > > > > > > Security session in the driver just to do a synchronous processing.
> > > > > >
> > > > > > I suppose your question is why not to have
> > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > The main reason is that would require disruptive changes in existing
> > > > cryptodev
> > > > > > API
> > > > > > (would cause ABI/API breakage).
> > > > > > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some
> > extra
> > > > > > information
> > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > (cipher offset from the start of the buffer, might be something extra in
> > > > future).
> > > > >
> > > > > Cipher offset will be part of rte_crypto_op.
> > > >
> > > > fill/read (+ alloc/free) is one of the main things that slowdown current
> > crypto-op
> > > > approach.
> > > > That's why the general idea - have all data that wouldn't change from packet
> > to
> > > > packet
> > > > included into the session and setup it once at session_init().
> > >
> > > I agree that you cannot use crypto-op.
> > > You can have the new API in crypto.
> > > As per the current patch, you only need cipher_offset which you can have it as
> > a parameter until
> > > You get it approved in the crypto xform. I believe it will be beneficial in case of
> > other crypto cases as well.
> > > We can have cipher offset at both places(crypto-op and cipher_xform). It will
> > give flexibility to the user to
> > > override it.
> >
> > After having another thought on your proposal:
> > Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> > stuff here?
> 
> I also thought of adding new xforms, but that may not serve the purpose in all cases.
> You would need all the information currently available in the existing xforms.
> So if you add new fields in the new xform, its size will be larger than that of the union of xforms,
> and the ABI breakage would still be there.
> 
> If you think a valid compression of the AEAD xform can be done, then the same can be done for each of the
> xforms and we can have a solution to this issue.

I think that we can re-use iv.offset for our purposes (as the cipher offset).
So for now we can make that path work without any ABI breakage.
Fan, please feel free to correct me here if I missed something.
If in the future we need to add some extra information, it might
require an ABI breakage, though right now I don't envision anything in particular to add.
Anyway, if there are no objections to going that way, we can try to make
these changes for v2.
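The idea of reusing `iv.offset` can be sketched as follows: the offset is read once at session initialization and stored in PMD-private session data, so the per-call bulk API never needs an extra parameter. The types and function names are hypothetical, and interpreting `iv.offset` as the cipher offset for sync sessions is only the proposal under discussion, not existing DPDK behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of an AEAD xform: only the IV descriptor matters here. */
struct ex_aead_xform {
	struct {
		uint16_t offset; /* reused as the cipher offset for sync sessions */
		uint16_t length;
	} iv;
	uint16_t aad_length;
};

/* Hypothetical PMD-private session data for the synchronous path. */
struct ex_sync_session {
	uint16_t cipher_offset; /* constant for the whole session */
	uint16_t iv_length;
	uint16_t aad_length;
};

/* Done once at session init: nothing here changes from packet to packet. */
static void ex_sync_session_init(struct ex_sync_session *sess,
				 const struct ex_aead_xform *xf)
{
	assert(sess != NULL && xf != NULL);
	sess->cipher_offset = xf->iv.offset;
	sess->iv_length = xf->iv.length;
	sess->aad_length = xf->aad_length;
}

/* Per buffer, the length to process is derived from the session alone. */
static size_t ex_cipher_len(const struct ex_sync_session *sess, size_t buf_len)
{
	return buf_len > sess->cipher_offset ? buf_len - sess->cipher_offset : 0;
}
```

Keeping the value per session, rather than per call, is what allows the bulk function's signature to stay fixed.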

> 
> > Let's say we can have:
> > enum rte_crypto_sym_xform_type {
> >         RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0, /**< No xform specified */
> >         RTE_CRYPTO_SYM_XFORM_AUTH,              /**< Authentication xform */
> >         RTE_CRYPTO_SYM_XFORM_CIPHER,            /**< Cipher xform  */
> >         RTE_CRYPTO_SYM_XFORM_AEAD,              /**< AEAD xform  */
> > +     RTE_CRYPTO_SYM_XFORM_CPU = INT32_MIN,
> > +     RTE_CRYPTO_SYM_XFORM_CPU_AEAD = (RTE_CRYPTO_SYM_XFORM_CPU | RTE_CRYPTO_SYM_XFORM_AEAD),
> 
> Instead of CPU I believe SYNC would be better.

I don't mind renaming it to SYNC, but I'd like to point out
that it's really more of a CPU API than a generic SYNC API
(it doesn't pass IOVA for data buffers, etc., only VA).
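To make the VA-only point concrete, here is a hedged sketch of what a synchronous bulk call over raw virtual-address buffers could look like. Each buffer is processed before the call returns, and a per-buffer status array reports success (0) or a negative errno value, in line with the behaviour described in the quoted cover-letter text. The function name, types, and the XOR placeholder are hypothetical; the XOR step only stands in for a PMD's real AEAD routine and provides no security.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw buffer descriptor: a plain virtual address, no IOVA. */
struct ex_buf {
	uint8_t *va;
	size_t len;
};

/* Hypothetical per-session state, set up once at session init. */
struct ex_sync_session {
	uint16_t cipher_offset;
	uint8_t key_byte; /* placeholder key material for the XOR stand-in */
};

/*
 * Process n buffers synchronously. On return, status[i] is 0 on success
 * or a negative errno value; the return value is the number of successes.
 */
static size_t ex_process_bulk(const struct ex_sync_session *sess,
			      struct ex_buf bufs[], int status[], size_t n)
{
	size_t ok = 0;

	assert(sess != NULL);
	for (size_t i = 0; i < n; i++) {
		if (bufs[i].va == NULL || bufs[i].len < sess->cipher_offset) {
			status[i] = -EINVAL;
			continue;
		}
		/* XOR placeholder for the real AEAD transform: NOT encryption. */
		for (size_t j = sess->cipher_offset; j < bufs[i].len; j++)
			bufs[i].va[j] ^= sess->key_byte;
		status[i] = 0;
		ok++;
	}
	return ok;
}
```

Since only VAs are passed, no address translation is needed per buffer, which is part of why the call is CPU-specific rather than a generic synchronous interface.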

> 
> >       /* same for auth and crypto xforms */
> > };
> >
> > Then we either can re-define some values in struct rte_crypto_aead_xform (via
> > unions),
> > or even have new  struct rte_crypto_cpu_aead_xform (same for crypto and auth
> > xforms).
> > Then if PMD wants to support new sync API it would need to recognize new
> > xform types
> > and internally  it might end up with different session structure (one for sync,
> > another for async mode).
> > That I think should allow us to introduce cpu_crypto as part of crypto-dev API
> > without ABI breakage.
> > What do you think?
> > Konstantin
> >
> > >
> > > >
> > > > > If you intend not to use rte_crypto_op
> > > > > You can pass this as an argument in the new cryptodev API.
> > > >
> > > > You mean extra parameter in rte_security_process_cpu_crypto_bulk()?
> > > > It can be in theory, but that solution looks a bit ugly:
> > > > 	why to pass for each call something that would be constant per session?
> > > > 	Again having that value constant per session might allow some extra
> > > > optimisations
> > > > 	That would be hard to achieve for dynamic case.
> > > > and not extendable:
> > > > Suppose tomorrow will need to add something extra (some new algorithm
> > > > support or so).
> > > > With what you are proposing, we will need to add a new parameter to the function,
> > > > which means API breakage.
> > > >
> > > > > Something extra will also cause ABI breakage in security as well.
> > > > > So it will be same.
> > > >
> > > > I don't think it would.
> > > > AFAIK, right now this patch doesn't introduce any API/ABI breakage.
> > > > Inside struct rte_security_session_conf we have a union of xforms
> > > > depending on session type.
> > > > So as long as cpu_crypto_xform wouldn't exceed sizes of other xform -
> > > > I believe no ABI breakage will appear.
> > > Agreed, it will not break ABI in case of security till we do not exceed current
> > size.
> > >
> > > Saving an ABI/API breakage is more important or placing the code at the
> > correct place.
> > > We need to find a tradeoff. Others can comment on this.
> > > @Thomas Monjalon, @De Lara Guarch, Pablo Any comments?
> > >
> > > >
> > > >
> > > > >
> > > > > > Also right now there is no way to add new type of crypto_sym_session
> > > > without
> > > > > > either breaking existing crypto-dev ABI/API or introducing new structure
> > > > > > (rte_crypto_sym_cpu_session or so) for that.
> > > > >
> > > > > What extra info is required in rte_cryptodev_sym_session to get the
> > > > rte_crypto_sym_cpu_session.
> > > >
> > > > Right now - just cipher_offset (see above).
> > > > What else in future (if any) - don't know.
> > > >
> > > > > I don't think there is any.
> > > > > I believe the same crypto session will be able to work synchronously as well.
> > > >
> > > > Exactly the same - problematically, see above.
> > > >
> > > > > We would only need  a new API to perform synchronous actions.
> > > > > That will reduce the duplication code significantly
> > > > > in the driver to support 2 different kind of APIs with similar code inside.
> > > > > Please correct me in case I am missing something.
> > > >
> > > > To add new API into crypto-dev would also require changes in the PMD,
> > > > it wouldn't come totally free and I believe would require roughly the same
> > > > amount of changes.
> > >
> > > It will be required only in the PMDs which support it and would be minimal.
> > > You would need a feature flag, support  for that synchronous API. Session
> > information will
> > > already be there in the session. The changes wrt cipher_offset need to be
> > added
> > > but with some default value to identify override will be done or not.
> > >
> > > >
> > > > >
> > > > >
> > > > > > While rte_security is designed in a way that we can add new session
> > types
> > > > and
> > > > > > related parameters without causing API/ABI breakage.
> > > > >
> > > > > Yes the intent is to add new sessions based on various protocols that can
> > be
> > > > supported by the driver.
> > > >
> > > > Various protocols and different types of sessions (and devices they belong
> > to).
> > > > Let say right now we have INLINE_CRYPTO, INLINE_PROTO,
> > LOOKASIDE_PROTO,
> > > > etc.
> > > > Here we introduce new type of session.
> > >
> > > What is the new value add to the existing sessions. The changes that we are
> > doing
> > > here is just to avoid an API/ABI breakage. The synchronous processing can
> > happen on both
> > > crypto and security session. This would mean, only the processing API should
> > be defined,
> > > rest all should be already there in the sessions.
> > > In All other cases, INLINE - eth device was not having any format to perform
> > crypto op
> > > LOOKASIDE - PROTO - add protocol specific sessions which is not available in
> > crypto.
> > >
> > > >
> > > > > It is not that we should find it as an alternative to cryptodev and using it
> > just
> > > > because it will not cause
> > > > > ABI/API breakage.
> > > >
> > > > I am considering this new API as an alternative to existing ones, but as an
> > > > extension.
> > > > Existing crypto-op API has its own advantages (generic), and I think we
> > should
> > > > keep it supported by all crypto-devs.
> > > > From other side rte_security is an extendable framework that suits the
> > purpose:
> > > > allows easily (and yes without ABI breakage) introduce new API for special
> > type
> > > > of crypto-dev (SW based).
> > > >
> > > >
> > >
> > > Adding a synchronous processing API is understandable and can be added in
> > both
> > > Crypto as well as Security, but a new action type for it is not required.
> > > Now whether to support that, we have ABI/API breakage, that is a different
> > issue.
> > > And we may have to deal with it if no other option is there.
> > >
> > > >
> > > >
> > > >
> > > > > IMO the code should be placed where its intent is.
> > > > >
> > > > > >
> > > > > > BTW, what is your concern with proposed approach (via rte_security)?
> > > > > > From my perspective it is a lightweight change and it is totally optional
> > > > > > for the crypto PMDs to support it or not.
> > > > > > Konstantin
> > > > > >
> > > > > > > >
> > > > > > > > AESNI-GCM and AESNI-MB PMDs are updated with this support.
> > There is
> > > > a
> > > > > > small
> > > > > > > > performance test app under
> > app/test/security_aesni_gcm(mb)_perftest
> > > > to
> > > > > > > > prove.
> > > > > > > >
> > > > > > > > For the new API
> > > > > > > > The packet is sent to the crypto device for symmetric crypto
> > > > > > > > processing. The device will encrypt or decrypt the buffer based on the
> > > > > > session
> > > > > > > > data specified and preprocessed in the security session. Different
> > > > > > > > than the inline or lookaside modes, when the function exits, the user
> > will
> > > > > > > > expect the buffers are either processed successfully, or having the
> > error
> > > > > > number
> > > > > > > > assigned to the appropriate index of the status array.
> > > > > > > >
> > > > > > > > Will update the program's guide in the v1 patch.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Fan
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > > > > > > > > Sent: Wednesday, September 4, 2019 11:33 AM
> > > > > > > > > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > > > > > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Doherty,
> > > > > > > > > Declan <declan.doherty@intel.com>; De Lara Guarch, Pablo
> > > > > > > > > <pablo.de.lara.guarch@intel.com>
> > > > > > > > > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action
> > > > > > > > > type and API
> > > > > > > > >
> > > > > > > > > Hi Fan,
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This patch introduce new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > > > > > > > > > action type to security library. The type represents performing
> > > > > > > > > > crypto operation with CPU cycles. The patch also includes a new API
> > > > > > > > > > to process crypto operations in bulk and the function pointers for
> > > > > > > > > > PMDs.
> > > > > > > > > >
> > > > > > > > > I am not able to get the flow of execution for this action type.
> > > > > > > > > Could you please elaborate the flow in the documentation. If not in
> > > > > > > > > documentation right now, then please elaborate the flow in cover
> > > > > > > > > letter.
> > > > > > > > > Also I see that there are new APIs for processing crypto operations
> > > > > > > > > in bulk. What does that mean. How are they different from the
> > > > > > > > > existing APIs which are also handling bulk crypto ops depending on
> > > > > > > > > the budget.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > -Akhil


* Re: [dpdk-dev] [PATCH 02/10] crypto/aesni_gcm: add rte_security handler
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
@ 2019-09-18 10:24     ` Ananyev, Konstantin
  0 siblings, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-18 10:24 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal

Hi Fan,

> 
> This patch adds rte_security support to the AESNI-GCM PMD. The PMD now
> initializes a security context instance, creates/deletes PMD-specific
> security sessions, and processes crypto workloads in synchronous mode
> with scatter-gather list buffers supported.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  drivers/crypto/aesni_gcm/aesni_gcm_pmd.c         | 91 ++++++++++++++++++++++-
>  drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c     | 95 ++++++++++++++++++++++++
>  drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h | 23 ++++++
>  3 files changed, 208 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> index 1006a5c4d..0a346eddd 100644
> --- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> +++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> @@ -6,6 +6,7 @@
>  #include <rte_hexdump.h>
>  #include <rte_cryptodev.h>
>  #include <rte_cryptodev_pmd.h>
> +#include <rte_security_driver.h>
>  #include <rte_bus_vdev.h>
>  #include <rte_malloc.h>
>  #include <rte_cpuflags.h>
> @@ -174,6 +175,56 @@ aesni_gcm_get_session(struct aesni_gcm_qp *qp, struct rte_crypto_op *op)
>  	return sess;
>  }
> 
> +static __rte_always_inline int
> +process_gcm_security_sgl_buf(struct aesni_gcm_security_session *sess,
> +		struct rte_security_vec *buf, uint8_t *iv,
> +		uint8_t *aad, uint8_t *digest)
> +{
> +	struct aesni_gcm_session *session = &sess->sess;
> +	uint8_t *tag;
> +	uint32_t i;
> +
> +	sess->init(&session->gdata_key, &sess->gdata_ctx, iv, aad,
> +			(uint64_t)session->aad_length);
> +
> +	for (i = 0; i < buf->num; i++) {
> +		struct iovec *vec = &buf->vec[i];
> +
> +		sess->update(&session->gdata_key, &sess->gdata_ctx,
> +				vec->iov_base, vec->iov_base, vec->iov_len);
> +	}
> +
> +	switch (session->op) {
> +	case AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION:
> +		if (session->req_digest_length != session->gen_digest_length)
> +			tag = sess->temp_digest;
> +		else
> +			tag = digest;
> +
> +		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
> +				session->gen_digest_length);
> +
> +		if (session->req_digest_length != session->gen_digest_length)
> +			memcpy(digest, sess->temp_digest,
> +					session->req_digest_length);
> +		break;


I wonder whether we can move all these cases and ifs to session_create() time,
so that instead of one process() function with a lot of branches
we'd have several process functions with minimal or no branches.
I think it should help us save extra cycles.
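A minimal sketch of this suggestion, under simplifying assumptions. The branch on operation type and digest length is resolved once, at session-create time, by storing a function pointer. The per-buffer loop then makes a single indirect call with no switch. All `toy_` names are hypothetical and are not the AESNI-GCM PMD or intel-ipsec-mb API. The "tag" is a fake XOR checksum whose only purpose is to make the control flow testable; it is not GCM.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_DIGEST_MAX 16

struct toy_session;

/* Per-buffer handler, chosen once at session-create time. */
typedef int (*toy_process_t)(struct toy_session *s, const uint8_t *data,
		uint32_t len, uint8_t *digest);

struct toy_session {
	uint32_t gen_len;	/* digest length the "engine" produces */
	uint32_t req_len;	/* digest length the user asked for */
	uint8_t temp[TOY_DIGEST_MAX];
	toy_process_t process;
};

/* Hypothetical stand-in for sess->finalize(): XOR checksum, NOT GCM. */
static void
toy_finalize(const uint8_t *data, uint32_t len, uint8_t *tag, uint32_t tag_len)
{
	uint32_t i;

	memset(tag, 0, tag_len);
	for (i = 0; i < len; i++)
		tag[i % tag_len] ^= data[i];
}

/* Encrypt, full-length digest: write the tag straight to the user. */
static int
process_enc_full(struct toy_session *s, const uint8_t *d, uint32_t len,
		uint8_t *digest)
{
	toy_finalize(d, len, digest, s->gen_len);
	return 0;
}

/* Encrypt, truncated digest: go through temp, then copy req_len bytes. */
static int
process_enc_trunc(struct toy_session *s, const uint8_t *d, uint32_t len,
		uint8_t *digest)
{
	toy_finalize(d, len, s->temp, s->gen_len);
	memcpy(digest, s->temp, s->req_len);
	return 0;
}

/* Decrypt: compute into temp and compare with the user's digest. */
static int
process_dec(struct toy_session *s, const uint8_t *d, uint32_t len,
		uint8_t *digest)
{
	toy_finalize(d, len, s->temp, s->gen_len);
	return memcmp(s->temp, digest, s->req_len) == 0 ? 0 : -1;
}

/* Session create: all of the branching happens here, exactly once. */
static int
toy_session_create(struct toy_session *s, int encrypt, uint32_t gen_len,
		uint32_t req_len)
{
	if (gen_len == 0 || gen_len > TOY_DIGEST_MAX || req_len > gen_len)
		return -EINVAL;
	s->gen_len = gen_len;
	s->req_len = req_len;
	if (!encrypt)
		s->process = process_dec;
	else if (req_len == gen_len)
		s->process = process_enc_full;
	else
		s->process = process_enc_trunc;
	return 0;
}

/* Data path: no switch/if per buffer, just one indirect call. */
static void
toy_process_bulk(struct toy_session *s, const uint8_t *data[],
		const uint32_t len[], uint8_t *digest[], int status[], uint32_t num)
{
	uint32_t i;

	assert(s->process != NULL);
	for (i = 0; i < num; i++)
		status[i] = s->process(s, data[i], len[i], digest[i]);
}
```

One possible trade-off of this design: the indirect call per buffer replaces several data-dependent branches. Whether that actually saves cycles would need to be measured with the perftest apps mentioned in the thread.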

> +
> +	case AESNI_GCM_OP_AUTHENTICATED_DECRYPTION:
> +		tag = sess->temp_digest;
> +
> +		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
> +				session->gen_digest_length);
> +
> +		if (memcmp(tag, digest,	session->req_digest_length) != 0)
> +			return -1;
> +		break;
> +	default:
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>  /**
>   * Process a crypto operation, calling
>   * the GCM API from the multi buffer library.
> @@ -488,8 +539,10 @@ aesni_gcm_create(const char *name,
>  {
>  	struct rte_cryptodev *dev;
>  	struct aesni_gcm_private *internals;
> +	struct rte_security_ctx *sec_ctx;
>  	enum aesni_gcm_vector_mode vector_mode;
>  	MB_MGR *mb_mgr;
> +	char sec_name[RTE_DEV_NAME_MAX_LEN];
> 
>  	/* Check CPU for support for AES instruction set */
>  	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
> @@ -524,7 +577,8 @@ aesni_gcm_create(const char *name,
>  			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
>  			RTE_CRYPTODEV_FF_CPU_AESNI |
>  			RTE_CRYPTODEV_FF_OOP_SGL_IN_LB_OUT |
> -			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
> +			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
> +			RTE_CRYPTODEV_FF_SECURITY;
> 
>  	mb_mgr = alloc_mb_mgr(0);
>  	if (mb_mgr == NULL)
> @@ -587,6 +641,21 @@ aesni_gcm_create(const char *name,
> 
>  	internals->max_nb_queue_pairs = init_params->max_nb_queue_pairs;
> 
> +	/* setup security operations */
> +	snprintf(sec_name, sizeof(sec_name) - 1, "aes_gcm_sec_%u",
> +			dev->driver_id);
> +	sec_ctx = rte_zmalloc_socket(sec_name,
> +			sizeof(struct rte_security_ctx),
> +			RTE_CACHE_LINE_SIZE, init_params->socket_id);
> +	if (sec_ctx == NULL) {
> +		AESNI_GCM_LOG(ERR, "memory allocation failed\n");
> +		goto error_exit;
> +	}
> +
> +	sec_ctx->device = (void *)dev;
> +	sec_ctx->ops = rte_aesni_gcm_pmd_security_ops;
> +	dev->security_ctx = sec_ctx;
> +
>  #if IMB_VERSION_NUM >= IMB_VERSION(0, 50, 0)
>  	AESNI_GCM_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
>  			imb_get_version_str());
> @@ -641,6 +710,8 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
>  	if (cryptodev == NULL)
>  		return -ENODEV;
> 
> +	rte_free(cryptodev->security_ctx);
> +
>  	internals = cryptodev->data->dev_private;
> 
>  	free_mb_mgr(internals->mb_mgr);
> @@ -648,6 +719,24 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
>  	return rte_cryptodev_pmd_destroy(cryptodev);
>  }
> 
> +void
> +aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num)
> +{
> +	struct aesni_gcm_security_session *session =
> +			get_sec_session_private_data(sess);
> +	uint32_t i;
> +
> +	if (unlikely(!session))
> +		return;

I don't think you can just return here; you need to
set all status[] entries to some -errno value.
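A minimal sketch of the early-exit fix described above. It assumes `-EINVAL` as the error code; the thread does not settle on a specific value. The `toy_` names are hypothetical, and `session_priv` stands in for the result of `get_sec_session_private_data()`. The point is that the caller never reads stale `status[]` values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Write the same error code into status[from..num-1]. */
static void
toy_fill_status(int status[], uint32_t from, uint32_t num, int err)
{
	uint32_t i;

	for (i = from; i < num; i++)
		status[i] = err;
}

/* Shape of a bulk entry point whose early exit still reports a result
 * for every buffer. */
static void
toy_process_bulk(const void *session_priv, int status[], uint32_t num)
{
	uint32_t i;

	assert(num == 0 || status != NULL);

	if (session_priv == NULL) {
		/* Do not just return: mark every buffer as failed. */
		toy_fill_status(status, 0, num, -EINVAL);
		return;
	}

	for (i = 0; i < num; i++)
		status[i] = 0; /* placeholder for the real per-buffer work */
}
```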

> +
> +	for (i = 0; i < num; i++)
> +		status[i] = process_gcm_security_sgl_buf(session, &buf[i],
> +				(uint8_t *)iv[i], (uint8_t *)aad[i],
> +				(uint8_t *)digest[i]);
> +}
> +
>  static struct rte_vdev_driver aesni_gcm_pmd_drv = {
>  	.probe = aesni_gcm_probe,
>  	.remove = aesni_gcm_remove
> diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
> index 2f66c7c58..cc71dbd60 100644
> --- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
> +++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
> @@ -7,6 +7,7 @@
>  #include <rte_common.h>
>  #include <rte_malloc.h>
>  #include <rte_cryptodev_pmd.h>
> +#include <rte_security_driver.h>
> 
>  #include "aesni_gcm_pmd_private.h"
> 
> @@ -316,6 +317,85 @@ aesni_gcm_pmd_sym_session_clear(struct rte_cryptodev *dev,
>  	}
>  }
> 
> +static int
> +aesni_gcm_security_session_create(void *dev,
> +		struct rte_security_session_conf *conf,
> +		struct rte_security_session *sess,
> +		struct rte_mempool *mempool)
> +{
> +	struct rte_cryptodev *cdev = dev;
> +	struct aesni_gcm_private *internals = cdev->data->dev_private;
> +	struct aesni_gcm_security_session *sess_priv;
> +	int ret;
> +
> +	if (!conf->crypto_xform) {
> +		AESNI_GCM_LOG(ERR, "Invalid security session conf");
> +		return -EINVAL;
> +	}
> +
> +	if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
> +		AESNI_GCM_LOG(ERR, "GMAC is not supported in security session");
> +		return -EINVAL;
> +	}
> +
> +
> +	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
> +		AESNI_GCM_LOG(ERR,
> +				"Couldn't get object from session mempool");
> +		return -ENOMEM;
> +	}
> +
> +	ret = aesni_gcm_set_session_parameters(internals->ops,
> +				&sess_priv->sess, conf->crypto_xform);
> +	if (ret != 0) {
> +		AESNI_GCM_LOG(ERR, "Failed configure session parameters");
> +
> +		/* Return session to mempool */
> +		rte_mempool_put(mempool, (void *)sess_priv);
> +		return ret;
> +	}
> +
> +	sess_priv->pre = internals->ops[sess_priv->sess.key].pre;
> +	sess_priv->init = internals->ops[sess_priv->sess.key].init;
> +	if (sess_priv->sess.op == AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION) {
> +		sess_priv->update =
> +			internals->ops[sess_priv->sess.key].update_enc;
> +		sess_priv->finalize =
> +			internals->ops[sess_priv->sess.key].finalize_enc;
> +	} else {
> +		sess_priv->update =
> +			internals->ops[sess_priv->sess.key].update_dec;
> +		sess_priv->finalize =
> +			internals->ops[sess_priv->sess.key].finalize_dec;
> +	}
> +
> +	sess->sess_private_data = sess_priv;
> +
> +	return 0;
> +}
> +
> +static int
> +aesni_gcm_security_session_destroy(void *dev __rte_unused,
> +		struct rte_security_session *sess)
> +{
> +	void *sess_priv = get_sec_session_private_data(sess);
> +
> +	if (sess_priv) {
> +		struct rte_mempool *sess_mp = rte_mempool_from_obj(sess_priv);
> +
> +		memset(sess, 0, sizeof(struct aesni_gcm_security_session));
> +		set_sec_session_private_data(sess, NULL);
> +		rte_mempool_put(sess_mp, sess_priv);
> +	}
> +	return 0;
> +}
> +
> +static unsigned int
> +aesni_gcm_sec_session_get_size(__rte_unused void *device)
> +{
> +	return sizeof(struct aesni_gcm_security_session);
> +}
> +
>  struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
>  		.dev_configure		= aesni_gcm_pmd_config,
>  		.dev_start		= aesni_gcm_pmd_start,
> @@ -336,4 +416,19 @@ struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
>  		.sym_session_clear	= aesni_gcm_pmd_sym_session_clear
>  };
> 
> +static struct rte_security_ops aesni_gcm_security_ops = {
> +		.session_create = aesni_gcm_security_session_create,
> +		.session_get_size = aesni_gcm_sec_session_get_size,
> +		.session_update = NULL,
> +		.session_stats_get = NULL,
> +		.session_destroy = aesni_gcm_security_session_destroy,
> +		.set_pkt_metadata = NULL,
> +		.capabilities_get = NULL,
> +		.process_cpu_crypto_bulk =
> +				aesni_gcm_sec_crypto_process_bulk,
> +};
> +
>  struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops = &aesni_gcm_pmd_ops;
> +
> +struct rte_security_ops *rte_aesni_gcm_pmd_security_ops =
> +		&aesni_gcm_security_ops;
> diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
> index 56b29e013..8e490b6ce 100644
> --- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
> +++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
> @@ -114,5 +114,28 @@ aesni_gcm_set_session_parameters(const struct aesni_gcm_ops *ops,
>   * Device specific operations function pointer structure */
>  extern struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops;
> 
> +/**
> + * Security session structure.
> + */
> +struct aesni_gcm_security_session {
> +	/** Temp digest for decryption */
> +	uint8_t temp_digest[DIGEST_LENGTH_MAX];
> +	/** GCM operations */
> +	aesni_gcm_pre_t pre;
> +	aesni_gcm_init_t init;
> +	aesni_gcm_update_t update;
> +	aesni_gcm_finalize_t finalize;
> +	/** AESNI-GCM session */
> +	struct aesni_gcm_session sess;
> +	/** AESNI-GCM context */
> +	struct gcm_context_data gdata_ctx;
> +};
> +
> +extern void
> +aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +
> +extern struct rte_security_ops *rte_aesni_gcm_pmd_security_ops;
> 
>  #endif /* _RTE_AESNI_GCM_PMD_PRIVATE_H_ */
> --
> 2.14.5


* Re: [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API Fan Zhang
@ 2019-09-18 12:45     ` Ananyev, Konstantin
  2019-09-29  6:00     ` Hemant Agrawal
  1 sibling, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-18 12:45 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal

> +/**
> + * Security vector structure, contains pointer to vector array and the length
> + * of the array
> + */
> +struct rte_security_vec {
> +	struct iovec *vec;
> +	uint32_t num;
> +};
> +
> +/**
> + * Processing bulk crypto workload with CPU
> + *
> + * @param	instance	security instance.
> + * @param	sess		security session
> + * @param	buf		array of buffer SGL vectors
> + * @param	iv		array of IV pointers
> + * @param	aad		array of AAD pointers
> + * @param	digest		array of digest pointers
> + * @param	status		array of status for the function to return


Need to specify what the expected status values are.
I suppose zero for success and a negative errno value if an error happens?
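Suppose the convention proposed here were adopted: 0 for success and a negative errno value for failure. This is an assumption, since the quoted doc comment only says "array of status". Under that convention, a caller could consume `status[]` as sketched below. Using `-EBADMSG` for digest-verification failure is likewise only an example value, and `toy_count_failures` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Assumed contract for status[i] after the bulk call returns:
 *   0         buffer i was processed successfully
 *   -EBADMSG  buffer i failed digest verification (example value)
 *   other <0  buffer i was not processed (negative errno)
 */
static uint32_t
toy_count_failures(const int status[], uint32_t num, uint32_t *auth_failures)
{
	uint32_t i, failed = 0;

	*auth_failures = 0;
	for (i = 0; i < num; i++) {
		if (status[i] == 0)
			continue;
		assert(status[i] < 0); /* the contract forbids positive values */
		failed++;
		if (status[i] == -EBADMSG)
			(*auth_failures)++;
	}
	return failed;
}
```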

> + * @param	num		number of elements in each array
> + *
> + */
> +__rte_experimental
> +void
> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> +		struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +
>  #ifdef __cplusplus
>  }
>  #endif

* Re: [dpdk-dev] [PATCH 05/10] crypto/aesni_mb: add rte_security handler
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
@ 2019-09-18 15:20     ` Ananyev, Konstantin
  0 siblings, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-18 15:20 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal


> 
> This patch adds rte_security support to the AESNI-MB PMD. The PMD now
> initializes a security context instance, creates/deletes PMD-specific
> security sessions, and processes crypto workloads in synchronous mode.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         | 291 ++++++++++++++++++++-
>  drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |  91 ++++++-
>  drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |  21 +-
>  3 files changed, 398 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
> index b495a9679..68767c04e 100644
> --- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
> +++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
> @@ -8,6 +8,8 @@
>  #include <rte_hexdump.h>
>  #include <rte_cryptodev.h>
>  #include <rte_cryptodev_pmd.h>
> +#include <rte_security.h>
> +#include <rte_security_driver.h>
>  #include <rte_bus_vdev.h>
>  #include <rte_malloc.h>
>  #include <rte_cpuflags.h>
> @@ -789,6 +791,167 @@ auth_start_offset(struct rte_crypto_op *op, struct aesni_mb_session *session,
>  			(UINT64_MAX - u_src + u_dst + 1);
>  }
> 
> +union sec_userdata_field {
> +	int status;
> +	struct {
> +		uint16_t is_gen_digest;
> +		uint16_t digest_len;
> +	};
> +};
> +
> +struct sec_udata_digest_field {
> +	uint32_t is_digest_gen;
> +	uint32_t digest_len;
> +};
> +
> +static inline int
> +set_mb_job_params_sec(JOB_AES_HMAC *job, struct aesni_mb_sec_session *sec_sess,
> +		void *buf, uint32_t buf_len, void *iv, void *aad, void *digest,
> +		int *status, uint8_t *digest_idx)
> +{
> +	struct aesni_mb_session *session = &sec_sess->sess;
> +	uint32_t cipher_offset = sec_sess->cipher_offset;
> +	void *user_digest = NULL;
> +	union sec_userdata_field udata;
> +
> +	if (unlikely(cipher_offset > buf_len))
> +		return -EINVAL;
> +
> +	/* Set crypto operation */
> +	job->chain_order = session->chain_order;
> +
> +	/* Set cipher parameters */
> +	job->cipher_direction = session->cipher.direction;
> +	job->cipher_mode = session->cipher.mode;
> +
> +	job->aes_key_len_in_bytes = session->cipher.key_length_in_bytes;
> +
> +	/* Set authentication parameters */
> +	job->hash_alg = session->auth.algo;
> +	job->iv = iv;
> +
> +	switch (job->hash_alg) {
> +	case AES_XCBC:
> +		job->u.XCBC._k1_expanded = session->auth.xcbc.k1_expanded;
> +		job->u.XCBC._k2 = session->auth.xcbc.k2;
> +		job->u.XCBC._k3 = session->auth.xcbc.k3;
> +
> +		job->aes_enc_key_expanded =
> +				session->cipher.expanded_aes_keys.encode;
> +		job->aes_dec_key_expanded =
> +				session->cipher.expanded_aes_keys.decode;
> +		break;
> +
> +	case AES_CCM:
> +		job->u.CCM.aad = (uint8_t *)aad + 18;
> +		job->u.CCM.aad_len_in_bytes = session->aead.aad_len;
> +		job->aes_enc_key_expanded =
> +				session->cipher.expanded_aes_keys.encode;
> +		job->aes_dec_key_expanded =
> +				session->cipher.expanded_aes_keys.decode;
> +		job->iv++;
> +		break;
> +
> +	case AES_CMAC:
> +		job->u.CMAC._key_expanded = session->auth.cmac.expkey;
> +		job->u.CMAC._skey1 = session->auth.cmac.skey1;
> +		job->u.CMAC._skey2 = session->auth.cmac.skey2;
> +		job->aes_enc_key_expanded =
> +				session->cipher.expanded_aes_keys.encode;
> +		job->aes_dec_key_expanded =
> +				session->cipher.expanded_aes_keys.decode;
> +		break;
> +
> +	case AES_GMAC:
> +		if (session->cipher.mode == GCM) {
> +			job->u.GCM.aad = aad;
> +			job->u.GCM.aad_len_in_bytes = session->aead.aad_len;
> +		} else {
> +			/* For GMAC */
> +			job->u.GCM.aad = aad;
> +			job->u.GCM.aad_len_in_bytes = buf_len;
> +			job->cipher_mode = GCM;
> +		}
> +		job->aes_enc_key_expanded = &session->cipher.gcm_key;
> +		job->aes_dec_key_expanded = &session->cipher.gcm_key;
> +		break;
> +
> +	default:
> +		job->u.HMAC._hashed_auth_key_xor_ipad =
> +				session->auth.pads.inner;
> +		job->u.HMAC._hashed_auth_key_xor_opad =
> +				session->auth.pads.outer;
> +
> +		if (job->cipher_mode == DES3) {
> +			job->aes_enc_key_expanded =
> +				session->cipher.exp_3des_keys.ks_ptr;
> +			job->aes_dec_key_expanded =
> +				session->cipher.exp_3des_keys.ks_ptr;
> +		} else {
> +			job->aes_enc_key_expanded =
> +				session->cipher.expanded_aes_keys.encode;
> +			job->aes_dec_key_expanded =
> +				session->cipher.expanded_aes_keys.decode;
> +		}
> +	}

Seems like too many branches on the data path.
We'll have only one job type (alg) per session.
So we can have a prefilled job struct template with all common fields already set up,
and then in process() just copy it over and update the few fields that have to differ
(like msg_len_to_cipher_in_bytes).
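A sketch of the template idea, using a cut-down job struct. `toy_job` and its fields are hypothetical and much smaller than intel-ipsec-mb's `JOB_AES_HMAC`. The per-session fields are filled once at session create. On the data path, the template is copied by struct assignment, and only the per-buffer fields are patched.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Cut-down, hypothetical job descriptor. */
struct toy_job {
	/* per-session (constant) fields */
	int cipher_mode;
	int hash_alg;
	const void *enc_keys;
	const void *dec_keys;
	uint32_t cipher_offset;
	uint32_t tag_len;
	/* per-buffer fields */
	void *src;
	void *dst;
	const void *iv;
	uint32_t msg_len_to_cipher;
	uint32_t msg_len_to_hash;
};

struct toy_sec_session {
	struct toy_job tmpl; /* prefilled at session create */
};

/* Session create: the per-algorithm switch would run here, once. */
static void
toy_session_init(struct toy_sec_session *s, int mode, int alg,
		const void *ek, const void *dk, uint32_t off, uint32_t tag_len)
{
	s->tmpl = (struct toy_job){
		.cipher_mode = mode, .hash_alg = alg,
		.enc_keys = ek, .dec_keys = dk,
		.cipher_offset = off, .tag_len = tag_len,
	};
}

/* Data path: copy the template, then set only what differs per buffer. */
static int
toy_fill_job(struct toy_job *job, const struct toy_sec_session *s,
		void *buf, uint32_t len, const void *iv)
{
	if (s->tmpl.cipher_offset > len)
		return -EINVAL;
	*job = s->tmpl;
	job->src = job->dst = buf;
	job->iv = iv;
	job->msg_len_to_cipher = len - s->tmpl.cipher_offset;
	job->msg_len_to_hash = len;
	return 0;
}
```

Copying a whole job descriptor is not free either. If the real struct is large, copying only the per-session part, or reusing a job slot that already holds it, may be preferable; that choice would need profiling.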
 

> +
> +	/* Set digest output location */
> +	if (job->hash_alg != NULL_HASH &&
> +			session->auth.operation == RTE_CRYPTO_AUTH_OP_VERIFY) {
> +		job->auth_tag_output = sec_sess->temp_digests[*digest_idx];
> +		*digest_idx = (*digest_idx + 1) % MAX_JOBS;
> +
> +		udata.is_gen_digest = 0;
> +		udata.digest_len = session->auth.req_digest_len;
> +		user_digest = (void *)digest;
> +	} else {
> +		udata.is_gen_digest = 1;
> +		udata.digest_len = session->auth.req_digest_len;
> +
> +		if (session->auth.req_digest_len !=
> +				session->auth.gen_digest_len) {
> +			job->auth_tag_output =
> +					sec_sess->temp_digests[*digest_idx];
> +			*digest_idx = (*digest_idx + 1) % MAX_JOBS;
> +
> +			user_digest = (void *)digest;
> +		} else
> +			job->auth_tag_output = digest;
> +
> +		/* A bit of hack here, since job structure only supports
> +		 * 2 user data fields and we need 4 params to be passed
> +		 * (status, direction, digest for verify, and length of
> +		 * digest), we set the status value as digest length +
> +		 * direction here temporarily to avoid creating longer
> +		 * buffer to store all 4 params.
> +		 */
> +		*status = udata.status;
> +	}
> +	/*
> +	 * Multi-buffer library current only support returning a truncated
> +	 * digest length as specified in the relevant IPsec RFCs
> +	 */
> +
> +	/* Set digest length */
> +	job->auth_tag_output_len_in_bytes = session->auth.gen_digest_len;
> +
> +	/* Set IV parameters */
> +	job->iv_len_in_bytes = session->iv.length;
> +
> +	/* Data Parameters */
> +	job->src = buf;
> +	job->dst = buf;
> +	job->cipher_start_src_offset_in_bytes = cipher_offset;
> +	job->msg_len_to_cipher_in_bytes = buf_len - cipher_offset;
> +	job->hash_start_src_offset_in_bytes = 0;
> +	job->msg_len_to_hash_in_bytes = buf_len;
> +
> +	job->user_data = (void *)status;
> +	job->user_data2 = user_digest;
> +
> +	return 0;
> +}
> +
>  /**
>   * Process a crypto operation and complete a JOB_AES_HMAC job structure for
>   * submission to the multi buffer library for processing.
> @@ -1081,6 +1244,37 @@ post_process_mb_job(struct aesni_mb_qp *qp, JOB_AES_HMAC *job)
>  	return op;
>  }
> 
> +static inline void
> +post_process_mb_sec_job(JOB_AES_HMAC *job)
> +{
> +	void *user_digest = job->user_data2;
> +	int *status = job->user_data;
> +	union sec_userdata_field udata;
> +
> +	switch (job->status) {
> +	case STS_COMPLETED:
> +		if (user_digest) {
> +			udata.status = *status;
> +
> +			if (udata.is_gen_digest) {
> +				*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
> +				memcpy(user_digest, job->auth_tag_output,
> +						udata.digest_len);
> +			} else {
> +				verify_digest(job, user_digest,
> +					udata.digest_len, (uint8_t *)status);
> +
> +				if (*status == RTE_CRYPTO_OP_STATUS_AUTH_FAILED)
> +					*status = -1;
> +			}

Again: multiple process() functions instead of branches on the data path?

> +		} else
> +			*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
> +		break;
> +	default:
> +		*status = RTE_CRYPTO_OP_STATUS_ERROR;
> +	}
> +}
> +
>  /**
>   * Process a completed JOB_AES_HMAC job and keep processing jobs until
>   * get_completed_job return NULL
> @@ -1117,6 +1311,32 @@ handle_completed_jobs(struct aesni_mb_qp *qp, JOB_AES_HMAC *job,
>  	return processed_jobs;
>  }
> 
> +static inline uint32_t
> +handle_completed_sec_jobs(JOB_AES_HMAC *job, MB_MGR *mb_mgr)
> +{
> +	uint32_t processed = 0;
> +
> +	while (job != NULL) {
> +		post_process_mb_sec_job(job);
> +		job = IMB_GET_COMPLETED_JOB(mb_mgr);
> +		processed++;
> +	}
> +
> +	return processed;
> +}
> +
> +static inline uint32_t
> +flush_mb_sec_mgr(MB_MGR *mb_mgr)
> +{
> +	JOB_AES_HMAC *job = IMB_FLUSH_JOB(mb_mgr);
> +	uint32_t processed = 0;
> +
> +	if (job)
> +		processed = handle_completed_sec_jobs(job, mb_mgr);
> +
> +	return processed;
> +}
> +
>  static inline uint16_t
>  flush_mb_mgr(struct aesni_mb_qp *qp, struct rte_crypto_op **ops,
>  		uint16_t nb_ops)
> @@ -1220,6 +1440,55 @@ aesni_mb_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
>  	return processed_jobs;
>  }
> 
> +void
> +aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num)
> +{
> +	struct aesni_mb_sec_session *sec_sess = sess->sess_private_data;
> +	JOB_AES_HMAC *job;
> +	uint8_t digest_idx = sec_sess->digest_idx;
> +	uint32_t i, processed = 0;
> +	int ret;
> +
> +	for (i = 0; i < num; i++) {
> +		void *seg_buf = buf[i].vec[0].iov_base;
> +		uint32_t buf_len = buf[i].vec[0].iov_len;
> +
> +		job = IMB_GET_NEXT_JOB(sec_sess->mb_mgr);
> +		if (unlikely(job == NULL)) {
> +			processed += flush_mb_sec_mgr(sec_sess->mb_mgr);
> +
> +			job = IMB_GET_NEXT_JOB(sec_sess->mb_mgr);
> +			if (!job)
> +				return;

You can't just return here.
You need to fill the remaining status[] entries with some meaningful error value.
As an alternative, make process_bulk() return the number of processed buffers instead of void.


> +		}
> +
> +		ret = set_mb_job_params_sec(job, sec_sess, seg_buf, buf_len,
> +				iv[i], aad[i], digest[i], &status[i],
> +				&digest_idx);

That doesn't look right:
digest_idx is a temporary variable; you pass its address to set_mb_job_params_sec(),
where it will be updated, but then you never write it back.
So do we really need digest_idx inside the session?
Overall, the whole construction with status and idx stored inside the job struct
seems overcomplicated and probably error-prone.
AFAIK, the aesni-mb job manager guarantees that submitted jobs complete in FIFO order.
So just having an idx counter inside that function seems enough, no?
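A sketch of the FIFO argument. Suppose a job manager always returns completed jobs in submission order, as stated above for aesni-mb (treat that as an assumption here). Then a plain local counter is enough to map completions back to `status[]`, with no need to stash pointers or a digest index inside each job. `toy_mgr` is a hypothetical ring that completes jobs lazily: a job is returned only once all lanes are occupied, or on flush. In the code, `results[]` stands in for the per-job outcome the engine would compute.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LANES 4

/* Hypothetical FIFO job manager: jobs complete in submission order,
 * but only once TOY_LANES jobs are in flight (or on flush). */
struct toy_mgr {
	int result[TOY_LANES];
	uint32_t head, count;
};

/* Submit one job; returns 1 and writes *done if a job completed. */
static int
toy_submit(struct toy_mgr *m, int result, int *done)
{
	m->result[(m->head + m->count) % TOY_LANES] = result;
	m->count++;
	if (m->count < TOY_LANES)
		return 0;
	*done = m->result[m->head];
	m->head = (m->head + 1) % TOY_LANES;
	m->count--;
	return 1;
}

/* Flush: pop the oldest in-flight job, if any. */
static int
toy_flush(struct toy_mgr *m, int *done)
{
	if (m->count == 0)
		return 0;
	*done = m->result[m->head];
	m->head = (m->head + 1) % TOY_LANES;
	m->count--;
	return 1;
}

/* Because completions are FIFO, a local counter is all we need. */
static void
toy_process_bulk(struct toy_mgr *m, const int results[], int status[],
		uint32_t num)
{
	uint32_t i, next = 0; /* index of the next status[] to fill */
	int done;

	for (i = 0; i < num; i++)
		if (toy_submit(m, results[i], &done))
			status[next++] = done;

	while (toy_flush(m, &done))
		status[next++] = done;

	assert(next == num);
}
```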


> +				/* Submit job to multi-buffer for processing */
> +		if (ret) {
> +			processed++;
> +			status[i] = ret;
> +			continue;
> +		}
> +
> +#ifdef RTE_LIBRTE_PMD_AESNI_MB_DEBUG
> +		job = IMB_SUBMIT_JOB(sec_sess->mb_mgr);
> +#else
> +		job = IMB_SUBMIT_JOB_NOCHECK(sec_sess->mb_mgr);
> +#endif
> +
> +		if (job)
> +			processed += handle_completed_sec_jobs(job,
> +					sec_sess->mb_mgr);
> +	}
> +
> +	while (processed < num)
> +		processed += flush_mb_sec_mgr(sec_sess->mb_mgr);
> +}
> +
>  static int cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev);
> 
>  static int
> @@ -1229,8 +1498,10 @@ cryptodev_aesni_mb_create(const char *name,
>  {
>  	struct rte_cryptodev *dev;
>  	struct aesni_mb_private *internals;
> +	struct rte_security_ctx *sec_ctx;
>  	enum aesni_mb_vector_mode vector_mode;
>  	MB_MGR *mb_mgr;
> +	char sec_name[RTE_DEV_NAME_MAX_LEN];
> 
>  	/* Check CPU for support for AES instruction set */
>  	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
> @@ -1264,7 +1535,8 @@ cryptodev_aesni_mb_create(const char *name,
>  	dev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO |
>  			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
>  			RTE_CRYPTODEV_FF_CPU_AESNI |
> -			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
> +			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
> +			RTE_CRYPTODEV_FF_SECURITY;
> 
> 
>  	mb_mgr = alloc_mb_mgr(0);
> @@ -1303,11 +1575,28 @@ cryptodev_aesni_mb_create(const char *name,
>  	AESNI_MB_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
>  			imb_get_version_str());
> 
> +	/* setup security operations */
> +	snprintf(sec_name, sizeof(sec_name) - 1, "aes_mb_sec_%u",
> +			dev->driver_id);
> +	sec_ctx = rte_zmalloc_socket(sec_name,
> +			sizeof(struct rte_security_ctx),
> +			RTE_CACHE_LINE_SIZE, init_params->socket_id);
> +	if (sec_ctx == NULL) {
> +		AESNI_MB_LOG(ERR, "memory allocation failed\n");
> +		goto error_exit;
> +	}
> +
> +	sec_ctx->device = (void *)dev;
> +	sec_ctx->ops = rte_aesni_mb_pmd_security_ops;
> +	dev->security_ctx = sec_ctx;
> +
>  	return 0;
> 
>  error_exit:
>  	if (mb_mgr)
>  		free_mb_mgr(mb_mgr);
> +	if (sec_ctx)
> +		rte_free(sec_ctx);
> 
>  	rte_cryptodev_pmd_destroy(dev);
> 
> diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> index 8d15b99d4..ca6cea775 100644
> --- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> +++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> @@ -8,6 +8,7 @@
>  #include <rte_common.h>
>  #include <rte_malloc.h>
>  #include <rte_cryptodev_pmd.h>
> +#include <rte_security_driver.h>
> 
>  #include "rte_aesni_mb_pmd_private.h"
> 
> @@ -732,7 +733,8 @@ aesni_mb_pmd_qp_count(struct rte_cryptodev *dev)
>  static unsigned
>  aesni_mb_pmd_sym_session_get_size(struct rte_cryptodev *dev __rte_unused)
>  {
> -	return sizeof(struct aesni_mb_session);
> +	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_session),
> +			RTE_CACHE_LINE_SIZE);
>  }
> 
>  /** Configure a aesni multi-buffer session from a crypto xform chain */
> @@ -810,4 +812,91 @@ struct rte_cryptodev_ops aesni_mb_pmd_ops = {
>  		.sym_session_clear	= aesni_mb_pmd_sym_session_clear
>  };
> 
> +/** Set session authentication parameters */
> +
> +static int
> +aesni_mb_security_session_create(void *dev,
> +		struct rte_security_session_conf *conf,
> +		struct rte_security_session *sess,
> +		struct rte_mempool *mempool)
> +{
> +	struct rte_cryptodev *cdev = dev;
> +	struct aesni_mb_private *internals = cdev->data->dev_private;
> +	struct aesni_mb_sec_session *sess_priv;
> +	int ret;
> +
> +	if (!conf->crypto_xform) {
> +		AESNI_MB_LOG(ERR, "Invalid security session conf");
> +		return -EINVAL;
> +	}
> +
> +	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
> +		AESNI_MB_LOG(ERR,
> +				"Couldn't get object from session mempool");
> +		return -ENOMEM;
> +	}
> +
> +	sess_priv->mb_mgr = internals->mb_mgr;

On second thought, I don't think it is OK to use the same job manager
across all sessions. Different sessions can be used by different threads, etc.
I think we need a separate job-manager instance for every session.
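One way to follow this suggestion is to give each session its own manager, allocated in create and freed in destroy. The sketch below uses hypothetical `toy_mgr_alloc()`/`toy_mgr_free()`. In the PMD, that role would presumably be played by `alloc_mb_mgr()`/`free_mb_mgr()` plus manager initialization, which is an assumption here. Two threads using two sessions then never share in-flight job state. The sketch also releases the session object on every failure path. The quoted create function returns `-ENOMEM` without returning `sess_priv` to the mempool when `mb_mgr` is NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a multi-buffer job manager. */
struct toy_mgr {
	unsigned int in_flight;
};

static struct toy_mgr *
toy_mgr_alloc(void)
{
	return calloc(1, sizeof(struct toy_mgr));
}

static void
toy_mgr_free(struct toy_mgr *m)
{
	free(m);
}

struct toy_sec_session {
	struct toy_mgr *mgr; /* owned by this session only */
	int configured;
};

static int
toy_session_create(struct toy_sec_session **out, int params_ok)
{
	struct toy_sec_session *s = calloc(1, sizeof(*s));

	if (s == NULL)
		return -ENOMEM;

	s->mgr = toy_mgr_alloc();
	if (s->mgr == NULL) {
		free(s); /* release the session object on this path too */
		return -ENOMEM;
	}

	if (!params_ok) { /* stands in for a failed parameter setup */
		toy_mgr_free(s->mgr);
		free(s);
		return -EINVAL;
	}

	s->configured = 1;
	*out = s;
	return 0;
}

static void
toy_session_destroy(struct toy_sec_session *s)
{
	if (s == NULL)
		return;
	assert(s->mgr->in_flight == 0); /* caller must have flushed */
	toy_mgr_free(s->mgr);
	free(s);
}
```

The cost of this approach is one manager's worth of memory per session. Whether that is acceptable depends on how many sessions an application keeps open, which the thread does not discuss.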


> +	if (sess_priv->mb_mgr == NULL)
> +		return -ENOMEM;
> +
> +	sess_priv->cipher_offset = conf->cpucrypto.cipher_offset;
> +
> +	ret = aesni_mb_set_session_parameters(sess_priv->mb_mgr,
> +			&sess_priv->sess, conf->crypto_xform);
> +	if (ret != 0) {
> +		AESNI_MB_LOG(ERR, "failed configure session parameters");
> +
> +		rte_mempool_put(mempool, sess_priv);
> +	}
> +
> +	sess->sess_private_data = (void *)sess_priv;
> +
> +	return ret;
> +}
> +
> +static int
> +aesni_mb_security_session_destroy(void *dev __rte_unused,
> +		struct rte_security_session *sess)
> +{
> +	struct aesni_mb_sec_session *sess_priv =
> +			get_sec_session_private_data(sess);
> +
> +	if (sess_priv) {
> +		struct rte_mempool *sess_mp = rte_mempool_from_obj(
> +				(void *)sess_priv);
> +
> +		memset(sess, 0, sizeof(struct aesni_mb_sec_session));
> +		set_sec_session_private_data(sess, NULL);
> +
> +		if (sess_mp == NULL) {
> +			AESNI_MB_LOG(ERR, "failed fetch session mempool");
> +			return -EINVAL;
> +		}
> +
> +		rte_mempool_put(sess_mp, sess_priv);
> +	}
> +
> +	return 0;
> +}
> +
> +static unsigned int
> +aesni_mb_sec_session_get_size(__rte_unused void *device)
> +{
> +	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_sec_session),
> +			RTE_CACHE_LINE_SIZE);
> +}
> +
> +static struct rte_security_ops aesni_mb_security_ops = {
> +		.session_create = aesni_mb_security_session_create,
> +		.session_get_size = aesni_mb_sec_session_get_size,
> +		.session_update = NULL,
> +		.session_stats_get = NULL,
> +		.session_destroy = aesni_mb_security_session_destroy,
> +		.set_pkt_metadata = NULL,
> +		.capabilities_get = NULL,
> +		.process_cpu_crypto_bulk = aesni_mb_sec_crypto_process_bulk,
> +};
> +
>  struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops = &aesni_mb_pmd_ops;
> +struct rte_security_ops *rte_aesni_mb_pmd_security_ops = &aesni_mb_security_ops;
> diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
> index b794d4bc1..d1cf416ab 100644
> --- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
> +++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
> @@ -176,7 +176,6 @@ struct aesni_mb_qp {
>  	 */
>  } __rte_cache_aligned;
> 
> -/** AES-NI multi-buffer private session structure */
>  struct aesni_mb_session {
>  	JOB_CHAIN_ORDER chain_order;
>  	struct {
> @@ -265,16 +264,32 @@ struct aesni_mb_session {
>  		/** AAD data length */
>  		uint16_t aad_len;
>  	} aead;
> -} __rte_cache_aligned;

I didn't look through all the code.

> +};
> +
> +/** AES-NI multi-buffer private security session structure */
> +struct aesni_mb_sec_session {
> +	/**< Unique Queue Pair Name */
> +	struct aesni_mb_session sess;
> +	uint8_t temp_digests[MAX_JOBS][DIGEST_LENGTH_MAX];

Probably better to move temp_digests[][] to the very end,
to keep all read-only data grouped together?
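A sketch of the suggested reordering, with placeholder sizes and hypothetical `toy_` structs (the real `aesni_mb_session`, `MAX_JOBS` and `DIGEST_LENGTH_MAX` differ). In the quoted layout, the read-only `cipher_offset` and `mb_mgr` sit after the whole scratch array. Moving the scratch array last places them right after `sess`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_JOBS 16   /* placeholder, not the PMD's MAX_JOBS */
#define TOY_DIGEST_MAX 64 /* placeholder for DIGEST_LENGTH_MAX */

/* Stand-in for struct aesni_mb_session (read-only after create). */
struct toy_session {
	uint8_t keys[256];
};

/* Layout as quoted: scratch array sits between read-only fields. */
struct toy_sec_session_v1 {
	struct toy_session sess;
	uint8_t temp_digests[TOY_MAX_JOBS][TOY_DIGEST_MAX];
	uint16_t digest_idx;
	uint32_t cipher_offset;
	void *mb_mgr;
};

/* Suggested layout: read-only data first, mutable scratch at the end. */
struct toy_sec_session_v2 {
	struct toy_session sess;
	uint32_t cipher_offset;
	void *mb_mgr;
	uint16_t digest_idx;
	uint8_t temp_digests[TOY_MAX_JOBS][TOY_DIGEST_MAX];
};
```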

> +	uint16_t digest_idx;
> +	uint32_t cipher_offset;
> +	MB_MGR *mb_mgr;
> +};
> 
>  extern int
>  aesni_mb_set_session_parameters(const MB_MGR *mb_mgr,
>  		struct aesni_mb_session *sess,
>  		const struct rte_crypto_sym_xform *xform);
> 
> +extern void
> +aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +
>  /** device specific operations function pointer structure */
>  extern struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops;
> 
> -
> +/** device specific operations function pointer structure for rte_security */
> +extern struct rte_security_ops *rte_aesni_mb_pmd_security_ops;
> 
>  #endif /* _RTE_AESNI_MB_PMD_PRIVATE_H_ */
> --
> 2.14.5


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-18  7:44                     ` Ananyev, Konstantin
@ 2019-09-25 18:24                       ` Ananyev, Konstantin
  2019-09-27  9:26                         ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-25 18:24 UTC (permalink / raw)
  To: 'Akhil Goyal', 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon'
  Cc: Zhang, Roy Fan, Doherty, Declan, 'Anoob Joseph'


> > > > > > > > > This action type allows the burst of symmetric crypto workload using the same
> > > > > > > > > algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > >
> > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and corresponding dequeue burst.
> > > > > > >
> > > > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > > > >
> > > > > > > > It would be a new API something like process_packets and it will have the crypto processed packets while returning from the API?
> > > > > > >
> > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > >
> > > > > > > >
> > > > > > > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > > > > > > As far as I can understand, you are not doing any protocol processing or any value add
> > > > > > > > To the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > > > > > > Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > > > > > > Security session in the driver just to do a synchronous processing.
> > > > > > >
> > > > > > > I suppose your question is why not to have
> > > > > > > rte_crypot_process_cpu_crypto_bulk(...) instead?
> > > > > > > The main reason is that would require disruptive changes in existing cryptodev API
> > > > > > > (would cause ABI/API breakage).
> > > > > > > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra information
> > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > >
> > > > > > Cipher offset will be part of rte_crypto_op.
> > > > >
> > > > > fill/read (+ alloc/free) is one of the main things that slowdown current crypto-op approach.
> > > > > That's why the general idea - have all data that wouldn't change from packet to packet
> > > > > included into the session and setup it once at session_init().
> > > >
> > > > I agree that you cannot use crypto-op.
> > > > You can have the new API in crypto.
> > > > As per the current patch, you only need cipher_offset which you can have it as a parameter until
> > > > You get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> > > > We can have cipher offset at both places(crypto-op and cipher_xform). It will give flexibility to the user to
> > > > override it.
> > >
> > > After having another thought on your proposal:
> > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> > > stuff here?
> >
> > I also thought of adding new xforms, but that wont serve the purpose for may be all the cases.
> > You would be needing all information currently available in the current xforms.
> > So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> > ABI breakage would still be there.
> >
> > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> > Xforms and we can have a solution to this issue.
> 
> I think that we can re-use iv.offset for our purposes (for crypto offset).
> So for now we can make that path work without any ABI breakage.
> Fan, please feel free to correct me here, if I missed something.
> If in future we would need to add some extra information it might
> require ABI breakage, though by now I don't envision anything particular to add.
> Anyway, if there is no objection to go that way, we can try to make
> these changes for v2.
> 

Actually, after looking at it more deeply, it turns out not to be as easy as I thought :)
Below is a very rough draft of the proposed API additions.
I think it avoids ABI breakage for now and provides enough flexibility for future extensions (if any).
For now, it doesn't address your comments about naming conventions (_CPU_ vs _SYNC_), etc.,
but I suppose it is comprehensive enough to convey the main idea behind it.
Akhil and other interested parties, please try to review and provide feedback ASAP,
as the related changes will take some time and we would still like to hit the 19.11 deadline.
Konstantin
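The "regain 6B" figure in the draft below is easy to sanity-check. A minimal sketch, with hypothetical struct names standing in for the key member of the rte_crypto_*_xform structures, using the GCC/Clang __packed__ attribute the draft relies on: on an LP64 target (e.g. x86-64) the unpacked pointer + length pair is 8 + 2 bytes rounded up to 16, while the packed one is exactly 10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of the key member as it is today. */
struct key_sketch {
	const uint8_t *data;
	uint16_t length;
};

/* Same member with the packed attribute, as proposed in the draft. */
struct key_packed_sketch {
	const uint8_t *data;
	uint16_t length;
} __attribute__((__packed__));

/* Tail padding handed back by packing (6 bytes on LP64 targets). */
static size_t
key_bytes_regained(void)
{
	return sizeof(struct key_sketch) - sizeof(struct key_packed_sketch);
}
```

Note that the packed struct has 1-byte alignment, so the saving only comes for free while the key member still starts at a suitably aligned offset inside the xform (it does when it follows two 4-byte enums); on a 32-bit target the gain shrinks to 2 bytes.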

diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
index bc8da2466..c03069e23 100644
--- a/lib/librte_cryptodev/rte_crypto_sym.h
+++ b/lib/librte_cryptodev/rte_crypto_sym.h
@@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
  *
  * This structure contains data relating to Cipher (Encryption and Decryption)
  *  use to create a session.
+ * Actually, I was wrong in saying that we don't have free space inside xforms.
+ * Making the key struct packed (see below) allows us to regain 6B that could be
+ * used for future extensions.
  */
 struct rte_crypto_cipher_xform {
        enum rte_crypto_cipher_operation op;
@@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
        struct {
                const uint8_t *data;    /**< pointer to key data */
                uint16_t length;        /**< key length in bytes */
-       } key;
+       } __attribute__((__packed__)) key;
+
+       /**
+        * Offset for the cipher to start within the user-provided data buffer.
+        * Fan suggested another (less space-consuming) way:
+        * reuse the iv.offset space below, by changing
+        * struct {uint16_t offset, length;} iv;
+        * to an unnamed union:
+        * union {
+        *      struct {uint16_t offset, length;} iv;
+        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
+        * };
+        * Both approaches seem OK to me in general.
+        * Comments/suggestions are welcome.
+        */
+       uint16_t offset;
+
+       uint8_t reserved1[4];
+
        /**< Cipher key
         *
         * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
@@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
        struct {
                const uint8_t *data;    /**< pointer to key data */
                uint16_t length;        /**< key length in bytes */
-       } key;
+       } __attribute__((__packed__)) key;
        /**< Authentication key data.
         * The authentication key length MUST be less than or equal to the
         * block size of the algorithm. It is the callers responsibility to
@@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
         * (for example RFC 2104, FIPS 198a).
         */

+       uint8_t reserved1[6];
+
        struct {
                uint16_t offset;
                /**< Starting point for Initialisation Vector or Counter,
@@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
        struct {
                const uint8_t *data;    /**< pointer to key data */
                uint16_t length;        /**< key length in bytes */
-       } key;
+       } __attribute__((__packed__)) key;
+
+       /** offset for cipher to start within data buffer */
+       uint16_t cipher_offset;
+
+       uint8_t reserved1[4];

        struct {
                uint16_t offset;
diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index e175b838c..c0c7bfed7 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -1272,6 +1272,101 @@ void *
 rte_cryptodev_sym_session_get_user_data(
                                        struct rte_cryptodev_sym_session *sess);

+/*
+ * After some thought, I decided not to try to squeeze CPU_CRYPTO
+ * into the existing rte_crypto_sym_session structure/API, but instead
+ * to introduce an extension to it via a new, fully opaque
+ * struct rte_crypto_cpu_sym_session and additional related API.
+ * Main points:
+ * - The current crypto-dev API is reasonably mature and it is desirable
+ *   to keep it unchanged (API/ABI stability). On the other hand, this
+ *   sync API is new and will probably require further changes.
+ *   Keeping it separate allows us to mark it as experimental without
+ *   affecting the existing one.
+ * - A fully opaque cpu_sym_session structure gives more flexibility
+ *   to PMD writers and again helps avoid ABI breakages in the future.
+ * - A process() function per set of xforms
+ *   allows exposing different process() functions for different
+ *   xform combinations. The PMD writer can decide whether to
+ *   push all supported algorithms into one process() function
+ *   or spread them across several.
+ *   I.e. more flexibility for the PMD writer.
+ * - Not storing the process() pointer inside the session
+ *   lets the user choose whether to store a process() pointer
+ *   per session, or per group of sessions on that device that share
+ *   the same input xforms. I.e. extra flexibility for the user,
+ *   plus it lets us keep cpu_sym_session totally opaque (see above).
+ * Sketched usage model:
+ * ....
+ * // control path, alloc/init session
+ * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
+ * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
+ * rte_crypto_cpu_sym_process_t process =
+ *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
+ * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
+ * ...
+ * // data path
+ * process(ses, ....);
+ * ....
+ * // control path, terminate/free session
+ * rte_crypto_cpu_sym_session_fini(dev_id, ses);
+ */
+
+/**
+ * vector structure, contains pointer to vector array and the length
+ * of the array
+ */
+struct rte_crypto_vec {
+       struct iovec *vec;
+       uint32_t num;
+};
+
+/*
+ * Data-path bulk process crypto function.
+ */
+typedef void (*rte_crypto_cpu_sym_process_t)(
+               struct rte_crypto_cpu_sym_session *sess,
+               struct rte_crypto_vec buf[], void *iv[], void *aad[],
+               void *digest[], int status[], uint32_t num);
+/*
+ * For the given device, return the process function specific to the input xforms.
+ * On error, return NULL and set rte_errno.
+ * Note that, for the same device, the same input xforms should always
+ * yield the same process function.
+ */
+__rte_experimental
+rte_crypto_cpu_sym_process_t
+rte_crypto_cpu_sym_session_func(uint8_t dev_id,
+                       const struct rte_crypto_sym_xform *xforms);
+
+/*
+ * Return the required session size in bytes for the given set of xforms.
+ * If xforms == NULL, return the maximum possible session size,
+ * large enough to hold a session for any algorithm supported by the device.
+ * If CPU mode is not supported at all, or the algorithm requested in the
+ * xforms is not supported, return -ENOTSUP.
+ */
+__rte_experimental
+int
+rte_crypto_cpu_sym_session_size(uint8_t dev_id,
+                       const struct rte_crypto_sym_xform *xforms);
+
+/*
+ * Initialize a session.
+ * It is the caller's responsibility to allocate enough space for it.
+ * See rte_crypto_cpu_sym_session_size() above.
+ */
+__rte_experimental
+int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
+                       struct rte_crypto_cpu_sym_session *sess,
+                       const struct rte_crypto_sym_xform *xforms);
+
+__rte_experimental
+void
+rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
+                       struct rte_crypto_cpu_sym_session *sess);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index defe05ea0..ed7e63fab 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
 typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
                struct rte_cryptodev_asym_session *sess);

+typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
+                       const struct rte_crypto_sym_xform *xforms);
+
+typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
+                       struct rte_crypto_cpu_sym_session *sess,
+                       const struct rte_crypto_sym_xform *xforms);
+
+typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
+                       struct rte_crypto_cpu_sym_session *sess);
+
+typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
+                       struct rte_cryptodev *dev,
+                       const struct rte_crypto_sym_xform *xforms);
+
 /** Crypto device operations function pointer table */
 struct rte_cryptodev_ops {
        cryptodev_configure_t dev_configure;    /**< Configure device. */
@@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
        /**< Clear a Crypto sessions private data. */
        cryptodev_asym_free_session_t asym_session_clear;
        /**< Clear a Crypto sessions private data. */
+
+       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
+       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
+       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
+       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
 };
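To make the proposed control-path/data-path split concrete, here is a self-contained toy model of the sketched usage flow. Everything prefixed toy_ is hypothetical and is not the proposed rte_crypto_cpu_sym_* API; the "cipher" is a one-byte XOR used purely as a placeholder, not real crypto. What it tries to show is the design point from the comment above: the process() pointer is looked up once per xform and kept by the caller, while the session is a caller-allocated blob whose size the driver reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the proposed rte_crypto_cpu_sym_* API. */
struct toy_xform {
	uint8_t key;			/* one-byte XOR "key" (placeholder) */
};

struct toy_session {
	uint8_t key;			/* opaque to the caller in the real design */
};

/* Data-path bulk function, looked up once per xform by the caller. */
typedef void (*toy_process_t)(const struct toy_session *sess,
	uint8_t *buf[], const uint32_t len[], int status[], uint32_t num);

static void
toy_xor_process(const struct toy_session *sess, uint8_t *buf[],
	const uint32_t len[], int status[], uint32_t num)
{
	uint32_t i, j;

	for (i = 0; i != num; i++) {
		for (j = 0; j != len[i]; j++)
			buf[i][j] ^= sess->key;
		status[i] = 0;		/* a XOR placeholder cannot fail */
	}
}

/* Control path: size query, function lookup, init and fini. */
static int
toy_session_size(const struct toy_xform *xform)
{
	(void)xform;
	return (int)sizeof(struct toy_session);
}

static toy_process_t
toy_session_func(const struct toy_xform *xform)
{
	return xform != NULL ? toy_xor_process : NULL;
}

static int
toy_session_init(struct toy_session *sess, const struct toy_xform *xform)
{
	if (sess == NULL || xform == NULL)
		return -EINVAL;
	sess->key = xform->key;
	return 0;
}

static void
toy_session_fini(struct toy_session *sess)
{
	memset(sess, 0, sizeof(*sess));	/* wipe key material */
}
```

Typical use mirrors the comment: malloc(toy_session_size(&xf)), toy_session_init(), fetch process = toy_session_func(&xf) once, call process() per burst, then toy_session_fini() and free(). A caller holding many sessions with identical xforms only needs one process pointer.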





^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
@ 2019-09-26 23:20     ` Ananyev, Konstantin
  2019-09-27 10:38     ` Ananyev, Konstantin
  1 sibling, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-26 23:20 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal

Hi Fan,

...
> diff --git a/lib/librte_ipsec/esp_outb.c b/lib/librte_ipsec/esp_outb.c
> index 55799a867..097cb663f 100644
> --- a/lib/librte_ipsec/esp_outb.c
> +++ b/lib/librte_ipsec/esp_outb.c
> @@ -403,6 +403,292 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
>  	return k;
>  }
> 
> +
> +static inline int
> +outb_sync_crypto_proc_prepare(struct rte_mbuf *m, const struct rte_ipsec_sa *sa,
> +		const uint64_t ivp[IPSEC_MAX_IV_QWORD],
> +		const union sym_op_data *icv, uint32_t hlen, uint32_t plen,
> +		struct rte_security_vec *buf, struct iovec *cur_vec, void *iv,
> +		void **aad, void **digest)
> +{
> +	struct rte_mbuf *ms;
> +	struct aead_gcm_iv *gcm;
> +	struct aesctr_cnt_blk *ctr;
> +	struct iovec *vec = cur_vec;
> +	uint32_t left, off = 0, n_seg = 0;

Please separate variable definition and value assignment.
Combining them makes the code hard to read, plus we don't do that in the rest of the library,
so it is better to follow the existing code style.

> +	uint32_t algo;
> +
> +	algo = sa->algo_type;
> +
> +	switch (algo) {
> +	case ALGO_TYPE_AES_GCM:
> +		gcm = iv;
> +		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
> +		*aad = (void *)(icv->va + sa->icv_len);

Why do we want to allocate aad inside the packet at all?
Why not just do that on the stack instead?
In that case you probably wouldn't need to pass this icv stuff to the function at all.

> +		off = sa->ctp.cipher.offset + hlen;
> +		break;
> +	case ALGO_TYPE_AES_CBC:
> +	case ALGO_TYPE_3DES_CBC:
> +		off = sa->ctp.auth.offset + hlen;
> +		break;
> +	case ALGO_TYPE_AES_CTR:
> +		ctr = iv;
> +		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
> +		break;
> +	case ALGO_TYPE_NULL:
> +		break;

For the last two cases, why is off zero?
Shouldn't it at least be 'hlen'?
In fact, I think it needs to be: sa->ctp.auth.offset + hlen;

> +	}
> +
> +	*digest = (void *)icv->va;

I think this could be done in the upper-layer function, together with the aad assignment.

Looking at this function, it seems to consist of 2 separate parts:
1. calculate the offset and generate the IV;
2. set up iovec[].
It is probably worth splitting it into 2 separate functions along those lines.
That would be much easier to read/understand.
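A minimal sketch of part 1 as its own helpers, using hypothetical stand-ins (sa_sketch, algo_sketch) instead of struct rte_ipsec_sa. The offset rule follows the review comment above (every non-GCM case uses the auth offset + hlen), and the 16-byte salt || IV || be32(1) counter block is assumed to mirror what the patch's aead_gcm_iv_fill()/aes_ctr_cnt_blk_fill() helpers produce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the SA fields used by the patch. */
enum algo_sketch {
	ALGO_SKETCH_AES_GCM,
	ALGO_SKETCH_AES_CBC,
	ALGO_SKETCH_3DES_CBC,
	ALGO_SKETCH_AES_CTR,
	ALGO_SKETCH_NULL,
};

struct sa_sketch {
	enum algo_sketch algo;
	uint32_t cipher_ofs;	/* stands in for sa->ctp.cipher.offset */
	uint32_t auth_ofs;	/* stands in for sa->ctp.auth.offset */
	uint8_t salt[4];	/* salt/nonce bytes, in wire order */
};

/* Part 1a: offset at which processing starts; all non-GCM cases
 * (CBC, 3DES, CTR and NULL) use auth offset + hlen. */
static uint32_t
sync_crypto_ofs_sketch(const struct sa_sketch *sa, uint32_t hlen)
{
	assert(sa != NULL);
	if (sa->algo == ALGO_SKETCH_AES_GCM)
		return sa->cipher_ofs + hlen;
	return sa->auth_ofs + hlen;
}

/* Part 1b: 16-byte counter block salt || iv || be32(1) for GCM/CTR.
 * Returns the number of bytes written (0 if the algorithm needs none). */
static uint32_t
sync_crypto_iv_sketch(const struct sa_sketch *sa, const uint8_t iv[8],
	uint8_t blk[16])
{
	if (sa->algo != ALGO_SKETCH_AES_GCM && sa->algo != ALGO_SKETCH_AES_CTR)
		return 0;
	memcpy(blk, sa->salt, 4);
	memcpy(blk + 4, iv, 8);	/* explicit IV, copied as raw bytes */
	blk[12] = 0;
	blk[13] = 0;
	blk[14] = 0;
	blk[15] = 1;		/* initial block counter = 1, big-endian */
	return 16;
}
```

Each helper can then be reviewed and tested on its own, and the remaining iovec[] part reduces to a plain fill loop.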

> +
> +	left = sa->ctp.cipher.length + plen;
> +
> +	ms = mbuf_get_seg_ofs(m, &off);
> +	if (!ms)
> +		return -1;

outb_tun_pkt_prepare() should already check that we have a valid packet,
so I don't think there is a need to check for any failure here.
Another thing: our ESP header will definitely be in the first segment,
so do we need get_seg_ofs() here at all?

> +
> +	while (n_seg < RTE_LIBRTE_IP_FRAG_MAX_FRAG && left && ms) {

I don't think this is right: we shouldn't impose additional limits on
the number of segments in the packet.

> +		uint32_t len = RTE_MIN(left, ms->data_len - off);
> +
> +		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
> +		vec->iov_len = len;
> +
> +		left -= len;
> +		vec++;
> +		n_seg++;
> +		ms = ms->next;
> +		off = 0;


The whole construction seems a bit over-complicated here...
Why not just have a separate function that would fill iovec[] from an mbuf
and return an error if there are not enough iovec[] entries?
Something like:

static inline int
mbuf_to_iovec(const struct rte_mbuf *mb, uint32_t ofs, uint32_t len,
	struct iovec vec[], uint32_t num)
{
	uint32_t i;
	const struct rte_mbuf *ms;

	/* not enough iovec[] entries for this packet */
	if (mb->nb_segs > num)
		return -mb->nb_segs;

	/* ofs is expected to fall within the first segment */
	vec[0].iov_base = rte_pktmbuf_mtod_offset(mb, void *, ofs);
	vec[0].iov_len = mb->data_len - ofs;

	for (i = 1, ms = mb->next; ms != NULL; ms = ms->next, i++) {
		vec[i].iov_base = rte_pktmbuf_mtod(ms, void *);
		vec[i].iov_len = ms->data_len;
	}

	/* trim the tail so the total is exactly len
	 * (assumes the excess bytes all sit in the last segment) */
	vec[i - 1].iov_len -= mb->pkt_len - ofs - len;
	return i;
}

Then we can use that function to fill our iovec[] in a loop.
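For experimenting outside DPDK, here is a self-contained variant of the same idea over a hypothetical segment chain (struct seg_sketch is a stand-in, not struct rte_mbuf). Unlike the snippet above, it first walks to ofs and trims as it goes, so it makes no assumption about which segment holds the offset or the excess bytes; num is only the capacity of vec[], so no fixed segment limit is imposed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical segment chain, a stand-in for struct rte_mbuf. */
struct seg_sketch {
	uint8_t *data;			/* start of this segment's data */
	uint32_t data_len;		/* bytes of data in this segment */
	struct seg_sketch *next;	/* next segment, NULL at the end */
};

/*
 * Describe len bytes, starting ofs bytes into the chain, in vec[].
 * Returns the number of entries used, or -1 if the range runs past
 * the end of the chain or vec[] is too small.
 */
static int
seg_to_iovec(const struct seg_sketch *ms, uint32_t ofs, uint32_t len,
	struct iovec vec[], uint32_t num)
{
	uint32_t i, n;

	/* skip whole segments that end at or before ofs */
	while (ms != NULL && ofs >= ms->data_len) {
		ofs -= ms->data_len;
		ms = ms->next;
	}

	for (i = 0; len != 0; i++) {
		if (ms == NULL || i == num)
			return -1;
		n = ms->data_len - ofs;
		if (n > len)
			n = len;	/* trim the last entry */
		vec[i].iov_base = ms->data + ofs;
		vec[i].iov_len = n;
		len -= n;
		ofs = 0;
		ms = ms->next;
	}
	return (int)i;
}
```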

> +	}
> +
> +	if (left)
> +		return -1;
> +
> +	buf->vec = cur_vec;
> +	buf->num = n_seg;
> +
> +	return n_seg;
> +}
> +
> +/**
> + * Local post process function prototype that same as process function prototype
> + * as rte_ipsec_sa_pkt_func's process().
> + */
> +typedef uint16_t (*sync_crypto_post_process)(const struct rte_ipsec_session *ss,
> +				struct rte_mbuf *mb[],
> +				uint16_t num);

Style nit: typedef newtype_t .... (i.e. new typedef names should end with _t).

> +static uint16_t
> +esp_outb_tun_sync_crypto_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num,
> +		sync_crypto_post_process post_process)
> +{
> +	uint64_t sqn;
> +	rte_be64_t sqc;
> +	struct rte_ipsec_sa *sa;
> +	struct rte_security_ctx *ctx;
> +	struct rte_security_session *rss;
> +	union sym_op_data icv;
> +	struct rte_security_vec buf[num];
> +	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
> +	uint32_t vec_idx = 0;
> +	void *aad[num];
> +	void *digest[num];
> +	void *iv[num];
> +	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
> +	uint64_t ivp[IPSEC_MAX_IV_QWORD];

Why do we need both ivs and ivp?

> +	int status[num];
> +	uint32_t dr[num];
> +	uint32_t i, n, k;
> +	int32_t rc;
> +
> +	sa = ss->sa;
> +	ctx = ss->security.ctx;
> +	rss = ss->security.ses;
> +
> +	k = 0;
> +	n = num;
> +	sqn = esn_outb_update_sqn(sa, &n);
> +	if (n != num)
> +		rte_errno = EOVERFLOW;
> +
> +	for (i = 0; i != n; i++) {
> +		sqc = rte_cpu_to_be_64(sqn + i);
> +		gen_iv(ivp, sqc);
> +
> +		/* try to update the packet itself */
> +		rc = outb_tun_pkt_prepare(sa, sqc, ivp, mb[i], &icv,
> +				sa->sqh_len);
> +
> +		/* success, setup crypto op */
> +		if (rc >= 0) {
> +			outb_pkt_xprepare(sa, sqc, &icv);

We probably need something like outb_pkt_sync_xprepare(sa, sqc, &aad[i]); here,
to avoid using space in the packet for aad.

> +
> +			iv[k] = (void *)ivs[k];

Do we really need the type conversion here?

> +			rc = outb_sync_crypto_proc_prepare(mb[i], sa, ivp, &icv,
> +					0, rc, &buf[k], &vec[vec_idx], iv[k],
> +					&aad[k], &digest[k]);



> +			if (rc < 0) {
> +				dr[i - k] = i;
> +				rte_errno = -rc;
> +				continue;
> +			}
> +
> +			vec_idx += rc;
> +			k++;
> +		/* failure, put packet into the death-row */
> +		} else {
> +			dr[i - k] = i;
> +			rte_errno = -rc;
> +		}
> +	}
> +
> +	 /* copy not prepared mbufs beyond good ones */
> +	if (k != n && k != 0)
> +		move_bad_mbufs(mb, dr, n, n - k);
> +
> +	if (unlikely(k == 0)) {

I don't think 'unlikely' will make any difference here.

> +		rte_errno = EBADMSG;
> +		return 0;
> +	}
> +
> +	/* process the packets */
> +	n = 0;
> +	rte_security_process_cpu_crypto_bulk(ctx, rss, buf, iv, aad, digest,
> +			status, k);

Looking at the code below, I think it would make sense to make
rte_security_process_cpu_crypto_bulk() return the number of failures
(or the number of successes).
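A minimal sketch of that return-value convention, with a hypothetical per-buffer operation standing in for the real crypto work; nothing here is the rte_security API. status[] is still filled per buffer, but the return value lets the caller skip scanning status[] entirely in the common all-good case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-buffer operation: 0 on success, non-zero on failure. */
typedef int (*buf_op_sketch_t)(void *buf);

/*
 * Bulk call that fills status[] per buffer and also returns the
 * number of failed entries.
 */
static uint32_t
process_bulk_sketch(buf_op_sketch_t op, void *buf[], int status[],
	uint32_t num)
{
	uint32_t i, nb_fail;

	nb_fail = 0;
	for (i = 0; i != num; i++) {
		status[i] = op(buf[i]);
		nb_fail += (status[i] != 0);
	}
	return nb_fail;
}

/* Example operation: fails on NULL buffers. */
static int
fail_on_null(void *buf)
{
	return buf == NULL ? -1 : 0;
}
```

On the caller side this becomes `if (process_bulk(...) != 0) { /* only now walk status[] */ }`.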

> +	/* move failed process packets to dr */
> +	for (i = 0; i < n; i++) {

That loop will never be executed, as n is 0 at this point.
It should be i < k.

> +		if (status[i])
> +			dr[n++] = i;

You forgot to set rte_errno.

> +	}
> +
> +	if (n)

if (n != 0 && n != k)
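A minimal sketch of the failure handling with both fixes applied (scan i < k, compact only when n != 0 && n != k). It works on plain ints instead of mbufs, and move_bad_sketch() is a hypothetical toy modelled on librte_ipsec's move_bad_mbufs(): good entries keep their order at the front, and bad ones are moved, in order, behind them. Setting rte_errno per failure, as noted above, is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BURST_SKETCH_MAX 64	/* hypothetical burst-size limit for this sketch */

/* Toy stand-in for move_bad_mbufs(); bad_idx[] must be ascending. */
static void
move_bad_sketch(int pkt[], const uint32_t bad_idx[], uint32_t nb_pkt,
	uint32_t nb_bad)
{
	int bad[BURST_SKETCH_MAX];
	uint32_t i, j, g;

	assert(nb_pkt <= BURST_SKETCH_MAX);
	for (i = 0, j = 0, g = 0; i != nb_pkt; i++) {
		if (j != nb_bad && bad_idx[j] == i)
			bad[j++] = pkt[i];	/* park the bad entry */
		else
			pkt[g++] = pkt[i];	/* compact the good entry */
	}
	memcpy(pkt + g, bad, j * sizeof(bad[0]));
}

/*
 * Scan all k results and only compact a mixed burst.
 * Returns the number of good packets, left in pkt[0 .. ret - 1].
 */
static uint32_t
post_process_sketch(int pkt[], const int status[], uint32_t k)
{
	uint32_t dr[BURST_SKETCH_MAX];
	uint32_t i, n;

	assert(k <= BURST_SKETCH_MAX);
	n = 0;
	for (i = 0; i < k; i++) {
		if (status[i] != 0)
			dr[n++] = i;
	}
	if (n != 0 && n != k)
		move_bad_sketch(pkt, dr, k, n);
	return k - n;
}
```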

> +		move_bad_mbufs(mb, dr, k, n);
> +
> +	return post_process(ss, mb, k - n);
> +}
> +
> +static uint16_t
> +esp_outb_trs_sync_crypto_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num,
> +		sync_crypto_post_process post_process)
> +
> +{
> +	uint64_t sqn;
> +	rte_be64_t sqc;
> +	struct rte_ipsec_sa *sa;
> +	struct rte_security_ctx *ctx;
> +	struct rte_security_session *rss;
> +	union sym_op_data icv;
> +	struct rte_security_vec buf[num];
> +	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
> +	uint32_t vec_idx = 0;
> +	void *aad[num];
> +	void *digest[num];
> +	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
> +	void *iv[num];
> +	int status[num];
> +	uint64_t ivp[IPSEC_MAX_IV_QWORD];
> +	uint32_t dr[num];
> +	uint32_t i, n, k;
> +	uint32_t l2, l3;
> +	int32_t rc;
> +
> +	sa = ss->sa;
> +	ctx = ss->security.ctx;
> +	rss = ss->security.ses;
> +
> +	k = 0;
> +	n = num;
> +	sqn = esn_outb_update_sqn(sa, &n);
> +	if (n != num)
> +		rte_errno = EOVERFLOW;
> +
> +	for (i = 0; i != n; i++) {
> +		l2 = mb[i]->l2_len;
> +		l3 = mb[i]->l3_len;
> +
> +		sqc = rte_cpu_to_be_64(sqn + i);
> +		gen_iv(ivp, sqc);
> +
> +		/* try to update the packet itself */
> +		rc = outb_trs_pkt_prepare(sa, sqc, ivp, mb[i], l2, l3, &icv,
> +				sa->sqh_len);
> +
> +		/* success, setup crypto op */
> +		if (rc >= 0) {
> +			outb_pkt_xprepare(sa, sqc, &icv);
> +
> +			iv[k] = (void *)ivs[k];
> +
> +			rc = outb_sync_crypto_proc_prepare(mb[i], sa, ivp, &icv,
> +					l2 + l3, rc, &buf[k], &vec[vec_idx],
> +					iv[k], &aad[k], &digest[k]);
> +			if (rc < 0) {
> +				dr[i - k] = i;
> +				rte_errno = -rc;
> +				continue;
> +			}
> +
> +			vec_idx += rc;
> +			k++;
> +		/* failure, put packet into the death-row */
> +		} else {
> +			dr[i - k] = i;
> +			rte_errno = -rc;
> +		}
> +	}
> +
> +	 /* copy not prepared mbufs beyond good ones */
> +	if (k != n && k != 0)
> +		move_bad_mbufs(mb, dr, n, n - k);


You don't really need to do it here.
Just one such move at the very end should be enough.

> +
> +	/* process the packets */
> +	n = 0;
> +	rte_security_process_cpu_crypto_bulk(ctx, rss, buf, iv, aad, digest,
> +			status, k);
> +	/* move failed process packets to dr */
> +	for (i = 0; i < k; i++) {
> +		if (status[i])
> +			dr[n++] = i;
> +	}
> +
> +	if (n)
> +		move_bad_mbufs(mb, dr, k, n);
> +
> +	return post_process(ss, mb, k - n);
> +}
> +
> +uint16_t
> +esp_outb_tun_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)
> +{
> +	return esp_outb_tun_sync_crypto_process(ss, mb, num,
> +			esp_outb_sqh_process);

esp_outb_sqh_process() relies on PKT_RX_SEC_OFFLOAD_FAILED being set
in mb->ol_flags for failed packets.
First, in the _sync_ case no one will set it for you.
Second, for _sync_ you don't really need it; it is just extra overhead here.
So I think you can't reuse this function without some modifications.
It is probably easier to write a new one (and extract some common code into
another helper function that both esp_outb_sqh_process() and the new one can call).

> +}
> +
> +uint16_t
> +esp_outb_tun_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)
> +{
> +	return esp_outb_tun_sync_crypto_process(ss, mb, num,
> +			esp_outb_pkt_flag_process);

Same as above, plus you made esp_outb_pkt_flag_process()
non-static, so the compiler won't be able to inline it.

> +}
> +
> +uint16_t
> +esp_outb_trs_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)
> +{
> +	return esp_outb_trs_sync_crypto_process(ss, mb, num,
> +			esp_outb_sqh_process);
> +}
> +
> +uint16_t
> +esp_outb_trs_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)
> +{
> +	return esp_outb_trs_sync_crypto_process(ss, mb, num,
> +			esp_outb_pkt_flag_process);
> +}
> +
>  /*
>   * process outbound packets for SA with ESN support,
>   * for algorithms that require SQN.hibits to be implictly included
> @@ -410,8 +696,8 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
>   * In that case we have to move ICV bytes back to their proper place.
>   */
>  uint16_t
> -esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
> -	uint16_t num)
> +esp_outb_sqh_process(const struct rte_ipsec_session *ss,
> +	struct rte_mbuf *mb[], uint16_t num)

Is there any purpose to that change?

>  {
>  	uint32_t i, k, icv_len, *icv;
>  	struct rte_mbuf *ml;
> diff --git a/lib/librte_ipsec/sa.c b/lib/librte_ipsec/sa.c
> index 23d394b46..31ffbce2c 100644
> --- a/lib/librte_ipsec/sa.c
> +++ b/lib/librte_ipsec/sa.c
> @@ -544,9 +544,9 @@ lksd_proto_prepare(const struct rte_ipsec_session *ss,
>   * - inbound/outbound for RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>   * - outbound for RTE_SECURITY_ACTION_TYPE_NONE when ESN is disabled
>   */
> -static uint16_t
> -pkt_flag_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
> -	uint16_t num)
> +uint16_t
> +esp_outb_pkt_flag_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)


Why rename this function?
As the comment above it states, the function is used for both the inbound
and outbound code paths.
Such renaming seems misleading to me.

>  {
>  	uint32_t i, k;
>  	uint32_t dr[num];
> @@ -599,12 +599,48 @@ lksd_none_pkt_func_select(const struct rte_ipsec_sa *sa,
>  	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
>  		pf->prepare = esp_outb_tun_prepare;
>  		pf->process = (sa->sqh_len != 0) ?
> -			esp_outb_sqh_process : pkt_flag_process;
> +			esp_outb_sqh_process : esp_outb_pkt_flag_process;
>  		break;
>  	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TRANS):
>  		pf->prepare = esp_outb_trs_prepare;
>  		pf->process = (sa->sqh_len != 0) ?
> -			esp_outb_sqh_process : pkt_flag_process;
> +			esp_outb_sqh_process : esp_outb_pkt_flag_process;
> +		break;
> +	default:
> +		rc = -ENOTSUP;
> +	}
> +
> +	return rc;
> +}
> +
> +static int
> +lksd_sync_crypto_pkt_func_select(const struct rte_ipsec_sa *sa,
> +		struct rte_ipsec_sa_pkt_func *pf)

Nit: there is probably no point in having the lksd_ prefix for _sync_ functions.

> +{
> +	int32_t rc;
> +
> +	static const uint64_t msk = RTE_IPSEC_SATP_DIR_MASK |
> +			RTE_IPSEC_SATP_MODE_MASK;
> +
> +	rc = 0;
> +	switch (sa->type & msk) {
> +	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV4):
> +	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV6):
> +		pf->process = esp_inb_tun_sync_crypto_pkt_process;
> +		break;
> +	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TRANS):
> +		pf->process = esp_inb_trs_sync_crypto_pkt_process;
> +		break;
> +	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV4):
> +	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
> +		pf->process = (sa->sqh_len != 0) ?
> +			esp_outb_tun_sync_crpyto_sqh_process :
> +			esp_outb_tun_sync_crpyto_flag_process;
> +		break;
> +	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TRANS):
> +		pf->process = (sa->sqh_len != 0) ?
> +			esp_outb_trs_sync_crpyto_sqh_process :
> +			esp_outb_trs_sync_crpyto_flag_process;
>  		break;
>  	default:
>  		rc = -ENOTSUP;
> @@ -672,13 +708,16 @@ ipsec_sa_pkt_func_select(const struct rte_ipsec_session *ss,
>  	case RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL:
>  		if ((sa->type & RTE_IPSEC_SATP_DIR_MASK) ==
>  				RTE_IPSEC_SATP_DIR_IB)
> -			pf->process = pkt_flag_process;
> +			pf->process = esp_outb_pkt_flag_process;
>  		else
>  			pf->process = inline_proto_outb_pkt_process;
>  		break;
>  	case RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL:
>  		pf->prepare = lksd_proto_prepare;
> -		pf->process = pkt_flag_process;
> +		pf->process = esp_outb_pkt_flag_process;
> +		break;
> +	case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
> +		rc = lksd_sync_crypto_pkt_func_select(sa, pf);
>  		break;
>  	default:
>  		rc = -ENOTSUP;
> diff --git a/lib/librte_ipsec/sa.h b/lib/librte_ipsec/sa.h
> index 51e69ad05..02c7abc60 100644
> --- a/lib/librte_ipsec/sa.h
> +++ b/lib/librte_ipsec/sa.h
> @@ -156,6 +156,14 @@ uint16_t
>  inline_inb_trs_pkt_process(const struct rte_ipsec_session *ss,
>  	struct rte_mbuf *mb[], uint16_t num);
> 
> +uint16_t
> +esp_inb_tun_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num);
> +
> +uint16_t
> +esp_inb_trs_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num);
> +
>  /* outbound processing */
> 
>  uint16_t
> @@ -170,6 +178,10 @@ uint16_t
>  esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
>  	uint16_t num);
> 
> +uint16_t
> +esp_outb_pkt_flag_process(const struct rte_ipsec_session *ss,
> +	struct rte_mbuf *mb[], uint16_t num);
> +
>  uint16_t
>  inline_outb_tun_pkt_process(const struct rte_ipsec_session *ss,
>  	struct rte_mbuf *mb[], uint16_t num);
> @@ -182,4 +194,21 @@ uint16_t
>  inline_proto_outb_pkt_process(const struct rte_ipsec_session *ss,
>  	struct rte_mbuf *mb[], uint16_t num);
> 
> +uint16_t
> +esp_outb_tun_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num);
> +
> +uint16_t
> +esp_outb_tun_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num);
> +
> +uint16_t
> +esp_outb_trs_sync_crpyto_sqh_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num);
> +
> +uint16_t
> +esp_outb_trs_sync_crpyto_flag_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num);
> +
> +
>  #endif /* _SA_H_ */


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-25 18:24                       ` Ananyev, Konstantin
@ 2019-09-27  9:26                         ` Akhil Goyal
  2019-09-30 12:22                           ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-09-27  9:26 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon'
  Cc: Zhang, Roy Fan, Doherty, Declan, 'Anoob Joseph'

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Wednesday, September 25, 2019 11:54 PM
> To: Akhil Goyal <akhil.goyal@nxp.com>; 'dev@dpdk.org' <dev@dpdk.org>; De
> Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; 'Thomas Monjalon'
> <thomas@monjalon.net>
> Cc: Zhang, Roy Fan <roy.fan.zhang@intel.com>; Doherty, Declan
> <declan.doherty@intel.com>; 'Anoob Joseph' <anoobj@marvell.com>
> Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
> 
> 
> > > > > > > > > > This action type allows the burst of symmetric crypto workload
> > > > > > > > > > using the same algorithm, key, and direction being processed by
> > > > > > > > > > CPU cycles synchronously.
> > > > > > > > > > This flexible action type does not require external hardware
> > > > > > > > > > involvement, having the crypto workload processed synchronously,
> > > > > > > > > > and is more performant than Cryptodev SW PMD due to the saved
> > > > > > > > > > cycles on removed "async mode simulation" as well as 3 cacheline
> > > > > > > > > > access of the crypto ops.
> > > > > > > > >
> > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst
> > > > > > > > > and corresponding dequeue burst.
> > > > > > > >
> > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > > >
> > > > > > > > > It would be a new API something like process_packets and it will have
> > > > > > > > > the crypto processed packets while returning from the API?
> > > > > > > >
> > > > > > > > Yes, though the plan is that API will operate on raw data buffers,
> > > > > > > > not mbufs.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > I still do not understand why we cannot do it with the conventional
> > > > > > > > > crypto lib only.
> > > > > > > > > As far as I can understand, you are not doing any protocol processing
> > > > > > > > > or any value add to the crypto processing. IMO, you just need a
> > > > > > > > > synchronous crypto processing API which can be defined in cryptodev;
> > > > > > > > > you don't need to re-create a crypto session in the name of a
> > > > > > > > > security session in the driver just to do synchronous processing.
> > > > > > > >
> > > > > > > > I suppose your question is why not to have
> > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > The main reason is that would require disruptive changes in existing
> > > > > > > > cryptodev API (would cause ABI/API breakage).
> > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra
> > > > > > > > information that normal crypto_sym_xform doesn't contain
> > > > > > > > (cipher offset from the start of the buffer, might be something extra
> > > > > > > > in future).
> > > > > > >
> > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > >
> > > > > > fill/read (+ alloc/free) is one of the main things that slow down the
> > > > > > current crypto-op approach.
> > > > > > That's why the general idea is to have all data that wouldn't change
> > > > > > from packet to packet included in the session and set it up once at
> > > > > > session_init().
> > > > >
> > > > > I agree that you cannot use crypto-op.
> > > > > You can have the new API in crypto.
> > > > > As per the current patch, you only need cipher_offset, which you can
> > > > > have as a parameter until you get it approved in the crypto xform.
> > > > > I believe it will be beneficial for other crypto cases as well.
> > > > > We can have cipher offset at both places (crypto-op and cipher_xform).
> > > > > It will give flexibility to the user to override it.
> > > >
> > > > After having another thought on your proposal:
> > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> > > > stuff here?
> > >
> > > I also thought of adding new xforms, but that won't serve the purpose for
> > > maybe all the cases.
> > > You would be needing all information currently available in the current
> > > xforms.
> > > So if you are adding new fields in the new xform, the size will be more
> > > than that of the union of xforms.
> > > ABI breakage would still be there.
> > >
> > > If you think a valid compression of the AEAD xform can be done, then that
> > > can be done for each of the xforms and we can have a solution to this
> > > issue.
> >
> > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > So for now we can make that path work without any ABI breakage.
> > Fan, please feel free to correct me here, if I missed something.
> > If in future we would need to add some extra information it might
> > require ABI breakage, though by now I don't envision anything particular
> > to add.
> > Anyway, if there is no objection to go that way, we can try to make
> > these changes for v2.
> >
> 
> Actually, after looking at it more deeply, it appears not as easy as I
> thought it would be :)
> Below is a very rough draft of the proposed API additions.
> I think it avoids ABI breakages right now and provides enough flexibility
> for future extensions (if any).
> For now, it doesn't address your comments about naming conventions
> (_CPU_ vs _SYNC_), etc.,
> but I suppose it is comprehensive enough to convey the main idea behind it.
> Akhil and other interested parties, please try to review and provide
> feedback ASAP, as related changes would take some time and we would still
> like to hit the 19.11 deadline.
> Konstantin
> 
> diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> index bc8da2466..c03069e23 100644
> --- a/lib/librte_cryptodev/rte_crypto_sym.h
> +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
>   *
>   * This structure contains data relating to Cipher (Encryption and Decryption)
>   *  use to create a session.
> + * Actually I was wrong saying that we don't have free space inside xforms.
> + * Making key struct packed (see below) allows us to regain 6B that could be
> + * used for future extensions.
>   */
>  struct rte_crypto_cipher_xform {
>         enum rte_crypto_cipher_operation op;
> @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
>         struct {
>                 const uint8_t *data;    /**< pointer to key data */
>                 uint16_t length;        /**< key length in bytes */
> -       } key;
> +       } __attribute__((__packed__)) key;
> +
> +       /**
> +         * offset for cipher to start within user provided data buffer.
> +        * Fan suggested another (and less space consuming way) -
> +         * reuse iv.offset space below, by changing:
> +        * struct {uint16_t offset, length;} iv;
> +        * to unnamed union:
> +        * union {
> +        *      struct {uint16_t offset, length;} iv;
> +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> +        * };
> +        * Both approaches seem ok to me in general.

No strong opinions here. OK with this one.

> +        * Comments/suggestions are welcome.
> +         */
> +       uint16_t offset;
> +
> +       uint8_t reserved1[4];
> +
>         /**< Cipher key
>          *
>          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
>         struct {
>                 const uint8_t *data;    /**< pointer to key data */
>                 uint16_t length;        /**< key length in bytes */
> -       } key;
> +       } __attribute__((__packed__)) key;
>         /**< Authentication key data.
>          * The authentication key length MUST be less than or equal to the
>          * block size of the algorithm. It is the callers responsibility to
> @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
>          * (for example RFC 2104, FIPS 198a).
>          */
> 
> +       uint8_t reserved1[6];
> +
>         struct {
>                 uint16_t offset;
>                 /**< Starting point for Initialisation Vector or Counter,
> @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
>         struct {
>                 const uint8_t *data;    /**< pointer to key data */
>                 uint16_t length;        /**< key length in bytes */
> -       } key;
> +       } __attribute__((__packed__)) key;
> +
> +       /** offset for cipher to start within data buffer */
> +       uint16_t cipher_offset;
> +
> +       uint8_t reserved1[4];
> 
>         struct {
>                 uint16_t offset;
> diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> index e175b838c..c0c7bfed7 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.h
> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> @@ -1272,6 +1272,101 @@ void *
>  rte_cryptodev_sym_session_get_user_data(
>                                         struct rte_cryptodev_sym_session *sess);
> 
> +/*
> + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> + * into existing rte_crypto_sym_session structure/API, but instead
> + * introduce an extension to it via new fully opaque
> + * struct rte_crypto_cpu_sym_session and additional related API.


What exactly do we need to squeeze?
In this proposal I do not see the new struct cpu_sym_session defined here.
I believe you will have the same lib API/struct for cpu_sym_session and sym_session.
I am not sure that would be needed.
It would be internal to the driver: if synchronous processing is supported (from the feature flag)
and the relevant fields in the xform (the newly added ones, packed as per your suggestion) are set,
it will create that type of session.


> + * Main points:
> + * - Current crypto-dev API is reasonably mature and it is desirable
> + *   to keep it unchanged (API/ABI stability). From other side, this
> + *   new sync API is new one and probably would require extra changes.
> + *   Having it as a new one allows to mark it as experimental, without
> + *   affecting existing one.
> + * - Fully opaque cpu_sym_session structure gives more flexibility
> + *   to the PMD writers and again allows to avoid ABI breakages in future.
> + * - process() function per set of xforms
> + *   allows to expose different process() functions for different
> + *   xform combinations. The PMD writer can decide whether to
> + *   push all supported algorithms into one process() function,
> + *   or spread them across several ones.
> + *   I.e. more flexibility for the PMD writer.

Which process function should be chosen is internal to the PMD; how would that info
be visible to the application or the library? These will get stored in the session private
data. It would be up to the PMD writer to store the per-session process function in
the session private data.

The process function would be a dev op just like the enqueue/dequeue operations, and it
should call the respective process API stored in the session private data.

I am not sure you would need a new session init API for this, as nothing would be visible to
the app or lib.

> + * - Not storing process() pointer inside the session -
> + *   Allows the user to choose whether to store a process() pointer
> + *   per session, or per group of sessions for that device that share
> + *   the same input xforms. I.E. extra flexibility for the user,
> + *   plus allows us to keep cpu_sym_session totally opaque, see above.

If multiple sessions need to be processed via the same process function,
the PMD would save the same process function in all the sessions; I don't think
there would be any perf overhead with that.

> + * Sketched usage model:
> + * ....
> + * // control path, alloc/init session
> + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> + * rte_crypto_cpu_sym_process_t process =
> + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> + * ...
> + * // data-path
> + * process(ses, ....);
> + * ....
> + * // control path, terminate/free session
> + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> + */
> +
> +/**
> + * vector structure, contains pointer to vector array and the length
> + * of the array
> + */
> +struct rte_crypto_vec {
> +       struct iovec *vec;
> +       uint32_t num;
> +};
> +
> +/*
> + * Data-path bulk process crypto function.
> + */
> +typedef void (*rte_crypto_cpu_sym_process_t)(
> +               struct rte_crypto_cpu_sym_session *sess,
> +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> +               void *digest[], int status[], uint32_t num);
> +/*
> + * for given device return process function specific to input xforms
> + * on error - return NULL and set rte_errno value.
> + * Note that the same input xforms for the same device should return
> + * the same process function.
> + */
> +__rte_experimental
> +rte_crypto_cpu_sym_process_t
> +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +/*
> + * Return required session size in bytes for given set of xforms.
> + * if xforms == NULL, then return the max possible session size,
> + * that would fit session for any supported by the device algorithm.
> + * if CPU mode is not supported at all, or the algorithm requested in the
> + * xform is not supported, then return -ENOTSUP.
> + */
> +__rte_experimental
> +int
> +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +/*
> + * Initialize session.
> + * It is caller responsibility to allocate enough space for it.
> + * See rte_crypto_cpu_sym_session_size above.
> + */
> +__rte_experimental
> +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> +                       struct rte_crypto_cpu_sym_session *sess,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +__rte_experimental
> +void
> +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> +                       struct rte_crypto_cpu_sym_session *sess);
> +
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> index defe05ea0..ed7e63fab 100644
> --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
>  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
>                 struct rte_cryptodev_asym_session *sess);
> 
> +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> +                       struct rte_crypto_cpu_sym_session *sess,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
> +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> +                       struct rte_crypto_cpu_sym_session *sess);
> +
> +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> +                       struct rte_cryptodev *dev,
> +                       const struct rte_crypto_sym_xform *xforms);
> +
>  /** Crypto device operations function pointer table */
>  struct rte_cryptodev_ops {
>         cryptodev_configure_t dev_configure;    /**< Configure device. */
> @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
>         /**< Clear a Crypto sessions private data. */
>         cryptodev_asym_free_session_t asym_session_clear;
>         /**< Clear a Crypto sessions private data. */
> +
> +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
>  };
> 
> 
> 



* Re: [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
  2019-09-26 23:20     ` Ananyev, Konstantin
@ 2019-09-27 10:38     ` Ananyev, Konstantin
  1 sibling, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-27 10:38 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal

Hi Fan,

> 
> This patch updates the ipsec library to handle the newly introduced
> RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  lib/librte_ipsec/esp_inb.c  | 174 +++++++++++++++++++++++++-
>  lib/librte_ipsec/esp_outb.c | 290 +++++++++++++++++++++++++++++++++++++++++++-
>  lib/librte_ipsec/sa.c       |  53 ++++++--
>  lib/librte_ipsec/sa.h       |  29 +++++
>  lib/librte_ipsec/ses.c      |   4 +-
>  5 files changed, 539 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/librte_ipsec/esp_inb.c b/lib/librte_ipsec/esp_inb.c
> index 8e3ecbc64..6077dcb1e 100644
> --- a/lib/librte_ipsec/esp_inb.c
> +++ b/lib/librte_ipsec/esp_inb.c
> @@ -105,6 +105,73 @@ inb_cop_prepare(struct rte_crypto_op *cop,
>  	}
>  }
> 
> +static inline int
> +inb_sync_crypto_proc_prepare(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
> +	const union sym_op_data *icv, uint32_t pofs, uint32_t plen,
> +	struct rte_security_vec *buf, struct iovec *cur_vec,
> +	void *iv, void **aad, void **digest)
> +{
> +	struct rte_mbuf *ms;
> +	struct iovec *vec = cur_vec;
> +	struct aead_gcm_iv *gcm;
> +	struct aesctr_cnt_blk *ctr;
> +	uint64_t *ivp;
> +	uint32_t algo, left, off = 0, n_seg = 0;

Same as for outbound: please keep definitions and assignments separate.

> +
> +	ivp = rte_pktmbuf_mtod_offset(mb, uint64_t *,
> +		pofs + sizeof(struct rte_esp_hdr));
> +	algo = sa->algo_type;
> +
> +	switch (algo) {
> +	case ALGO_TYPE_AES_GCM:
> +		gcm = (struct aead_gcm_iv *)iv;
> +		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
> +		*aad = icv->va + sa->icv_len;
> +		off = sa->ctp.cipher.offset + pofs;
> +		break;
> +	case ALGO_TYPE_AES_CBC:
> +	case ALGO_TYPE_3DES_CBC:
> +		off = sa->ctp.auth.offset + pofs;
> +		break;
> +	case ALGO_TYPE_AES_CTR:
> +		off = sa->ctp.auth.offset + pofs;
> +		ctr = (struct aesctr_cnt_blk *)iv;
> +		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
> +		break;
> +	case ALGO_TYPE_NULL:
> +		break;
> +	}
> +
> +	*digest = icv->va;
> +
> +	left = plen - sa->ctp.cipher.length;
> +
> +	ms = mbuf_get_seg_ofs(mb, &off);
> +	if (!ms)
> +		return -1;

Same as for outbound: I think there is no need to check/return failure here.
This function could be split into two.

> +
> +	while (n_seg < RTE_LIBRTE_IP_FRAG_MAX_FRAG && left && ms) {


Same thing: we shouldn't limit ourselves to 5 segments per packet.
Pretty much the same comments about code restructuring as for the outbound case.

> +		uint32_t len = RTE_MIN(left, ms->data_len - off);
> +
> +		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
> +		vec->iov_len = len;
> +
> +		left -= len;
> +		vec++;
> +		n_seg++;
> +		ms = ms->next;
> +		off = 0;
> +	}
> +
> +	if (left)
> +		return -1;
> +
> +	buf->vec = cur_vec;
> +	buf->num = n_seg;
> +
> +	return n_seg;
> +}
> +
>  /*
>   * Helper function for prepare() to deal with situation when
>   * ICV is spread by two segments. Tries to move ICV completely into the
> @@ -512,7 +579,6 @@ tun_process(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>  	return k;
>  }
> 
> -
>  /*
>   * *process* function for tunnel packets
>   */
> @@ -625,6 +691,112 @@ esp_inb_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>  	return n;
>  }
> 
> +/*
> + * process packets using sync crypto engine
> + */
> +static uint16_t
> +esp_inb_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num, uint8_t sqh_len,
> +		esp_inb_process_t process)
> +{
> +	int32_t rc;
> +	uint32_t i, k, hl, n, p;
> +	struct rte_ipsec_sa *sa;
> +	struct replay_sqn *rsn;
> +	union sym_op_data icv;
> +	uint32_t sqn[num];
> +	uint32_t dr[num];
> +	struct rte_security_vec buf[num];
> +	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
> +	uint32_t vec_idx = 0;
> +	uint8_t ivs[num][IPSEC_MAX_IV_SIZE];
> +	void *iv[num];
> +	void *aad[num];
> +	void *digest[num];
> +	int status[num];
> +
> +	sa = ss->sa;
> +	rsn = rsn_acquire(sa);
> +
> +	k = 0;
> +	for (i = 0; i != num; i++) {
> +		hl = mb[i]->l2_len + mb[i]->l3_len;
> +		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, &icv);
> +		if (rc >= 0) {
> +			iv[k] = (void *)ivs[k];
> +			rc = inb_sync_crypto_proc_prepare(sa, mb[i], &icv, hl,
> +					rc, &buf[k], &vec[vec_idx], iv[k],
> +					&aad[k], &digest[k]);
> +			if (rc < 0) {
> +				dr[i - k] = i;
> +				continue;
> +			}
> +
> +			vec_idx += rc;
> +			k++;
> +		} else
> +			dr[i - k] = i;
> +	}
> +
> +	/* copy not prepared mbufs beyond good ones */
> +	if (k != num) {
> +		rte_errno = EBADMSG;
> +
> +		if (unlikely(k == 0))
> +			return 0;
> +
> +		move_bad_mbufs(mb, dr, num, num - k);
> +	}
> +
> +	/* process the packets */
> +	n = 0;
> +	rte_security_process_cpu_crypto_bulk(ss->security.ctx,
> +			ss->security.ses, buf, iv, aad, digest, status,
> +			k);
> +	/* move failed process packets to dr */
> +	for (i = 0; i < k; i++) {
> +		if (status[i]) {
> +			dr[n++] = i;
> +			rte_errno = EBADMSG;
> +		}
> +	}
> +
> +	/* move bad packets to the back */
> +	if (n)
> +		move_bad_mbufs(mb, dr, k, n);

I don't think you need to set dr[] here and call that function, see below.

> +
> +	/* process packets */
> +	p = process(sa, mb, sqn, dr, k - n, sqh_len);

tun_process(), etc. expect PKT_RX_SEC_OFFLOAD_FAILED to be set in mb->ol_flags
for failed packets.
So you either need to set this value in ol_flags based on status,
or tweak the existing process functions, or introduce new ones.


> +
> +	if (p != k - n && p != 0)
> +		move_bad_mbufs(mb, dr, k - n, k - n - p);
> +
> +	if (p != num)
> +		rte_errno = EBADMSG;
> +
> +	return p;
> +}
> +
> +uint16_t
> +esp_inb_tun_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)
> +{
> +	struct rte_ipsec_sa *sa = ss->sa;
> +
> +	return esp_inb_sync_crypto_pkt_process(ss, mb, num, sa->sqh_len,
> +			tun_process);
> +}
> +
> +uint16_t
> +esp_inb_trs_sync_crypto_pkt_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)
> +{
> +	struct rte_ipsec_sa *sa = ss->sa;
> +
> +	return esp_inb_sync_crypto_pkt_process(ss, mb, num, sa->sqh_len,
> +			trs_process);
> +}
> +
>  /*
>   * process group of ESP inbound tunnel packets.
>   */


* Re: [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API
  2019-09-06 13:13   ` [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API Fan Zhang
  2019-09-18 12:45     ` Ananyev, Konstantin
@ 2019-09-29  6:00     ` Hemant Agrawal
  2019-09-29 16:59       ` Ananyev, Konstantin
  1 sibling, 1 reply; 87+ messages in thread
From: Hemant Agrawal @ 2019-09-29  6:00 UTC (permalink / raw)
  To: Fan Zhang, dev; +Cc: konstantin.ananyev, declan.doherty, Akhil Goyal

Some comments inline.

On 06-Sep-19 6:43 PM, Fan Zhang wrote:
> This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type to
> security library. The type represents performing crypto operation with CPU
> cycles. The patch also includes a new API to process crypto operations in
> bulk and the function pointers for PMDs.
>
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>   lib/librte_security/rte_security.c           | 16 +++++++++
>   lib/librte_security/rte_security.h           | 51 +++++++++++++++++++++++++++-
>   lib/librte_security/rte_security_driver.h    | 19 +++++++++++
>   lib/librte_security/rte_security_version.map |  1 +
>   4 files changed, 86 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
> index bc81ce15d..0f85c1b59 100644
> --- a/lib/librte_security/rte_security.c
> +++ b/lib/librte_security/rte_security.c
> @@ -141,3 +141,19 @@ rte_security_capability_get(struct rte_security_ctx *instance,
>   
>   	return NULL;
>   }
> +
> +void
> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> +		struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < num; i++)
> +		status[i] = -1;
> +
> +	RTE_FUNC_PTR_OR_RET(*instance->ops->process_cpu_crypto_bulk);
> +	instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
> +			aad, digest, status, num);
> +}
> diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
> index 96806e3a2..5a0f8901b 100644
> --- a/lib/librte_security/rte_security.h
> +++ b/lib/librte_security/rte_security.h
> @@ -18,6 +18,7 @@ extern "C" {
>   #endif
>   
>   #include <sys/types.h>
> +#include <sys/uio.h>
>   
>   #include <netinet/in.h>
>   #include <netinet/ip.h>
> @@ -272,6 +273,20 @@ struct rte_security_pdcp_xform {
>   	uint32_t hfn_threshold;
>   };
>   
> +struct rte_security_cpu_crypto_xform {
> +	/** For cipher/authentication crypto operation the authentication may
> +	 * cover more content than the cipher. E.g., for IPSec ESP encryption
> +	 * with AES-CBC and SHA1-HMAC, the encryption happens after the ESP
> +	 * header but whole packet (apart from MAC header) is authenticated.
> +	 * The cipher_offset field is used to deduce the cipher data pointer
> +	 * from the buffer to be processed.
> +	 *
> +	 * NOTE this parameter shall be ignored by AEAD algorithms, since it
> +	 * uses the same offset for cipher and authentication.
> +	 */
> +	int32_t cipher_offset;
> +};
> +
>   /**
>    * Security session action type.
>    */
> @@ -286,10 +301,14 @@ enum rte_security_session_action_type {
>   	/**< All security protocol processing is performed inline during
>   	 * transmission
>   	 */
> -	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> +	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
>   	/**< All security protocol processing including crypto is performed
>   	 * on a lookaside accelerator
>   	 */
> +	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> +	/**< Crypto processing for security protocol is processed by CPU
> +	 * synchronously
> +	 */
Though you are naming it CPU crypto, it is more like raw packet
crypto, where you want to skip mbufs/crypto ops and work directly
on raw buffers.
>   };
>   
>   /** Security session protocol definition */
> @@ -315,6 +334,7 @@ struct rte_security_session_conf {
>   		struct rte_security_ipsec_xform ipsec;
>   		struct rte_security_macsec_xform macsec;
>   		struct rte_security_pdcp_xform pdcp;
> +		struct rte_security_cpu_crypto_xform cpucrypto;
>   	};
>   	/**< Configuration parameters for security session */
>   	struct rte_crypto_sym_xform *crypto_xform;
> @@ -639,6 +659,35 @@ const struct rte_security_capability *
>   rte_security_capability_get(struct rte_security_ctx *instance,
>   			    struct rte_security_capability_idx *idx);
>   
> +/**
> + * Security vector structure, contains pointer to vector array and the length
> + * of the array
> + */
> +struct rte_security_vec {
> +	struct iovec *vec;
> +	uint32_t num;
> +};
> +

Just wondering whether you want to change it to *in_vec and *out_vec; that
would be helpful in the future if out-of-place processing is required
for the CPU use case as well?

> +/**
> + * Processing bulk crypto workload with CPU
> + *
> + * @param	instance	security instance.
> + * @param	sess		security session
> + * @param	buf		array of buffer SGL vectors
> + * @param	iv		array of IV pointers
> + * @param	aad		array of AAD pointers
> + * @param	digest		array of digest pointers
> + * @param	status		array of status for the function to return
> + * @param	num		number of elements in each array
> + *
> + */
> +__rte_experimental
> +void
> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> +		struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +

Why not make the return type int, to indicate whether this API failed
completely, processed everything, or left per-element status values to look into?


>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
> index 1b561f852..70fcb0c26 100644
> --- a/lib/librte_security/rte_security_driver.h
> +++ b/lib/librte_security/rte_security_driver.h
> @@ -132,6 +132,23 @@ typedef int (*security_get_userdata_t)(void *device,
>   typedef const struct rte_security_capability *(*security_capabilities_get_t)(
>   		void *device);
>   
> +/**
> + * Process security operations in bulk using CPU accelerated method.
> + *
> + * @param	sess		Security session structure.
> + * @param	buf		Buffer to the vectors to be processed.
> + * @param	iv		IV pointers.
> + * @param	aad		AAD pointers.
> + * @param	digest		Digest pointers.
> + * @param	status		Array of status value.
> + * @param	num		Number of elements in each array.
> + */
> +
> +typedef void (*security_process_cpu_crypto_bulk_t)(
> +		struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +
>   /** Security operations function pointer table */
>   struct rte_security_ops {
>   	security_session_create_t session_create;
> @@ -150,6 +167,8 @@ struct rte_security_ops {
>   	/**< Get userdata associated with session which processed the packet. */
>   	security_capabilities_get_t capabilities_get;
>   	/**< Get security capabilities. */
> +	security_process_cpu_crypto_bulk_t process_cpu_crypto_bulk;
> +	/**< Process data in bulk. */
>   };
>   
>   #ifdef __cplusplus
> diff --git a/lib/librte_security/rte_security_version.map b/lib/librte_security/rte_security_version.map
> index 53267bf3c..2132e7a00 100644
> --- a/lib/librte_security/rte_security_version.map
> +++ b/lib/librte_security/rte_security_version.map
> @@ -18,4 +18,5 @@ EXPERIMENTAL {
>   	rte_security_get_userdata;
>   	rte_security_session_stats_get;
>   	rte_security_session_update;
> +	rte_security_process_cpu_crypto_bulk;
>   };


* Re: [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API
  2019-09-29  6:00     ` Hemant Agrawal
@ 2019-09-29 16:59       ` Ananyev, Konstantin
  2019-09-30  9:43         ` Hemant Agrawal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-29 16:59 UTC (permalink / raw)
  To: Hemant Agrawal, Zhang, Roy Fan, dev; +Cc: Doherty, Declan, Akhil Goyal

Hi Hemant,

> 
> On 06-Sep-19 6:43 PM, Fan Zhang wrote:
> > This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type to
> > security library. The type represents performing crypto operation with CPU
> > cycles. The patch also includes a new API to process crypto operations in
> > bulk and the function pointers for PMDs.
> >
> > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > ---
> >   lib/librte_security/rte_security.c           | 16 +++++++++
> >   lib/librte_security/rte_security.h           | 51 +++++++++++++++++++++++++++-
> >   lib/librte_security/rte_security_driver.h    | 19 +++++++++++
> >   lib/librte_security/rte_security_version.map |  1 +
> >   4 files changed, 86 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
> > index bc81ce15d..0f85c1b59 100644
> > --- a/lib/librte_security/rte_security.c
> > +++ b/lib/librte_security/rte_security.c
> > @@ -141,3 +141,19 @@ rte_security_capability_get(struct rte_security_ctx *instance,
> >
> >   	return NULL;
> >   }
> > +
> > +void
> > +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> > +		struct rte_security_session *sess,
> > +		struct rte_security_vec buf[], void *iv[], void *aad[],
> > +		void *digest[], int status[], uint32_t num)
> > +{
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < num; i++)
> > +		status[i] = -1;
> > +
> > +	RTE_FUNC_PTR_OR_RET(*instance->ops->process_cpu_crypto_bulk);
> > +	instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
> > +			aad, digest, status, num);
> > +}
> > diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
> > index 96806e3a2..5a0f8901b 100644
> > --- a/lib/librte_security/rte_security.h
> > +++ b/lib/librte_security/rte_security.h
> > @@ -18,6 +18,7 @@ extern "C" {
> >   #endif
> >
> >   #include <sys/types.h>
> > +#include <sys/uio.h>
> >
> >   #include <netinet/in.h>
> >   #include <netinet/ip.h>
> > @@ -272,6 +273,20 @@ struct rte_security_pdcp_xform {
> >   	uint32_t hfn_threshold;
> >   };
> >
> > +struct rte_security_cpu_crypto_xform {
> > +	/** For cipher/authentication crypto operation the authentication may
> > +	 * cover more content than the cipher. E.g., for IPSec ESP encryption
> > +	 * with AES-CBC and SHA1-HMAC, the encryption happens after the ESP
> > +	 * header but whole packet (apart from MAC header) is authenticated.
> > +	 * The cipher_offset field is used to deduce the cipher data pointer
> > +	 * from the buffer to be processed.
> > +	 *
> > +	 * NOTE this parameter shall be ignored by AEAD algorithms, since they
> > +	 * use the same offset for cipher and authentication.
> > +	 */
> > +	int32_t cipher_offset;
> > +};
> > +
> >   /**
> >    * Security session action type.
> >    */
> > @@ -286,10 +301,14 @@ enum rte_security_session_action_type {
> >   	/**< All security protocol processing is performed inline during
> >   	 * transmission
> >   	 */
> > -	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> > +	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
> >   	/**< All security protocol processing including crypto is performed
> >   	 * on a lookaside accelerator
> >   	 */
> > +	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > +	/**< Crypto processing for security protocol is processed by CPU
> > +	 * synchronously
> > +	 */
> though you are naming it cpu crypto, it is more like raw packet
> crypto, where you want to skip mbuf/crypto ops and work directly on
> the raw buffer.

Yes, but we do want to do that (skip mbuf/crypto ops and use raw buffers),
because this API is destined for SW-backed implementations.
For that case, crypto-ops, mbufs and enqueue/dequeue are just unnecessary overhead.
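Working on raw buffers means the implementation itself, rather than an mbuf helper, has to find where ciphering starts inside the caller's SGL. A minimal sketch of that lookup, assuming a struct that mirrors the RFC's rte_security_vec and the session-level cipher_offset described in the patch; sec_vec_sketch and find_cipher_start() are hypothetical names for illustration only, not DPDK API:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Mirrors struct rte_security_vec from the RFC (sketch, not DPDK API). */
struct sec_vec_sketch {
	struct iovec *vec; /* SGL segments of one packet */
	uint32_t num;      /* number of segments */
};

/*
 * Hypothetical helper: locate where ciphering starts when it begins
 * cipher_offset bytes into the SGL. On success, stores the segment
 * index and the offset inside that segment and returns 0; returns -1
 * if the offset lies at or beyond the end of the buffer.
 */
static int
find_cipher_start(const struct sec_vec_sketch *buf, uint32_t cipher_offset,
		  uint32_t *seg, size_t *seg_off)
{
	size_t remaining = cipher_offset;
	uint32_t i;

	for (i = 0; i < buf->num; i++) {
		if (remaining < buf->vec[i].iov_len) {
			*seg = i;
			*seg_off = remaining;
			return 0;
		}
		/* Whole segment precedes the cipher start; skip it. */
		remaining -= buf->vec[i].iov_len;
	}
	return -1;
}
```

For example, with segments of 20 and 30 bytes and cipher_offset = 24, ciphering starts 4 bytes into the second segment.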

> >   };
> >
> >   /** Security session protocol definition */
> > @@ -315,6 +334,7 @@ struct rte_security_session_conf {
> >   		struct rte_security_ipsec_xform ipsec;
> >   		struct rte_security_macsec_xform macsec;
> >   		struct rte_security_pdcp_xform pdcp;
> > +		struct rte_security_cpu_crypto_xform cpucrypto;
> >   	};
> >   	/**< Configuration parameters for security session */
> >   	struct rte_crypto_sym_xform *crypto_xform;
> > @@ -639,6 +659,35 @@ const struct rte_security_capability *
> >   rte_security_capability_get(struct rte_security_ctx *instance,
> >   			    struct rte_security_capability_idx *idx);
> >
> > +/**
> > + * Security vector structure, contains pointer to vector array and the length
> > + * of the array
> > + */
> > +struct rte_security_vec {
> > +	struct iovec *vec;
> > +	uint32_t num;
> > +};
> > +
> 
> Just wondering if you want to change it to *in_vec and *out_vec; that
> will be helpful in the future if out-of-place processing is required
> for the CPU use case as well?

I suppose this is doable, though right now we don't plan to support such a model.

> 
> > +/**
> > + * Processing bulk crypto workload with CPU
> > + *
> > + * @param	instance	security instance.
> > + * @param	sess		security session
> > + * @param	buf		array of buffer SGL vectors
> > + * @param	iv		array of IV pointers
> > + * @param	aad		array of AAD pointers
> > + * @param	digest		array of digest pointers
> > + * @param	status		array of status for the function to return
> > + * @param	num		number of elements in each array
> > + *
> > + */
> > +__rte_experimental
> > +void
> > +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> > +		struct rte_security_session *sess,
> > +		struct rte_security_vec buf[], void *iv[], void *aad[],
> > +		void *digest[], int status[], uint32_t num);
> > +
> 
> Why not make the return type int, to indicate whether this API completely
> failed, processed everything, or has some valid status to look into?

Good point, will change as suggested.
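One possible shape for the int-returning variant, keeping the RFC wrapper's behaviour of pre-setting every status to -1 and checking for a missing driver callback. The signature is simplified (opaque session and buffers), and the return convention (negative errno, or the count of successful elements) is an assumption, not taken from the patch; bulk_fn_sketch, process_bulk_sketch() and toy_bulk() are hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified driver callback type. */
typedef void (*bulk_fn_sketch)(void *sess, void *buf[], int status[],
			       uint32_t num);

/*
 * Int-returning wrapper sketch: -ENOTSUP if the driver has no
 * callback, otherwise the number of elements whose status is 0.
 */
static int
process_bulk_sketch(bulk_fn_sketch fn, void *sess, void *buf[],
		    int status[], uint32_t num)
{
	uint32_t i;
	int ok = 0;

	/* Pre-mark every element as failed, as the RFC wrapper does. */
	for (i = 0; i < num; i++)
		status[i] = -1;

	if (fn == NULL)
		return -ENOTSUP;

	fn(sess, buf, status, num);

	for (i = 0; i < num; i++)
		if (status[i] == 0)
			ok++;
	return ok;
}

/* Toy callback for illustration: "succeeds" on even indices only. */
static void
toy_bulk(void *sess, void *buf[], int status[], uint32_t num)
{
	uint32_t i;

	(void)sess;
	(void)buf;
	for (i = 0; i < num; i++)
		status[i] = (i % 2 == 0) ? 0 : -1;
}
```

With this convention a caller can tell "nothing was attempted" (negative value) apart from "some elements failed" (a count below num) without scanning the status array first.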

> 
> 
> >   #ifdef __cplusplus
> >   }
> >   #endif
> > diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
> > index 1b561f852..70fcb0c26 100644
> > --- a/lib/librte_security/rte_security_driver.h
> > +++ b/lib/librte_security/rte_security_driver.h
> > @@ -132,6 +132,23 @@ typedef int (*security_get_userdata_t)(void *device,
> >   typedef const struct rte_security_capability *(*security_capabilities_get_t)(
> >   		void *device);
> >
> > +/**
> > + * Process security operations in bulk using CPU accelerated method.
> > + *
> > + * @param	sess		Security session structure.
> > + * @param	buf		Buffer to the vectors to be processed.
> > + * @param	iv		IV pointers.
> > + * @param	aad		AAD pointers.
> > + * @param	digest		Digest pointers.
> > + * @param	status		Array of status value.
> > + * @param	num		Number of elements in each array.
> > + */
> > +
> > +typedef void (*security_process_cpu_crypto_bulk_t)(
> > +		struct rte_security_session *sess,
> > +		struct rte_security_vec buf[], void *iv[], void *aad[],
> > +		void *digest[], int status[], uint32_t num);
> > +
> >   /** Security operations function pointer table */
> >   struct rte_security_ops {
> >   	security_session_create_t session_create;
> > @@ -150,6 +167,8 @@ struct rte_security_ops {
> >   	/**< Get userdata associated with session which processed the packet. */
> >   	security_capabilities_get_t capabilities_get;
> >   	/**< Get security capabilities. */
> > +	security_process_cpu_crypto_bulk_t process_cpu_crypto_bulk;
> > +	/**< Process data in bulk. */
> >   };
> >
> >   #ifdef __cplusplus
> > diff --git a/lib/librte_security/rte_security_version.map b/lib/librte_security/rte_security_version.map
> > index 53267bf3c..2132e7a00 100644
> > --- a/lib/librte_security/rte_security_version.map
> > +++ b/lib/librte_security/rte_security_version.map
> > @@ -18,4 +18,5 @@ EXPERIMENTAL {
> >   	rte_security_get_userdata;
> >   	rte_security_session_stats_get;
> >   	rte_security_session_update;
> > +	rte_security_process_cpu_crypto_bulk;
> >   };

* Re: [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API
  2019-09-29 16:59       ` Ananyev, Konstantin
@ 2019-09-30  9:43         ` Hemant Agrawal
  2019-10-01 15:27           ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Hemant Agrawal @ 2019-09-30  9:43 UTC (permalink / raw)
  To: Ananyev, Konstantin, Zhang, Roy Fan, dev; +Cc: Doherty, Declan, Akhil Goyal

Hi Konstantin,

>> On 06-Sep-19 6:43 PM, Fan Zhang wrote:
>>> This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type to
>>> the security library. The type represents performing crypto operation with CPU
>>> cycles. The patch also includes a new API to process crypto operations in
>>> bulk and the function pointers for PMDs.
>>>
>>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
>>> ---
>>>    lib/librte_security/rte_security.c           | 16 +++++++++
>>>    lib/librte_security/rte_security.h           | 51 +++++++++++++++++++++++++++-
>>>    lib/librte_security/rte_security_driver.h    | 19 +++++++++++
>>>    lib/librte_security/rte_security_version.map |  1 +
>>>    4 files changed, 86 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
>>> index bc81ce15d..0f85c1b59 100644
>>> --- a/lib/librte_security/rte_security.c
>>> +++ b/lib/librte_security/rte_security.c
>>> @@ -141,3 +141,19 @@ rte_security_capability_get(struct rte_security_ctx *instance,
>>>
>>>    	return NULL;
>>>    }
>>> +
>>> +void
>>> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
>>> +		struct rte_security_session *sess,
>>> +		struct rte_security_vec buf[], void *iv[], void *aad[],
>>> +		void *digest[], int status[], uint32_t num)
>>> +{
>>> +	uint32_t i;
>>> +
>>> +	for (i = 0; i < num; i++)
>>> +		status[i] = -1;
>>> +
>>> +	RTE_FUNC_PTR_OR_RET(*instance->ops->process_cpu_crypto_bulk);
>>> +	instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
>>> +			aad, digest, status, num);
>>> +}
>>> diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
>>> index 96806e3a2..5a0f8901b 100644
>>> --- a/lib/librte_security/rte_security.h
>>> +++ b/lib/librte_security/rte_security.h
>>> @@ -18,6 +18,7 @@ extern "C" {
>>>    #endif
>>>
>>>    #include <sys/types.h>
>>> +#include <sys/uio.h>
>>>
>>>    #include <netinet/in.h>
>>>    #include <netinet/ip.h>
>>> @@ -272,6 +273,20 @@ struct rte_security_pdcp_xform {
>>>    	uint32_t hfn_threshold;
>>>    };
>>>
>>> +struct rte_security_cpu_crypto_xform {
>>> +	/** For cipher/authentication crypto operation the authentication may
>>> +	 * cover more content than the cipher. E.g., for IPSec ESP encryption
>>> +	 * with AES-CBC and SHA1-HMAC, the encryption happens after the ESP
>>> +	 * header but whole packet (apart from MAC header) is authenticated.
>>> +	 * The cipher_offset field is used to deduce the cipher data pointer
>>> +	 * from the buffer to be processed.
>>> +	 *
>>> +	 * NOTE this parameter shall be ignored by AEAD algorithms, since they
>>> +	 * use the same offset for cipher and authentication.
>>> +	 */
>>> +	int32_t cipher_offset;
>>> +};
>>> +
>>>    /**
>>>     * Security session action type.
>>>     */
>>> @@ -286,10 +301,14 @@ enum rte_security_session_action_type {
>>>    	/**< All security protocol processing is performed inline during
>>>    	 * transmission
>>>    	 */
>>> -	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
>>> +	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
>>>    	/**< All security protocol processing including crypto is performed
>>>    	 * on a lookaside accelerator
>>>    	 */
>>> +	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
>>> +	/**< Crypto processing for security protocol is processed by CPU
>>> +	 * synchronously
>>> +	 */
>> though you are naming it cpu crypto, it is more like raw packet
>> crypto, where you want to skip mbuf/crypto ops and work directly on
>> the raw buffer.
> Yes, but we do want to do that (skip mbuf/crypto ops and use raw buffers),
> because this API is destined for SW-backed implementations.
> For that case, crypto-ops, mbufs and enqueue/dequeue are just unnecessary overhead.
I agree, we are also planning to take advantage of it for some specific
use-cases in the future.
>>>    };
>>>
>>>    /** Security session protocol definition */
>>> @@ -315,6 +334,7 @@ struct rte_security_session_conf {
>>>    		struct rte_security_ipsec_xform ipsec;
>>>    		struct rte_security_macsec_xform macsec;
>>>    		struct rte_security_pdcp_xform pdcp;
>>> +		struct rte_security_cpu_crypto_xform cpucrypto;
>>>    	};
>>>    	/**< Configuration parameters for security session */
>>>    	struct rte_crypto_sym_xform *crypto_xform;
>>> @@ -639,6 +659,35 @@ const struct rte_security_capability *
>>>    rte_security_capability_get(struct rte_security_ctx *instance,
>>>    			    struct rte_security_capability_idx *idx);
>>>
>>> +/**
>>> + * Security vector structure, contains pointer to vector array and the length
>>> + * of the array
>>> + */
>>> +struct rte_security_vec {
>>> +	struct iovec *vec;
>>> +	uint32_t num;
>>> +};
>>> +
>> Just wondering if you want to change it to *in_vec and *out_vec; that
>> will be helpful in the future if out-of-place processing is required
>> for the CPU use case as well?
> I suppose this is doable, though right now we don't plan to support such a model.
They will come in handy in the future. I plan to use them, and we can
avoid an API/ABI breakage later if the placeholders are present.
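A sketch of what such placeholders could look like, assuming the convention that a NULL out_vec means in-place processing, so callers that only do in-place work keep passing NULL. sec_vec_inout_sketch and dst_sgl() are hypothetical and not part of the RFC:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical variant of rte_security_vec with separate source and
 * destination SGLs. out_vec == NULL means in-place processing.
 */
struct sec_vec_inout_sketch {
	struct iovec *in_vec;  /* source segments */
	uint32_t in_num;
	struct iovec *out_vec; /* destination segments, NULL => in-place */
	uint32_t out_num;
};

/* Return the SGL the implementation should write its output to. */
static const struct iovec *
dst_sgl(const struct sec_vec_inout_sketch *v, uint32_t *num)
{
	if (v->out_vec == NULL) {
		*num = v->in_num;
		return v->in_vec;
	}
	*num = v->out_num;
	return v->out_vec;
}
```

An implementation that supports only in-place operation today could simply reject a non-NULL out_vec, leaving the field available for later use without changing the struct layout.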
>
>>> +/**
>>> + * Processing bulk crypto workload with CPU
>>> + *
>>> + * @param	instance	security instance.
>>> + * @param	sess		security session
>>> + * @param	buf		array of buffer SGL vectors
>>> + * @param	iv		array of IV pointers
>>> + * @param	aad		array of AAD pointers
>>> + * @param	digest		array of digest pointers
>>> + * @param	status		array of status for the function to return
>>> + * @param	num		number of elements in each array
>>> + *
>>> + */
>>> +__rte_experimental
>>> +void
>>> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
>>> +		struct rte_security_session *sess,
>>> +		struct rte_security_vec buf[], void *iv[], void *aad[],
>>> +		void *digest[], int status[], uint32_t num);
>>> +
>> Why not make the return type int, to indicate whether this API completely
>> failed, processed everything, or has some valid status to look into?
> Good point, will change as suggested.

I have another suggestion w.r.t. iv, aad, digest, etc. Why not put them
in a structure, so that you will be able to add/remove variables without
breaking the API prototype.
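A minimal sketch of that suggestion, grouping the per-element pointers into one descriptor so the bulk prototype no longer lists them individually. cpu_crypto_op_sketch, cpu_crypto_bulk_sketch_t and toy_bulk_ops() are hypothetical names; the toy callee does no cryptography:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical per-element descriptor: groups the per-operation
 * pointers so new ones can be added later without touching the bulk
 * function prototype.
 */
struct cpu_crypto_op_sketch {
	struct iovec *vec; /* SGL segments of the buffer */
	uint32_t num;      /* number of segments */
	int status;        /* per-element result, written by the callee */
	void *iv;          /* IV pointer */
	void *aad;         /* AAD pointer, may be NULL */
	void *digest;      /* digest pointer */
};

/* Bulk prototype that stays the same as the struct grows. */
typedef int (*cpu_crypto_bulk_sketch_t)(void *sess,
		struct cpu_crypto_op_sketch ops[], uint32_t num);

/* Toy callee: treats an op as failed when it has no digest pointer
 * and returns the number of successful ops. No crypto is done. */
static int
toy_bulk_ops(void *sess, struct cpu_crypto_op_sketch ops[], uint32_t num)
{
	uint32_t i;
	int ok = 0;

	(void)sess;
	for (i = 0; i < num; i++) {
		ops[i].status = (ops[i].digest != NULL) ? 0 : -1;
		if (ops[i].status == 0)
			ok++;
	}
	return ok;
}
```

Adding a field keeps the prototype unchanged, but it still changes the element size seen by already-compiled callers that pass an array of these structs. It therefore helps API stability more than ABI stability, unless spare space is reserved up front.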

>
>>
>>>    #ifdef __cplusplus
>>>    }
>>>    #endif
>>> diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
>>> index 1b561f852..70fcb0c26 100644
>>> --- a/lib/librte_security/rte_security_driver.h
>>> +++ b/lib/librte_security/rte_security_driver.h
>>> @@ -132,6 +132,23 @@ typedef int (*security_get_userdata_t)(void *device,
>>>    typedef const struct rte_security_capability *(*security_capabilities_get_t)(
>>>    		void *device);
>>>
>>> +/**
>>> + * Process security operations in bulk using CPU accelerated method.
>>> + *
>>> + * @param	sess		Security session structure.
>>> + * @param	buf		Buffer to the vectors to be processed.
>>> + * @param	iv		IV pointers.
>>> + * @param	aad		AAD pointers.
>>> + * @param	digest		Digest pointers.
>>> + * @param	status		Array of status value.
>>> + * @param	num		Number of elements in each array.
>>> + */
>>> +
>>> +typedef void (*security_process_cpu_crypto_bulk_t)(
>>> +		struct rte_security_session *sess,
>>> +		struct rte_security_vec buf[], void *iv[], void *aad[],
>>> +		void *digest[], int status[], uint32_t num);
>>> +
>>>    /** Security operations function pointer table */
>>>    struct rte_security_ops {
>>>    	security_session_create_t session_create;
>>> @@ -150,6 +167,8 @@ struct rte_security_ops {
>>>    	/**< Get userdata associated with session which processed the packet. */
>>>    	security_capabilities_get_t capabilities_get;
>>>    	/**< Get security capabilities. */
>>> +	security_process_cpu_crypto_bulk_t process_cpu_crypto_bulk;
>>> +	/**< Process data in bulk. */
>>>    };
>>>
>>>    #ifdef __cplusplus
>>> diff --git a/lib/librte_security/rte_security_version.map b/lib/librte_security/rte_security_version.map
>>> index 53267bf3c..2132e7a00 100644
>>> --- a/lib/librte_security/rte_security_version.map
>>> +++ b/lib/librte_security/rte_security_version.map
>>> @@ -18,4 +18,5 @@ EXPERIMENTAL {
>>>    	rte_security_get_userdata;
>>>    	rte_security_session_stats_get;
>>>    	rte_security_session_update;
>>> +	rte_security_process_cpu_crypto_bulk;
>>>    };

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-27  9:26                         ` Akhil Goyal
@ 2019-09-30 12:22                           ` Ananyev, Konstantin
  2019-09-30 13:43                             ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-09-30 12:22 UTC (permalink / raw)
  To: Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon'
  Cc: Zhang, Roy Fan, Doherty, Declan, 'Anoob Joseph'

Hi Akhil,

> > > > > > > > > > > This action type allows the burst of symmetric crypto
> > > > > > > > > > > workload using the same algorithm, key, and direction
> > > > > > > > > > > being processed by CPU cycles synchronously.
> > > > > > > > > > > This flexible action type does not require external
> > > > > > > > > > > hardware involvement, having the crypto workload
> > > > > > > > > > > processed synchronously, and is more performant than
> > > > > > > > > > > Cryptodev SW PMD due to the saved cycles on removed
> > > > > > > > > > > "async mode simulation" as well as 3 cacheline access
> > > > > > > > > > > of the crypto ops.
> > > > > > > > > >
> > > > > > > > > > Does that mean application will not call the
> > > > > > > > > > cryptodev_enqueue_burst and corresponding dequeue burst.
> > > > > > > > >
> > > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > >
> > > > > > > > > > It would be a new API something like process_packets and
> > > > > > > > > > it will have the crypto processed packets while returning
> > > > > > > > > > from the API?
> > > > > > > > >
> > > > > > > > > Yes, though the plan is that API will operate on raw data
> > > > > > > > > buffers, not mbufs.
> > > > > > > > >
> > > > > > > > > > I still do not understand why we cannot do with the
> > > > > > > > > > conventional crypto lib only.
> > > > > > > > > > As far as I can understand, you are not doing any protocol
> > > > > > > > > > processing or any value add to the crypto processing.
> > > > > > > > > > IMO, you just need a synchronous crypto processing API
> > > > > > > > > > which can be defined in cryptodev, you don't need to
> > > > > > > > > > re-create a crypto session in the name of Security
> > > > > > > > > > session in the driver just to do a synchronous processing.
> > > > > > > > >
> > > > > > > > > I suppose your question is why not to have
> > > > > > > > > rte_crypot_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > The main reason is that would require disruptive changes in
> > > > > > > > > existing cryptodev API (would cause ABI/API breakage).
> > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some
> > > > > > > > > extra information that normal crypto_sym_xform doesn't
> > > > > > > > > contain (cipher offset from the start of the buffer, might
> > > > > > > > > be something extra in future).
> > > > > > > >
> > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > >
> > > > > > > fill/read (+ alloc/free) is one of the main things that slow
> > > > > > > down the current crypto-op approach.
> > > > > > > That's why the general idea - have all data that wouldn't
> > > > > > > change from packet to packet included into the session and
> > > > > > > set it up once at session_init().
> > > > > >
> > > > > > I agree that you cannot use crypto-op.
> > > > > > You can have the new API in crypto.
> > > > > > As per the current patch, you only need cipher_offset which you
> > > > > > can have as a parameter until you get it approved in the
> > > > > > crypto xform. I believe it will be beneficial in case of other
> > > > > > crypto cases as well.
> > > > > > We can have cipher offset at both places (crypto-op and
> > > > > > cipher_xform). It will give flexibility to the user to
> > > > > > override it.
> > > > >
> > > > > After having another thought on your proposal:
> > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU
> > > > > related stuff here?
> > > >
> > > > I also thought of adding new xforms, but that won't serve the
> > > > purpose for maybe all the cases.
> > > > You would be needing all information currently available in the
> > > > current xforms.
> > > > So if you are adding new fields in the new xform, the size will be
> > > > more than that of the union of xforms.
> > > > ABI breakage would still be there.
> > > >
> > > > If you think a valid compression of the AEAD xform can be done,
> > > > then that can be done for each of the xforms and we can have a
> > > > solution to this issue.
> > >
> > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > So for now we can make that path work without any ABI breakage.
> > > Fan, please feel free to correct me here, if I missed something.
> > > If in future we would need to add some extra information it might
> > > require ABI breakage, though by now I don't envision anything
> > > particular to add.
> > > Anyway, if there is no objection to go that way, we can try to make
> > > these changes for v2.
> > >
> >
> > Actually, after looking at it more deeply, it appears not to be as easy
> > as I thought it would be :)
> > Below is a very rough draft of the proposed API additions.
> > I think it avoids ABI breakages right now and provides enough flexibility
> > for future extensions (if any).
> > For now, it doesn't address your comments about naming conventions
> > (_CPU_ vs _SYNC_), etc., but I suppose it is comprehensive enough to
> > convey the main idea behind it.
> > Akhil and other interested parties, please try to review and provide
> > feedback ASAP, as the related changes would take some time and we would
> > still like to hit the 19.11 deadline.
> > Konstantin
> >
> >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > index bc8da2466..c03069e23 100644
> > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> >   *
> >   * This structure contains data relating to Cipher (Encryption and Decryption)
> >   *  use to create a session.
> > + * Actually I was wrong saying that we don't have free space inside xforms.
> > + * Making key struct packed (see below) allows us to regain 6B that could be
> > + * used for future extensions.
> >   */
> >  struct rte_crypto_cipher_xform {
> >         enum rte_crypto_cipher_operation op;
> > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> >         struct {
> >                 const uint8_t *data;    /**< pointer to key data */
> >                 uint16_t length;        /**< key length in bytes */
> > -       } key;
> > +       } __attribute__((__packed__)) key;
> > +
> > +       /**
> > +         * offset for cipher to start within user provided data buffer.
> > +        * Fan suggested another (and less space consuming way) -
> > +         * reuse iv.offset space below, by changing:
> > +        * struct {uint16_t offset, length;} iv;
> > +        * to unnamed union:
> > +        * union {
> > +        *      struct {uint16_t offset, length;} iv;
> > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > +        * };
> > +        * Both approaches seem ok to me in general.
> 
> No strong opinions here. OK with this one.
> 
> > +        * Comments/suggestions are welcome.
> > +         */
> > +       uint16_t offset;

After another thought, it is probably a bit better to have offset as a separate field.
In that case we can use the same xforms to create both types of sessions.
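The "regain 6B" claim in the draft can be checked with a small layout test. This sketch uses simplified stand-ins for the cipher xform (plain ints in place of the two enums, and only the fields shown in the draft), assumes an LP64 target such as x86_64, and relies on the GCC/Clang packed attribute used in the draft; all struct names here are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Key descriptor as laid out today (natural alignment). */
struct key_plain {
	const uint8_t *data;
	uint16_t length;
};

/* Same fields packed, as the draft proposes (GCC/Clang attribute). */
struct key_packed {
	const uint8_t *data;
	uint16_t length;
} __attribute__((__packed__));

/* Simplified stand-in for rte_crypto_cipher_xform before the change;
 * plain ints replace the two enums. */
struct cipher_xform_before {
	int op;
	int algo;
	struct key_plain key;
	struct { uint16_t offset, length; } iv;
};

/* Stand-in after the change: offset and reserved bytes occupy what used
 * to be tail padding of the key struct, so iv should not move. */
struct cipher_xform_after {
	int op;
	int algo;
	struct key_packed key;
	uint16_t offset;
	uint8_t reserved1[4];
	struct { uint16_t offset, length; } iv;
};
```

On an LP64 target the plain key struct is 16 bytes and the packed one is 10, so the 6 bytes hold and iv stays at offset 24. The stand-in's total size does drop from 32 to 28 bytes, and its alignment drops from 8 to 4. That is worth checking against the real xform union before relying on this for ABI compatibility.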

> > +
> > +       uint8_t reserved1[4];
> > +
> >         /**< Cipher key
> >          *
> >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> >         struct {
> >                 const uint8_t *data;    /**< pointer to key data */
> >                 uint16_t length;        /**< key length in bytes */
> > -       } key;
> > +       } __attribute__((__packed__)) key;
> >         /**< Authentication key data.
> >          * The authentication key length MUST be less than or equal to the
> >          * block size of the algorithm. It is the callers responsibility to
> > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> >          * (for example RFC 2104, FIPS 198a).
> >          */
> >
> > +       uint8_t reserved1[6];
> > +
> >         struct {
> >                 uint16_t offset;
> >                 /**< Starting point for Initialisation Vector or Counter,
> > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> >         struct {
> >                 const uint8_t *data;    /**< pointer to key data */
> >                 uint16_t length;        /**< key length in bytes */
> > -       } key;
> > +       } __attribute__((__packed__)) key;
> > +
> > +       /** offset for cipher to start within data buffer */
> > +       uint16_t cipher_offset;
> > +
> > +       uint8_t reserved1[4];
> >
> >         struct {
> >                 uint16_t offset;
> > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > index e175b838c..c0c7bfed7 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > @@ -1272,6 +1272,101 @@ void *
> >  rte_cryptodev_sym_session_get_user_data(
> >                                         struct rte_cryptodev_sym_session *sess);
> >
> > +/*
> > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > + * into existing rte_crypto_sym_session structure/API, but instead
> > + * introduce an extension to it via new fully opaque
> > + * struct rte_crypto_cpu_sym_session and additional related API.
> 
> 
> What all things do we need to squeeze?
> In this proposal I do not see the new struct cpu_sym_session  defined here.

The plan is to have it totally opaque to the user, i.e. just:
struct rte_crypto_cpu_sym_session;
in public header files.
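The opaque-session flow from the draft's sketched usage model (query the size, let the user allocate, init, obtain a process() pointer, fini) can be illustrated with a toy single-file mock. Everything here is hypothetical: the sketch_* functions stand in for the proposed rte_crypto_cpu_sym_session_* calls, take toy parameters instead of rte_crypto_sym_xform, and the "cipher" is a plain XOR that is NOT cryptography:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* "Public header" part: the application only sees an incomplete type. */
struct cpu_sym_session_sketch;

typedef void (*cpu_sym_process_sketch_t)(struct cpu_sym_session_sketch *sess,
		struct iovec buf[], int status[], uint32_t num);

/* "PMD" part: only the driver knows the layout (toy fields). */
struct cpu_sym_session_sketch {
	uint8_t key;            /* toy one-byte key */
	uint16_t cipher_offset; /* fixed per session, not per packet */
};

/* Size query, so the caller can allocate the opaque session itself. */
static int
sketch_session_size(void)
{
	return (int)sizeof(struct cpu_sym_session_sketch);
}

/* Stands in for session_init(dev_id, sess, xforms); takes the toy
 * parameters directly instead of an xform chain. */
static int
sketch_session_init(struct cpu_sym_session_sketch *s, uint8_t key,
		    uint16_t cipher_offset)
{
	if (s == NULL)
		return -EINVAL;
	s->key = key;
	s->cipher_offset = cipher_offset;
	return 0;
}

/* Toy transform: XORs every byte after cipher_offset with the key.
 * This is NOT cryptography; it only shows the call shape. */
static void
sketch_process(struct cpu_sym_session_sketch *s, struct iovec buf[],
	       int status[], uint32_t num)
{
	uint32_t i;
	size_t j;

	for (i = 0; i < num; i++) {
		uint8_t *p = buf[i].iov_base;

		if (buf[i].iov_len < s->cipher_offset) {
			status[i] = -EINVAL;
			continue;
		}
		for (j = s->cipher_offset; j < buf[i].iov_len; j++)
			p[j] ^= s->key;
		status[i] = 0;
	}
}

/* Returns the process function for this (toy) configuration. */
static cpu_sym_process_sketch_t
sketch_session_func(void)
{
	return sketch_process;
}

/* Wipes key material before the caller frees the memory. */
static void
sketch_session_fini(struct cpu_sym_session_sketch *s)
{
	memset(s, 0, sizeof(*s));
}
```

Because the application never dereferences the session, the driver is free to change the struct layout without breaking callers, which is the flexibility argued for above.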

> I believe you will have same lib API/struct for cpu_sym_session  and sym_session.

I thought about such a way, but there are a few things that look clumsy to me:
1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
so it is not possible to easily distinguish what session you have: lksd_sym or cpu_sym.
In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add some extra field
there, but in that case we wouldn't be able to use the same xform for both lksd_sym and cpu_sym
(which seems a really plausible thing to me).
2. The majority of rte_cryptodev_sym_session fields, I think, are unnecessary for rte_crypto_cpu_sym_session:
sess_data[], opaque_data, user_data, nb_drivers.
All of that consumes space that could be used somewhere else instead.
3. I am a bit reluctant to touch the existing rte_cryptodev API, to avoid any breakages I can't foresee right now.
On the other hand, if we add new functions/structs for cpu_sym_session, we can mark them
as experimental and keep them that way for some time, so further changes (if needed) would still be possible.

> I am not sure if that would be needed.
> It would be internal to the driver that, if synchronous processing is supported (from the feature flag) and
> the relevant fields in the xform (the newly added ones which are packed as per your suggestions) are set,
> it will create that type of session.
> 
> 
> > + * Main points:
> > + * - Current crypto-dev API is reasonably mature and it is desirable
> > + *   to keep it unchanged (API/ABI stability). From other side, this
> > + *   new sync API is new one and probably would require extra changes.
> > + *   Having it as a new one allows to mark it as experimental, without
> > + *   affecting existing one.
> > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > + * - process() function per set of xforms
> > + *   allows to expose different process() functions for different
> > + *   xform combinations. PMD writer can decide whether he wants to
> > + *   push all supported algorithms into one process() function,
> > + *   or spread it across several ones.
> > + *   I.E. More flexibility for PMD writer.
> 
> Which process function should be chosen is internal to the PMD; how would that info
> be visible to the application or the library? These will get stored in the session private
> data. It would be up to the PMD writer to store the per-session process function in
> the session private data.
> 
> Process function would be a dev op just like enq/deq operations and it should call
> the respective process API stored in the session private data.

That model (via devops) is possible, but has several drawbacks from my perspective:

1. It means we'll need to pass dev_id as a parameter to the process() function.
Though in fact dev_id is not relevant information for us here
(all we need is a pointer to the session and a pointer to the function to call),
and I tried to avoid using it in data-path functions for that API.
2. As you pointed out, in that case there will be just one process() function per device.
So if a PMD would like to have several process() functions for different types of sessions
(let's say one per alg), the first thing it has to do inside its process() is read the session data and,
based on that, do a jump/call to a particular internal sub-routine.
Something like:
driver_id = get_pmd_driver_id();
priv_sess = ses->sess_data[driver_id];
Then either:
switch(priv_sess->alg) {case XXX: process_XXX(priv_sess, ...);break;...}
OR
priv_sess->process(priv_sess, ...);

to select and call the proper function.
Looks like totally unnecessary overhead to me.
Though if we have the ability to query/extract some sort of session_ops based on the xform,
we can avoid this extra de-reference+jump/call thing.
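The two dispatch shapes can be contrasted in a stripped-down sketch. It only illustrates the extra load and the second indirect call in the per-device model; it makes no performance measurement. All names are hypothetical, and the sess_data[] slot array is a simplification of the field named above:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DRIVERS 4

struct priv_sess_sketch;

/* Per-algorithm routine (hypothetical signature). */
typedef int (*alg_fn_sketch)(struct priv_sess_sketch *priv, int x);

/* Driver-private session data that stores its own routine. */
struct priv_sess_sketch {
	alg_fn_sketch process;
	int key;
};

/* Generic session with per-driver slots (simplified). */
struct generic_sess_sketch {
	struct priv_sess_sketch *sess_data[SKETCH_MAX_DRIVERS];
};

/* Toy "algorithm". */
static int
alg_add(struct priv_sess_sketch *priv, int x)
{
	return x + priv->key;
}

/* Model 1 (one process() per device): load the private data for this
 * driver, then make a second indirect call to the real routine. */
static int
dev_process(uint8_t driver_id, struct generic_sess_sketch *ses, int x)
{
	struct priv_sess_sketch *priv_sess = ses->sess_data[driver_id];

	return priv_sess->process(priv_sess, x);
}

/* Model 2 (function queried per xform at control time): the caller
 * keeps the routine pointer and calls it directly, e.g.
 *     alg_fn_sketch fn = alg_add;  ...  fn(priv_sess, x);
 */
```

In model 1 the application's call into dev_process() is typically itself an indirect call through the device ops table, so the per-packet path pays two indirect calls plus the slot lookup. In model 2 it pays one indirect call.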

> 
> I am not sure if you would need a new session init API for this as nothing would be visible to
> the app or lib.
> 
> > + * - Not storing process() pointer inside the session -
> > + *   Allows user to choose does he want to store a process() pointer
> > + *   per session, or per group of sessions for that device that share
> > + *   the same input xforms. I.E. extra flexibility for the user,
> > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> 
> If multiple sessions need to be processed via the same process function,
> PMD would save the same process in all the sessions, I don't think there would
> be any perf overhead with that.

I think it would, see above.

> 
> > + * Sketched usage model:
> > + * ....
> > + * /* control path, alloc/init session */
> > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > + * rte_crypto_cpu_sym_process_t process =
> > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > + * ...
> > + * /* data-path*/
> > + * process(ses, ....);
> > + * ....
> > + * /* control path, terminate/free session */
> > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > + */
> > +
> > +/**
> > + * vector structure, contains pointer to vector array and the length
> > + * of the array
> > + */
> > +struct rte_crypto_vec {
> > +       struct iovec *vec;
> > +       uint32_t num;
> > +};
> > +
> > +/*
> > + * Data-path bulk process crypto function.
> > + */
> > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > +               struct rte_crypto_cpu_sym_session *sess,
> > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > +               void *digest[], int status[], uint32_t num);
> > +/*
> > + * for given device return process function specific to input xforms
> > + * on error - return NULL and set rte_errno value.
> > + * Note that the same input xforms for the same device should return
> > + * the same process function.
> > + */
> > +__rte_experimental
> > +rte_crypto_cpu_sym_process_t
> > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +/*
> > + * Return required session size in bytes for given set of xforms.
> > + * if xforms == NULL, then return the max possible session size,
> > + * that would fit a session for any algorithm supported by the device.
> > + * if CPU mode is not supported at all, or the algorithm requested in xform
> > + * is not supported, then return -ENOTSUP.
> > + */
> > +__rte_experimental
> > +int
> > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +/*
> > + * Initialize session.
> > + * It is caller responsibility to allocate enough space for it.
> > + * See rte_crypto_cpu_sym_session_size above.
> > + */
> > +__rte_experimental
> > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > +                       struct rte_crypto_cpu_sym_session *sess,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +__rte_experimental
> > +void
> > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > +                       struct rte_crypto_cpu_sym_session *sess);
> > +
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > index defe05ea0..ed7e63fab 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> >                 struct rte_cryptodev_asym_session *sess);
> >
> > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > +                       struct rte_crypto_cpu_sym_session *sess,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > +                       struct rte_crypto_cpu_sym_session *sess);
> > +
> > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > +                       struct rte_cryptodev *dev,
> > +                       const struct rte_crypto_sym_xform *xforms);
> > +
> >  /** Crypto device operations function pointer table */
> >  struct rte_cryptodev_ops {
> >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> >         /**< Clear a Crypto sessions private data. */
> >         cryptodev_asym_free_session_t asym_session_clear;
> >         /**< Clear a Crypto sessions private data. */
> > +
> > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> >  };
> >
> >
> >
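The usage model sketched in the draft comment (size query, user allocation, process-function lookup, init, data-path call, fini) can be illustrated end to end with a small mock. This is a sketch under stated assumptions, not DPDK code: every `demo_` name is hypothetical and only mirrors the shape of the proposed `rte_crypto_cpu_sym_*` calls and `struct rte_crypto_vec`, and the XOR transform is a placeholder standing in for a real cipher.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define DEMO_BURST_MAX 32

/* Hypothetical xform: a 1-byte toy key plus the cipher offset discussed here. */
struct demo_xform { const uint8_t *key; uint16_t key_len; uint16_t offset; };
/* Mirrors the proposed struct rte_crypto_vec: an iovec array plus its length. */
struct demo_vec { struct iovec *vec; uint32_t num; };
/* Opaque to the application in the draft; spelled out here for the mock PMD. */
struct demo_session { uint8_t key; uint16_t offset; };

/* Same shape as the proposed rte_crypto_cpu_sym_process_t. */
typedef void (*demo_process_t)(struct demo_session *sess, struct demo_vec buf[],
		void *iv[], void *aad[], void *digest[], int status[], uint32_t num);

/* Toy PMD routine: XOR the first segment of each buffer from sess->offset on.
 * iv/aad/digest are unused by this placeholder transform. */
static void
demo_xor_process(struct demo_session *sess, struct demo_vec buf[], void *iv[],
		void *aad[], void *digest[], int status[], uint32_t num)
{
	(void)iv; (void)aad; (void)digest;
	assert(sess != NULL && (num == 0 || (buf != NULL && status != NULL)));
	for (uint32_t i = 0; i != num; i++) {
		if (buf[i].num == 0 || buf[i].vec[0].iov_len < sess->offset) {
			status[i] = -EINVAL;	/* nothing to process at that offset */
			continue;
		}
		uint8_t *p = buf[i].vec[0].iov_base;
		for (size_t j = sess->offset; j < buf[i].vec[0].iov_len; j++)
			p[j] ^= sess->key;
		status[i] = 0;
	}
}

/* Control path. A NULL xform asks for the largest session the PMD may need. */
static int
demo_session_size(const struct demo_xform *xf)
{
	if (xf != NULL && xf->key_len != 1)
		return -ENOTSUP;	/* the toy PMD only knows 1-byte keys */
	return (int)sizeof(struct demo_session);
}

static demo_process_t
demo_session_func(const struct demo_xform *xf)
{
	return (xf == NULL || demo_session_size(xf) < 0) ? NULL : demo_xor_process;
}

static int
demo_session_init(struct demo_session *s, const struct demo_xform *xf)
{
	if (xf == NULL || demo_session_size(xf) < 0)
		return -ENOTSUP;
	s->key = xf->key[0];
	s->offset = xf->offset;
	return 0;
}

static void
demo_session_fini(struct demo_session *s)
{
	memset(s, 0, sizeof(*s));	/* wipe key material before the caller frees it */
}

/* The usage model: set up once, process one burst, tear down. */
static int
demo_encrypt_burst(uint8_t *pkts[], const size_t lens[], int status[], uint32_t n)
{
	static const uint8_t key = 0x5a;
	const struct demo_xform xf = { .key = &key, .key_len = 1, .offset = 4 };
	struct iovec iov[DEMO_BURST_MAX];
	struct demo_vec vec[DEMO_BURST_MAX];

	if (n > DEMO_BURST_MAX)
		return -EINVAL;
	int sz = demo_session_size(&xf);
	demo_process_t process = demo_session_func(&xf);
	if (sz < 0 || process == NULL)
		return -ENOTSUP;
	struct demo_session *ses = malloc((size_t)sz);	/* user-owned memory */
	if (ses == NULL)
		return -ENOMEM;
	if (demo_session_init(ses, &xf) != 0) {
		free(ses);
		return -ENOTSUP;
	}
	for (uint32_t i = 0; i != n; i++) {
		iov[i] = (struct iovec){ .iov_base = pkts[i], .iov_len = lens[i] };
		vec[i] = (struct demo_vec){ .vec = &iov[i], .num = 1 };
	}
	process(ses, vec, NULL, NULL, NULL, status, n);	/* data path */

	demo_session_fini(ses);
	free(ses);
	return 0;
}
```

In this mock the session is plain user-allocated memory and the process pointer is fetched once on the control path, so the data-path call needs neither a dev_id nor a device lookup, which is the property the draft is after.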



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-30 12:22                           ` Ananyev, Konstantin
@ 2019-09-30 13:43                             ` Akhil Goyal
  2019-10-01 14:49                               ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-09-30 13:43 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon'
  Cc: Zhang, Roy Fan, Doherty, Declan, 'Anoob Joseph'


Hi Konstantin,
> 
> Hi Akhil,
> 
> > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the same
> > > > > > > > > > > > algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > > > > > >
> > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and corresponding dequeue burst.
> > > > > > > > > >
> > > > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > >
> > > > > > > > > > > It would be a new API something like process_packets and it will have the crypto processed packets while returning from the API?
> > > > > > > > > >
> > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any value add
> > > > > > > > > > > to the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > > > > > > > > > can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > > > > > > > > > security session in the driver just to do a synchronous processing.
> > > > > > > > > >
> > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > The main reason is that would require disruptive changes in existing cryptodev API
> > > > > > > > > > (would cause ABI/API breakage).
> > > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra information
> > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > > > > >
> > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > >
> > > > > > > > fill/read (+ alloc/free) is one of the main things that slow down the current crypto-op approach.
> > > > > > > > That's why the general idea - have all data that wouldn't change from packet to packet
> > > > > > > > included into the session and set it up once at session_init().
> > > > > > >
> > > > > > > I agree that you cannot use crypto-op.
> > > > > > > You can have the new API in crypto.
> > > > > > > As per the current patch, you only need cipher_offset which you can have it as a parameter until
> > > > > > > you get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> > > > > > > We can have cipher offset at both places (crypto-op and cipher_xform). It will give flexibility to the user to
> > > > > > > override it.
> > > > > >
> > > > > > After having another thought on your proposal:
> > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related stuff here?
> > > > >
> > > > > I also thought of adding new xforms, but that won't serve the purpose for maybe all the cases.
> > > > > You would be needing all information currently available in the current xforms.
> > > > > So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> > > > > ABI breakage would still be there.
> > > > >
> > > > > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> > > > > xforms and we can have a solution to this issue.
> > > >
> > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > So for now we can make that path work without any ABI breakage.
> > > > Fan, please feel free to correct me here, if I missed something.
> > > > If in future we would need to add some extra information it might
> > > > require ABI breakage, though by now I don't envision anything particular to add.
> > > > Anyway, if there is no objection to go that way, we can try to make
> > > > these changes for v2.
> > > >
> > >
> > > Actually, after looking at it more deeply it appears not as easy as I thought it would be :)
> > > Below is a very draft version of proposed API additions.
> > > I think it avoids ABI breakages right now and provides enough flexibility for future extensions (if any).
> > > For now, it doesn't address your comments about naming conventions (_CPU_ vs _SYNC_), etc.
> > > but I suppose it is comprehensive enough to provide the main idea behind it.
> > > Akhil and other interested parties, please try to review and provide feedback ASAP,
> > > as related changes would take some time and we would still like to hit the 19.11 deadline.
> > > Konstantin
> > >
> > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > index bc8da2466..c03069e23 100644
> > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > >   *
> > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > >   *  use to create a session.
> > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > + * Making key struct packed (see below) allows us to regain 6B that could be
> > > + * used for future extensions.
> > >   */
> > >  struct rte_crypto_cipher_xform {
> > >         enum rte_crypto_cipher_operation op;
> > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > >         struct {
> > >                 const uint8_t *data;    /**< pointer to key data */
> > >                 uint16_t length;        /**< key length in bytes */
> > > -       } key;
> > > +       } __attribute__((__packed__)) key;
> > > +
> > > +       /**
> > > +         * offset for cipher to start within user provided data buffer.
> > > +        * Fan suggested another (and less space consuming way) -
> > > +         * reuse iv.offset space below, by changing:
> > > +        * struct {uint16_t offset, length;} iv;
> > > +        * to unnamed union:
> > > +        * union {
> > > +        *      struct {uint16_t offset, length;} iv;
> > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > +        * };
> > > +        * Both approaches seem ok to me in general.
> >
> > No strong opinions here. OK with this one.
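Fan's alternative (overlaying the CPU-crypto parameters on the existing `iv` bytes through an unnamed union) keeps the xform size unchanged, but the two views share storage. A minimal sketch, assuming a C11 compiler (anonymous unions, `static_assert`) and using simplified hypothetical `demo_` structs rather than the real `rte_crypto_cipher_xform`: it checks the size and `iv` offset at compile time and shows that writing `crypto_offset` overwrites `iv.length`, which is the concern behind keeping the offset as a separate field so one xform can describe both session types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the current xform tail (enum op -> uint32_t). */
struct demo_xform_old {
	uint32_t op;
	struct { uint16_t offset, length; } iv;
};

/* Fan's suggestion: overlay the CPU-crypto parameters on the same four bytes. */
struct demo_xform_union {
	uint32_t op;
	union {
		struct { uint16_t offset, length; } iv;
		struct { uint16_t iv_len, crypto_offset; } cpu_crypto_param;
	};
};

static_assert(sizeof(struct demo_xform_old) == sizeof(struct demo_xform_union),
	      "the union must not grow the xform");
static_assert(offsetof(struct demo_xform_old, iv) ==
	      offsetof(struct demo_xform_union, iv), "iv must not move");

/* Both views alias: setting the CPU-crypto offset clobbers iv.length. */
static uint16_t
demo_iv_length_after_overlay(uint16_t iv_length, uint16_t crypto_offset)
{
	struct demo_xform_union x = { .op = 0 };

	x.iv.length = iv_length;				/* regular-session view */
	x.cpu_crypto_param.crypto_offset = crypto_offset;	/* CPU-crypto view */
	return x.iv.length;					/* now equals crypto_offset */
}
```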
> >
> > > +        * Comments/suggestions are welcome.
> > > +         */
> > > +       uint16_t offset;
> 
> After another thought - it is probably a bit better to have offset as a separate field.
> In that case we can use the same xforms to create both type of sessions.
ok
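The claim that packing the key struct regains 6 bytes can be checked directly. A minimal sketch, assuming an LP64 target (8-byte pointers, as on x86_64 or aarch64 Linux) and GCC/Clang `__attribute__((__packed__))`; the `demo_` structs are simplified stand-ins with the enums replaced by `uint32_t`, not the real DPDK definitions. Packing removes the tail padding after `length`, which is exactly the room the draft uses for the new `offset` field plus `reserved1[4]`, so `iv` keeps its offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Key layout as in the current xforms: on LP64 an 8-byte pointer and a
 * 2-byte length, followed by 6 bytes of tail padding. */
struct demo_key {
	const uint8_t *data;
	uint16_t length;
};

/* Key layout proposed in the draft: packing drops the tail padding. */
struct demo_key_packed {
	const uint8_t *data;
	uint16_t length;
} __attribute__((__packed__));

/* Simplified cipher xform before the change (enums replaced by uint32_t). */
struct demo_cipher_xform_old {
	uint32_t op;
	uint32_t algo;
	struct demo_key key;
	struct { uint16_t offset, length; } iv;
};

/* After the change: the 6 regained bytes hold the new offset and reserved1[]. */
struct demo_cipher_xform_new {
	uint32_t op;
	uint32_t algo;
	struct demo_key_packed key;
	uint16_t offset;		/* cipher start within the user buffer */
	uint8_t reserved1[4];
	struct { uint16_t offset, length; } iv;
};

/* Fields that follow the key must keep their offsets. */
static_assert(offsetof(struct demo_cipher_xform_old, iv) ==
	      offsetof(struct demo_cipher_xform_new, iv),
	      "iv must not move");
```

The sketch also exposes a side effect: a packed struct has alignment 1, so the simplified xform's alignment drops from 8 to 4 and its `sizeof` from 32 to 28 bytes on LP64. Whether the same happens in the full xform layout is worth pinning down with a `static_assert` there as well.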
> 
> > > +
> > > +       uint8_t reserved1[4];
> > > +
> > >         /**< Cipher key
> > >          *
> > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > >         struct {
> > >                 const uint8_t *data;    /**< pointer to key data */
> > >                 uint16_t length;        /**< key length in bytes */
> > > -       } key;
> > > +       } __attribute__((__packed__)) key;
> > >         /**< Authentication key data.
> > >          * The authentication key length MUST be less than or equal to the
> > >          * block size of the algorithm. It is the callers responsibility to
> > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > >          * (for example RFC 2104, FIPS 198a).
> > >          */
> > >
> > > +       uint8_t reserved1[6];
> > > +
> > >         struct {
> > >                 uint16_t offset;
> > >                 /**< Starting point for Initialisation Vector or Counter,
> > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > >         struct {
> > >                 const uint8_t *data;    /**< pointer to key data */
> > >                 uint16_t length;        /**< key length in bytes */
> > > -       } key;
> > > +       } __attribute__((__packed__)) key;
> > > +
> > > +       /** offset for cipher to start within data buffer */
> > > +       uint16_t cipher_offset;
> > > +
> > > +       uint8_t reserved1[4];
> > >
> > >         struct {
> > >                 uint16_t offset;
> > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > index e175b838c..c0c7bfed7 100644
> > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > @@ -1272,6 +1272,101 @@ void *
> > >  rte_cryptodev_sym_session_get_user_data(
> > >                                         struct rte_cryptodev_sym_session *sess);
> > >
> > > +/*
> > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > + * introduce an extension to it via new fully opaque
> > > + * struct rte_crypto_cpu_sym_session and additional related API.
> >
> >
> > What all things do we need to squeeze?
> > In this proposal I do not see the new struct cpu_sym_session defined here.
> 
> The plan is to have it totally opaque to the user, i.e. just:
> struct rte_crypto_cpu_sym_session;
> in public header files.
> 
> > I believe you will have the same lib API/struct for cpu_sym_session and sym_session.
> 
> I thought about such a way, but there are a few things that look clumsy to me:
> 1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
> so it is not possible to easily distinguish what session you have: lksd_sym or cpu_sym.
> In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add some extra field
> here, but in that case we wouldn't be able to use the same xform for both lksd_sym and cpu_sym
> (which seems a really plausible thing to me).
> 2. The majority of rte_cryptodev_sym_session fields, I think, are unnecessary for rte_crypto_cpu_sym_session:
> sess_data[], opaque_data, user_data, nb_drivers.
> All that consumes space that could be used somewhere else instead.
> 3. I am a bit reluctant to touch the existing rte_cryptodev API - to avoid any breakages I can't foresee right now.
> On the other hand, if we add new functions/structs for cpu_sym_session we can mark them
> and keep them for some time as experimental, so further changes (if needed) would still be possible.
> 

OK let us assume that you have a separate structure. But I have a few queries:
1. How can multiple drivers use the same session?
2. Can somebody use the scheduler PMD for scheduling different types of payloads for the same session?

With your proposal the APIs would be very specific to your use case only.
When you add more functionality to this sync API/struct, it will end up being the same as the existing API/struct.

Let us see how close/far we are from the existing APIs when the actual implementation is done.

> > I am not sure if that would be needed.
> > It would be internal to the driver that if synchronous processing is supported (from feature flag) and
> > the relevant fields in xform (the newly added ones which are packed as per your suggestions) are set,
> > it will create that type of session.
> >
> >
> > > + * Main points:
> > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > + *   to keep it unchanged (API/ABI stability). On the other hand, this
> > > + *   new sync API is a new one and would probably require extra changes.
> > > + *   Having it as a new one allows to mark it as experimental, without
> > > + *   affecting existing one.
> > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > + * - process() function per set of xforms
> > > + *   allows to expose different process() functions for different
> > > + *   xform combinations. PMD writer can decide whether to
> > > + *   push all supported algorithms into one process() function,
> > > + *   or spread it across several ones.
> > > + *   I.E. More flexibility for PMD writer.
> >
> > Which process function should be chosen is internal to PMD; how would that info
> > be visible to the application or the library? These will get stored in the session private
> > data. It would be up to the PMD writer to store the per-session process function in
> > the session private data.
> >
> > Process function would be a dev op just like enc/deq operations and it should call
> > the respective process API stored in the session private data.
> 
> That model (via devops) is possible, but has several drawbacks from my perspective:
> 
> 1. It means we'll need to pass dev_id as a parameter to process() function.
> Though in fact dev_id is not relevant information for us here
> (all we need is a pointer to the session and a pointer to the function to call)
> and I tried to avoid using it in data-path functions for that API.

You have a single vdev, but someone may have multiple vdevs for each thread, or may
have the same dev with multiple queues for each core.

> 2. As you pointed out, in that case it will be just one process() function per device.
> So if a PMD would like to have several process() functions for different types of sessions
> (let's say one per alg), the first thing it has to do inside its process() is read session data and,
> based on that, do a jump/call to a particular internal sub-routine.
> Something like:
> driver_id = get_pmd_driver_id();
> priv_ses = ses->sess_data[driver_id];
> Then either:
> switch(priv_ses->alg) {case XXX: process_XXX(priv_ses, ...);break;...}
> OR
> priv_ses->process(priv_ses, ...);
> 
> to select and call the proper function.
> Looks like totally unnecessary overhead to me.
> Though if we'll have the ability to query/extract some sort of session_ops based on the xform -
> we can avoid this extra de-reference+jump/call thing.

What is the issue in the priv_ses->process(); approach?
I don't understand what you are saving by not doing this.
In any case you would need to identify which session correspond to which process().
For that you would be doing it somewhere in your data path.
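To make the disagreement concrete, here is a toy model of the two dispatch paths: (A) a per-device `process()` reached through an ops table, which then loads the driver's private session and calls the per-session routine (the `priv_ses->process()` approach), and (B) the draft's approach, where the application fetched the routine once on the control path and calls it directly. All names are hypothetical. The counter only tallies entries into functions that the source calls through pointers; it says nothing about real cycle cost, which depends on branch prediction and caching. In (B) the caller has to keep the function pointer next to its session (per session or per group of sessions), which is the bookkeeping pointed out above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_DRIVERS 4
#define DEMO_DRIVER_ID 2	/* slot this toy PMD owns in sess_data[] */

/* Counts entries into functions that the code below reaches via pointers. */
static unsigned demo_indirect_calls;

struct demo_priv_sess;
typedef int (*demo_alg_fn)(struct demo_priv_sess *priv, int in);

/* PMD-private session holding its per-session routine (priv_ses->process). */
struct demo_priv_sess {
	demo_alg_fn alg;
	int key;
};

/* Toy "algorithm": add the session key to the input. */
static int
demo_alg_add(struct demo_priv_sess *priv, int in)
{
	demo_indirect_calls++;
	return in + priv->key;
}

/* Generic session with one private slot per driver, loosely modelled on the
 * sess_data[] array of rte_cryptodev_sym_session. */
struct demo_sess {
	struct demo_priv_sess *sess_data[DEMO_MAX_DRIVERS];
};

/* (A) One process() per device, reached through the device ops table. */
struct demo_dev_ops {
	int (*process)(struct demo_sess *ses, int in);
};

static int
demo_pmd_process(struct demo_sess *ses, int in)
{
	demo_indirect_calls++;
	struct demo_priv_sess *priv = ses->sess_data[DEMO_DRIVER_ID]; /* extra load */
	return priv->alg(priv, in);	/* second call through a pointer */
}

static const struct demo_dev_ops demo_ops = { .process = demo_pmd_process };
static const struct demo_dev_ops *const demo_devs[] = { &demo_ops };

static int
demo_process_via_devops(uint8_t dev_id, struct demo_sess *ses, int in)
{
	assert(dev_id < sizeof(demo_devs) / sizeof(demo_devs[0]));
	return demo_devs[dev_id]->process(ses, in);
}

/* (B) The draft's model: the caller fetched fn once on the control path and
 * keeps it next to the opaque session, so no device lookup is needed here. */
static int
demo_process_direct(demo_alg_fn fn, struct demo_priv_sess *priv, int in)
{
	return fn(priv, in);
}
```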

> 
> >
> > I am not sure if you would need a new session init API for this as nothing would be visible to
> > the app or lib.
> >
> > > + * - Not storing process() pointer inside the session -
> > > + *   Allows the user to choose whether to store a process() pointer
> > > + *   per session, or per group of sessions for that device that share
> > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> >
> > If multiple sessions need to be processed via the same process function,
> > PMD would save the same process in all the sessions, I don't think there would
> > be any perf overhead with that.
> 
> I think it would, see above.
> 
> >
> > > + * Sketched usage model:
> > > + * ....
> > > + * // control path, alloc/init session
> > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > + * rte_crypto_cpu_sym_process_t process =
> > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > + * ...
> > > + * // data-path
> > > + * process(ses, ....);
> > > + * ....
> > > + * // control path, terminate/free session
> > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > + */
> > > +
> > > +/**
> > > + * vector structure, contains pointer to vector array and the length
> > > + * of the array
> > > + */
> > > +struct rte_crypto_vec {
> > > +       struct iovec *vec;
> > > +       uint32_t num;
> > > +};
> > > +
> > > +/*
> > > + * Data-path bulk process crypto function.
> > > + */
> > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > +               struct rte_crypto_cpu_sym_session *sess,
> > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > +               void *digest[], int status[], uint32_t num);
> > > +/*
> > > + * for given device return process function specific to input xforms
> > > + * on error - return NULL and set rte_errno value.
> > > + * Note that the same input xforms for the same device should return
> > > + * the same process function.
> > > + */
> > > +__rte_experimental
> > > +rte_crypto_cpu_sym_process_t
> > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +/*
> > > + * Return required session size in bytes for given set of xforms.
> > > + * if xforms == NULL, then return the max possible session size,
> > > + * that would fit a session for any algorithm supported by the device.
> > > + * if CPU mode is not supported at all, or the algorithm requested in xform
> > > + * is not supported, then return -ENOTSUP.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +/*
> > > + * Initialize session.
> > > + * It is caller responsibility to allocate enough space for it.
> > > + * See rte_crypto_cpu_sym_session_size above.
> > > + */
> > > +__rte_experimental
> > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +__rte_experimental
> > > +void
> > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > +
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > index defe05ea0..ed7e63fab 100644
> > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > >                 struct rte_cryptodev_asym_session *sess);
> > >
> > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > +
> > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > +                       struct rte_cryptodev *dev,
> > > +                       const struct rte_crypto_sym_xform *xforms);
> > > +
> > >  /** Crypto device operations function pointer table */
> > >  struct rte_cryptodev_ops {
> > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > >         /**< Clear a Crypto sessions private data. */
> > >         cryptodev_asym_free_session_t asym_session_clear;
> > >         /**< Clear a Crypto sessions private data. */
> > > +
> > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > >  };
> > >
> > >
> > >
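On the library side, the new `rte_cryptodev_ops` entries imply thin wrappers that look up the device and forward to the PMD, returning `-ENOTSUP` when a PMD leaves the op unset. A minimal sketch of that pattern for the size and function queries, with hypothetical `demo_` names, a local `demo_errno` standing in for `rte_errno`, and a toy PMD supporting two made-up algorithms; `xforms == NULL` returns the largest session size, as the draft comment describes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_ALGO_GCM = 1, DEMO_ALGO_CBC = 2 };

struct demo_xform { int algo; };
struct demo_dev;
typedef void (*demo_process_t)(void *sess, uint32_t num);

/* Two of the four ops the draft adds to struct rte_cryptodev_ops. */
struct demo_dev_ops {
	int (*sym_cpu_session_get_size)(struct demo_dev *dev,
			const struct demo_xform *xf);
	demo_process_t (*sym_cpu_session_get_func)(struct demo_dev *dev,
			const struct demo_xform *xf);
};
struct demo_dev { const struct demo_dev_ops *ops; };

static int demo_errno;	/* stand-in for rte_errno */

/* Toy PMD: per-algorithm private session layouts. */
struct demo_gcm_sess { uint8_t key[32]; uint8_t iv_len; };
struct demo_cbc_sess { uint8_t key[32]; uint8_t iv[16]; uint16_t offset; };

static int
demo_pmd_size(struct demo_dev *dev, const struct demo_xform *xf)
{
	(void)dev;
	if (xf == NULL)		/* largest session over all supported algorithms */
		return (int)(sizeof(struct demo_gcm_sess) > sizeof(struct demo_cbc_sess) ?
			sizeof(struct demo_gcm_sess) : sizeof(struct demo_cbc_sess));
	switch (xf->algo) {
	case DEMO_ALGO_GCM: return (int)sizeof(struct demo_gcm_sess);
	case DEMO_ALGO_CBC: return (int)sizeof(struct demo_cbc_sess);
	default: return -ENOTSUP;
	}
}

static void demo_gcm_process(void *sess, uint32_t num) { (void)sess; (void)num; }
static void demo_cbc_process(void *sess, uint32_t num) { (void)sess; (void)num; }

static demo_process_t
demo_pmd_func(struct demo_dev *dev, const struct demo_xform *xf)
{
	(void)dev;
	if (xf != NULL && xf->algo == DEMO_ALGO_GCM)
		return demo_gcm_process;
	if (xf != NULL && xf->algo == DEMO_ALGO_CBC)
		return demo_cbc_process;
	demo_errno = ENOTSUP;
	return NULL;
}

static const struct demo_dev_ops demo_cpu_pmd_ops = {
	.sym_cpu_session_get_size = demo_pmd_size,
	.sym_cpu_session_get_func = demo_pmd_func,
};
static const struct demo_dev_ops demo_legacy_pmd_ops = { NULL, NULL }; /* no CPU crypto */

#define DEMO_MAX_DEVS 2
static struct demo_dev demo_devs[DEMO_MAX_DEVS] = {
	{ .ops = &demo_cpu_pmd_ops }, { .ops = &demo_legacy_pmd_ops },
};

/* Library wrappers: validate dev_id, forward to the PMD or report ENOTSUP. */
static int
demo_cpu_sym_session_size(uint8_t dev_id, const struct demo_xform *xf)
{
	if (dev_id >= DEMO_MAX_DEVS)
		return -EINVAL;
	struct demo_dev *dev = &demo_devs[dev_id];
	assert(dev->ops != NULL);
	if (dev->ops->sym_cpu_session_get_size == NULL)
		return -ENOTSUP;
	return dev->ops->sym_cpu_session_get_size(dev, xf);
}

static demo_process_t
demo_cpu_sym_session_func(uint8_t dev_id, const struct demo_xform *xf)
{
	if (dev_id >= DEMO_MAX_DEVS) {
		demo_errno = EINVAL;
		return NULL;
	}
	struct demo_dev *dev = &demo_devs[dev_id];
	assert(dev->ops != NULL);
	if (dev->ops->sym_cpu_session_get_func == NULL) {
		demo_errno = ENOTSUP;
		return NULL;
	}
	return dev->ops->sym_cpu_session_get_func(dev, xf);
}
```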



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-09-30 13:43                             ` Akhil Goyal
@ 2019-10-01 14:49                               ` Ananyev, Konstantin
  2019-10-03 13:24                                 ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-01 14:49 UTC (permalink / raw)
  To: Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon'
  Cc: Zhang, Roy Fan, Doherty, Declan, 'Anoob Joseph'


Hi Akhil,

> > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the same
> > > > > > > > > > > > > algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > > > > > > >
> > > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and corresponding dequeue burst.
> > > > > > > > > > >
> > > > > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > >
> > > > > > > > > > > > It would be a new API something like process_packets and it will have the crypto processed packets while returning from the API?
> > > > > > > > > > >
> > > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any value add
> > > > > > > > > > > > to the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > > > > > > > > > > can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > > > > > > > > > > security session in the driver just to do a synchronous processing.
> > > > > > > > > > >
> > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > The main reason is that would require disruptive changes in existing cryptodev API
> > > > > > > > > > > (would cause ABI/API breakage).
> > > > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra information
> > > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > > > > > >
> > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > >
> > > > > > > > > fill/read (+ alloc/free) is one of the main things that slow down the current crypto-op approach.
> > > > > > > > > That's why the general idea - have all data that wouldn't change from packet to packet
> > > > > > > > > included into the session and set it up once at session_init().
> > > > > > > >
> > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > You can have the new API in crypto.
> > > > > > > > As per the current patch, you only need cipher_offset which you can have it as a parameter until
> > > > > > > > you get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> > > > > > > > We can have cipher offset at both places (crypto-op and cipher_xform). It will give flexibility to the user to
> > > > > > > > override it.
> > > > > > >
> > > > > > > After having another thought on your proposal:
> > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related stuff here?
> > > > > >
> > > > > > I also thought of adding new xforms, but that won't serve the purpose for maybe all the cases.
> > > > > > You would be needing all information currently available in the current xforms.
> > > > > > So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> > > > > > ABI breakage would still be there.
> > > > > >
> > > > > > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> > > > > > xforms and we can have a solution to this issue.
> > > > >
> > > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > > So for now we can make that path work without any ABI breakage.
> > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > If in future we would need to add some extra information it might
> > > > > require ABI breakage, though by now I don't envision anything particular to add.
> > > > > Anyway, if there is no objection to go that way, we can try to make
> > > > > these changes for v2.
> > > > >
> > > >
> > > > Actually, after looking at it more deeply it appears not as easy as I thought it would be :)
> > > > Below is a very draft version of proposed API additions.
> > > > I think it avoids ABI breakages right now and provides enough flexibility for future extensions (if any).
> > > > For now, it doesn't address your comments about naming conventions (_CPU_ vs _SYNC_), etc.
> > > > but I suppose it is comprehensive enough to provide the main idea behind it.
> > > > Akhil and other interested parties, please try to review and provide feedback ASAP,
> > > > as related changes would take some time and we would still like to hit the 19.11 deadline.
> > > > Konstantin
> > > >
> > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > index bc8da2466..c03069e23 100644
> > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > >   *
> > > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > > >   *  use to create a session.
> > > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > > + * Making key struct packed (see below) allows us to regain 6B that could be
> > > > + * used for future extensions.
> > > >   */
> > > >  struct rte_crypto_cipher_xform {
> > > >         enum rte_crypto_cipher_operation op;
> > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > >         struct {
> > > >                 const uint8_t *data;    /**< pointer to key data */
> > > >                 uint16_t length;        /**< key length in bytes */
> > > > -       } key;
> > > > +       } __attribute__((__packed__)) key;
> > > > +
> > > > +       /**
> > > > +        * offset for cipher to start within user provided data buffer.
> > > > +        * Fan suggested another (and less space-consuming) way -
> > > > +        * reuse iv.offset space below, by changing:
> > > > +        * struct {uint16_t offset, length;} iv;
> > > > +        * to an unnamed union:
> > > > +        * union {
> > > > +        *      struct {uint16_t offset, length;} iv;
> > > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > > +        * };
> > > > +        * Both approaches seem OK to me in general.
> > >
> > > No strong opinions here. OK with this one.
> > >
> > > > +        * Comments/suggestions are welcome.
> > > > +         */
> > > > +       uint16_t offset;
> >
> > After another thought - it is probably a bit better to have offset as a separate field.
> > In that case we can use the same xforms to create both types of sessions.
> ok
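To make the 6B figure above concrete, here is a small layout check comparing the current and the packed key member. It is only a sketch: it assumes an LP64 target (8-byte pointers), the GCC/Clang __attribute__((__packed__)) extension, and hypothetical stand-in structs (key_unpacked, key_packed, cipher_xform_old, cipher_xform_new) with the two enums modelled as int, rather than the real rte_crypto_cipher_xform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the current key member: LP64 size 8 + 2 + 6 padding = 16. */
struct key_unpacked {
	const uint8_t *data;    /* pointer to key data */
	uint16_t length;        /* key length in bytes */
};

/* Same fields, packed as proposed: LP64 size 8 + 2 = 10, alignment 1. */
struct key_packed {
	const uint8_t *data;
	uint16_t length;
} __attribute__((__packed__));

/* Simplified current cipher xform (the two enums modelled as int). */
struct cipher_xform_old {
	int op;
	int algo;
	struct key_unpacked key;
	struct { uint16_t offset, length; } iv;
};

/* Proposed layout: packed key, new offset, spare bytes, unchanged iv. */
struct cipher_xform_new {
	int op;
	int algo;
	struct key_packed key;
	uint16_t offset;        /* new: where the cipher starts in the buffer */
	uint8_t reserved1[4];   /* 6B regained - 2B used by offset */
	struct { uint16_t offset, length; } iv;
};

/* Bytes freed by packing the key member. */
size_t key_bytes_regained(void)
{
	return sizeof(struct key_unpacked) - sizeof(struct key_packed);
}
```

Under these assumptions packing frees 6 bytes, 2 of which go to the new offset field, and iv keeps its offset. Note that the packed member lowers the struct's overall alignment, so sizeof of the new struct can still differ from the old one; that would need checking against the real headers.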
> >
> > > > +
> > > > +       uint8_t reserved1[4];
> > > > +
> > > >         /**< Cipher key
> > > >          *
> > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > >         struct {
> > > >                 const uint8_t *data;    /**< pointer to key data */
> > > >                 uint16_t length;        /**< key length in bytes */
> > > > -       } key;
> > > > +       } __attribute__((__packed__)) key;
> > > >         /**< Authentication key data.
> > > >          * The authentication key length MUST be less than or equal to the
> > > >          * block size of the algorithm. It is the callers responsibility to
> > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > >          * (for example RFC 2104, FIPS 198a).
> > > >          */
> > > >
> > > > +       uint8_t reserved1[6];
> > > > +
> > > >         struct {
> > > >                 uint16_t offset;
> > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > >         struct {
> > > >                 const uint8_t *data;    /**< pointer to key data */
> > > >                 uint16_t length;        /**< key length in bytes */
> > > > -       } key;
> > > > +       } __attribute__((__packed__)) key;
> > > > +
> > > > +       /** offset for cipher to start within data buffer */
> > > > +       uint16_t cipher_offset;
> > > > +
> > > > +       uint8_t reserved1[4];
> > > >
> > > >         struct {
> > > >                 uint16_t offset;
> > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > index e175b838c..c0c7bfed7 100644
> > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > @@ -1272,6 +1272,101 @@ void *
> > > >  rte_cryptodev_sym_session_get_user_data(
> > > >                                         struct rte_cryptodev_sym_session *sess);
> > > >
> > > > +/*
> > > > + * After several thoughts, I decided not to try to squeeze CPU_CRYPTO
> > > > + * into the existing rte_crypto_sym_session structure/API, but instead
> > > > + * to introduce an extension to it via a new, fully opaque
> > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > >
> > >
> > > What all things do we need to squeeze?
> > > In this proposal I do not see the new struct cpu_sym_session  defined here.
> >
> > The plan is to have it totally opaque to the user, i.e. just:
> > struct rte_crypto_cpu_sym_session;
> > in public header files.
> >
> > > I believe you will have the same lib API/struct for cpu_sym_session and sym_session.
> >
> > I thought about such a way, but there are a few things that look clumsy to me:
> > 1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
> > so it is not possible to easily distinguish what session you have: lksd_sym or cpu_sym.
> > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add
> > some extra field here, but in that case we wouldn't be able to use the same xform
> > for both lksd_sym and cpu_sym (which seems a really plausible thing to me).
> > 2. The majority of rte_cryptodev_sym_session fields, I think, are unnecessary for
> > rte_crypto_cpu_sym_session: sess_data[], opaque_data, user_data, nb_drivers.
> > All that consumes space that could be used somewhere else instead.
> > 3. I am a bit reluctant to touch the existing rte_cryptodev API - to avoid any
> > breakages I can't foresee right now.
> > On the other hand, if we add new functions/structs for cpu_sym_session we can
> > mark them as experimental and keep them that way for some time, so further
> > changes (if needed) would still be possible.
> >
> 
> OK let us assume that you have a separate structure. But I have a few queries:
> 1. How can multiple drivers use the same session?

As a short answer: they can't.
It is pretty much the same approach as with rte_security - each device needs to create/init its own session.
So upper layer would need to maintain its own array (or so) for such case.
Though the question is: why would you want to share the same session across multiple SW-backed devices?
It would anyway be just a synchronous function call executed on the same CPU.
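A minimal sketch of what "the upper layer maintains its own array" could look like. All names (struct xform, struct cpu_sess, dev_session_init, struct sa, MAX_DEVS) are hypothetical; it only illustrates that each device gets its own session initialised from the same xform, in memory owned by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEVS 4

/* Hypothetical xform: an algorithm id and a key. */
struct xform {
	int algo;
	uint8_t key[16];
};

/* Hypothetical per-device session (opaque to the app in the proposal). */
struct cpu_sess {
	uint8_t dev_id;         /* device that initialised this session */
	int algo;
	uint8_t key[16];        /* device-private copy of the key */
};

/* Stand-in for a per-device session init: copies what the device needs. */
static int dev_session_init(uint8_t dev_id, struct cpu_sess *s,
		const struct xform *xf)
{
	s->dev_id = dev_id;
	s->algo = xf->algo;
	memcpy(s->key, xf->key, sizeof(s->key));
	return 0;
}

/* Upper-layer SA: one session per device, all built from the same xform. */
struct sa {
	struct cpu_sess *sess[MAX_DEVS];
};

static void sa_fini(struct sa *sa)
{
	uint32_t i;

	for (i = 0; i < MAX_DEVS; i++) {
		free(sa->sess[i]);      /* free(NULL) is a no-op */
		sa->sess[i] = NULL;
	}
}

static int sa_init(struct sa *sa, const struct xform *xf, uint8_t nb_devs)
{
	uint8_t i;

	memset(sa, 0, sizeof(*sa));
	if (nb_devs > MAX_DEVS)
		return -1;

	for (i = 0; i < nb_devs; i++) {
		sa->sess[i] = malloc(sizeof(*sa->sess[i]));
		if (sa->sess[i] == NULL ||
				dev_session_init(i, sa->sess[i], xf) != 0) {
			sa_fini(sa);
			return -1;
		}
	}
	return 0;
}
```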

> 2. Can somebody use the scheduler PMD for scheduling different types of payloads for the same session?

In theory, yes.
Though for that, the scheduler PMD would need to keep inside its rte_crypto_cpu_sym_session an array of pointers to
the underlying devices' sessions.
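Roughly, the scheduler case could look like the sketch below: the scheduler's own session holds the underlying sessions plus their process() pointers and forwards each burst round-robin. Every name here is hypothetical, and the process() prototype is simplified (no iv/aad/digest arrays).

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_MAX_WORKERS 4

/* Demo worker session; in the proposal this type is opaque. */
struct worker_sess {
	uint32_t calls;
};

/* Simplified process() prototype (no iv/aad/digest arrays). */
typedef void (*process_t)(struct worker_sess *s, void *buf[],
		int status[], uint32_t num);

/* Hypothetical scheduler-private session: one entry per underlying device. */
struct sched_sess {
	struct worker_sess *sess[SCHED_MAX_WORKERS];
	process_t process[SCHED_MAX_WORKERS];
	uint32_t nb_workers;
	uint32_t next;          /* round-robin cursor */
};

/* Demo worker process(): counts calls and marks every element done. */
static void demo_process(struct worker_sess *s, void *buf[], int status[],
		uint32_t num)
{
	uint32_t i;

	(void)buf;
	s->calls++;
	for (i = 0; i < num; i++)
		status[i] = 0;
}

/* Scheduler process(): forward the whole burst to the next worker. */
static void sched_process(struct sched_sess *ss, void *buf[], int status[],
		uint32_t num)
{
	uint32_t w = ss->next;

	ss->next = (ss->next + 1) % ss->nb_workers;
	ss->process[w](ss->sess[w], buf, status, num);
}
```

The round-robin cursor is mutable session state, so such a scheduler session would itself need to be per-thread.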

> 
> With your proposal the APIs would be very specific to your use case only.

Yes, in some way.
I consider that API specific to SW-backed crypto PMDs.
I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) would benefit from it.
The current crypto-op API is very much HW oriented.
Which is OK, that's what it was intended for, but I think we also need one that is designed
with SW-backed implementations in mind.

> When you add more functionality to this sync API/struct, it will end up being the same API/struct.
> 
> Let us see how close/far we are from the existing APIs when the actual implementation is done.
> 
> > > I am not sure if that would be needed.
> > > It would be internal to the driver: if synchronous processing is supported (from feature flag) and
> > > the relevant fields in the xform (the newly added ones which are packed as per your suggestions) are set,
> > > it will create that type of session.
> > >
> > >
> > > > + * Main points:
> > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > + *   to keep it unchanged (API/ABI stability). On the other hand, this
> > > > + *   sync API is a new one and would probably require extra changes.
> > > > + *   Having it as a new one allows us to mark it as experimental, without
> > > > + *   affecting the existing one.
> > > > + * - A fully opaque cpu_sym_session structure gives more flexibility
> > > > + *   to the PMD writers and again allows us to avoid ABI breakages in future.
> > > > + * - A process() function per set of xforms
> > > > + *   allows exposing different process() functions for different
> > > > + *   xform combinations. The PMD writer can decide whether to
> > > > + *   push all supported algorithms into one process() function,
> > > > + *   or spread them across several ones.
> > > > + *   I.e. more flexibility for the PMD writer.
> > >
> > > Which process function should be chosen is internal to the PMD; how would that info
> > > be visible to the application or the library? These will get stored in the session private
> > > data. It would be up to the PMD writer to store the per-session process function in
> > > the session private data.
> > >
> > > The process function would be a dev op just like enq/deq operations and it should call
> > > the respective process API stored in the session private data.
> >
> > That model (via devops) is possible, but has several drawbacks from my
> > perspective:
> >
> > 1. It means we'll need to pass dev_id as a parameter to the process() function.
> > Though in fact dev_id is not relevant information for us here
> > (all we need is a pointer to the session and a pointer to the function to call)
> > and I tried to avoid using it in data-path functions for that API.
> 
> You have a single vdev, but someone may have multiple vdevs for each thread, or may
> have the same dev with multiple queues for each core.

That's fine. As I said above, it is a SW-backed implementation.
Each session has to be a separate entity that contains all the necessary information
(keys, alg/mode info, etc.) to process input buffers.
Plus we need the actual function pointer to call.
I just don't see why we need a dev_id in that situation.
Again, here we don't need to care about queues and their pinning to cores.
If, let's say, someone would like to process buffers from the same IPsec SA on 2
different cores in parallel, he can just create 2 sessions for the same xform,
give one to thread #1 and the second to thread #2.
After that both threads are free to call process(this_thread_ses, ...) at will.

> 
> > 2. As you pointed out, in that case there will be just one process() function per device.
> > So if a PMD would like to have several process() functions for different types of sessions
> > (let's say one per alg), the first thing it has to do inside its process() is read the session
> > data and, based on that, do a jump/call to a particular internal sub-routine.
> > Something like:
> > driver_id = get_pmd_driver_id();
> > priv_ses = ses->sess_data[driver_id];
> > Then either:
> > switch(priv_ses->alg) {case XXX: process_XXX(priv_ses, ...);break;...}
> > OR
> > priv_ses->process(priv_ses, ...);
> >
> > to select and call the proper function.
> > Looks like totally unnecessary overhead to me.
> > Though if we have the ability to query/extract some sort of session_ops based on the xform,
> > we can avoid this extra de-reference+jump/call thing.
> 
> What is the issue with the priv_ses->process(); approach?

Nothing at all.
What I am saying is that the scheme with dev_ops
dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
   |
   |-> priv_ses->process(...)

has a bigger overhead than just:
process(ses,...);

So why introduce an extra level of indirection here?
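To make the comparison concrete, the sketch below models both call chains with hypothetical types (struct dev, struct sym_sess, struct priv_sess); it is not the real rte_cryptodev layout. The dev_ops path goes through two indirect calls and several dependent loads (dev->process, dev->driver_id, sess_data[driver_id], ps->process) before reaching the real routine, while the direct path makes one indirect call through a pointer the caller already holds.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DRIVERS 2
#define MAX_DEVS 2

struct priv_sess;
typedef int (*priv_process_t)(struct priv_sess *ps, uint32_t num);

/* Hypothetical PMD-private session that carries its own process pointer. */
struct priv_sess {
	priv_process_t process;
	uint32_t processed;
};

/* Hypothetical generic session: per-driver private data, like sess_data[]. */
struct sym_sess {
	struct priv_sess *sess_data[MAX_DRIVERS];
};

/* Hypothetical device with a dev_ops-style process entry. */
struct dev {
	uint8_t driver_id;
	int (*process)(struct dev *dev, struct sym_sess *s, uint32_t num);
};

static struct dev devs[MAX_DEVS];

/* The routine that actually does the work (here: just counts). */
static int alg_process(struct priv_sess *ps, uint32_t num)
{
	ps->processed += num;
	return (int)num;
}

/* dev_ops entry: look up the private session, then call through it. */
static int devops_process(struct dev *dev, struct sym_sess *s, uint32_t num)
{
	struct priv_sess *ps = s->sess_data[dev->driver_id];

	return ps->process(ps, num);
}

/* Path 1: dev_id -> dev->process -> sess_data[driver_id] -> ps->process. */
static int process_via_devops(uint8_t dev_id, struct sym_sess *s,
		uint32_t num)
{
	struct dev *dev = &devs[dev_id];

	return dev->process(dev, s, num);
}

/* Path 2: the caller already holds process() and the session. */
static int process_direct(priv_process_t process, struct priv_sess *ps,
		uint32_t num)
{
	return process(ps, num);
}
```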

> I don't understand what you are saving by not doing this.
> In any case you would need to identify which session corresponds to which process().

Yes, sure, but I think we can let the user store that relationship information
in whatever way he likes: store a process() pointer for each session, or group sessions
that share the same process() somehow, or...

> For that you would be doing it somewhere in your data path.

Why in the data path?
Only once, at session creation/initialization time.
Or maybe even once per group of sessions.
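A minimal sketch of resolving process() once per group at control-path time. lookup_process() is a hypothetical stand-in for the draft rte_crypto_cpu_sym_session_func(); the group struct, the session struct and the algorithm ids are also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ALGO_GCM = 1, ALGO_CBC = 2 };

/* Hypothetical session: records which routine ran and how much. */
struct cpu_sess {
	int last_algo;
	uint32_t processed;
};

typedef void (*cpu_process_t)(struct cpu_sess *s, uint32_t num);

static void gcm_process(struct cpu_sess *s, uint32_t num)
{
	s->last_algo = ALGO_GCM;
	s->processed += num;
}

static void cbc_process(struct cpu_sess *s, uint32_t num)
{
	s->last_algo = ALGO_CBC;
	s->processed += num;
}

/* Control path only: hypothetical stand-in for ..._session_func(). */
static cpu_process_t lookup_process(int algo)
{
	switch (algo) {
	case ALGO_GCM:
		return gcm_process;
	case ALGO_CBC:
		return cbc_process;
	default:
		return NULL;
	}
}

#define GROUP_MAX_SESS 8

/* Sessions built from the same xform type share one process() pointer. */
struct sess_group {
	cpu_process_t process;
	struct cpu_sess *sess[GROUP_MAX_SESS];
	uint32_t nb_sess;
};

static int group_init(struct sess_group *g, int algo)
{
	g->process = lookup_process(algo);
	g->nb_sess = 0;
	return g->process == NULL ? -1 : 0;
}

static int group_add(struct sess_group *g, struct cpu_sess *s)
{
	if (g->nb_sess == GROUP_MAX_SESS)
		return -1;
	g->sess[g->nb_sess++] = s;
	return 0;
}

/* Data path: no lookup and no switch, one indirect call per session. */
static void group_process_all(struct sess_group *g, uint32_t num)
{
	uint32_t i;

	for (i = 0; i < g->nb_sess; i++)
		g->process(g->sess[i], num);
}
```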

> 
> >
> > >
> > > I am not sure if you would need a new session init API for this as nothing would be visible to
> > > the app or lib.
> > >
> > > > + * - Not storing process() pointer inside the session -
> > > > + *   Allows the user to choose whether to store a process() pointer
> > > > + *   per session, or per group of sessions for that device that share
> > > > + *   the same input xforms. I.e. extra flexibility for the user,
> > > > + *   plus it allows us to keep cpu_sym_session totally opaque, see above.
> > >
> > > If multiple sessions need to be processed via the same process function,
> > > PMD would save the same process in all the sessions, I don't think there would
> > > be any perf overhead with that.
> >
> > I think it would, see above.
> >
> > >
> > > > + * Sketched usage model:
> > > > + * ....
> > > > + * /* control path, alloc/init session */
> > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > + * rte_crypto_cpu_sym_process_t process =
> > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > + * ...
> > > > + * /* data-path*/
> > > > + * process(ses, ....);
> > > > + * ....
> > > > + * /* control path, terminate/free session */
> > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > + */
> > > > +
> > > > +/**
> > > > + * vector structure, contains pointer to vector array and the length
> > > > + * of the array
> > > > + */
> > > > +struct rte_crypto_vec {
> > > > +       struct iovec *vec;
> > > > +       uint32_t num;
> > > > +};
> > > > +
> > > > +/*
> > > > + * Data-path bulk process crypto function.
> > > > + */
> > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > +               void *digest[], int status[], uint32_t num);
> > > > +/*
> > > > + * For the given device, return the process function specific to the input xforms;
> > > > + * on error, return NULL and set the rte_errno value.
> > > > + * Note that the same input xforms for the same device should return
> > > > + * the same process function.
> > > > + */
> > > > +__rte_experimental
> > > > +rte_crypto_cpu_sym_process_t
> > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +/*
> > > > + * Return the required session size in bytes for the given set of xforms.
> > > > + * If xforms == NULL, then return the max possible session size
> > > > + * that would fit a session for any algorithm supported by the device.
> > > > + * If CPU mode is not supported at all, or the algorithm requested in the
> > > > + * xform is not supported, then return -ENOTSUP.
> > > > + */
> > > > +__rte_experimental
> > > > +int
> > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +/*
> > > > + * Initialize session.
> > > > + * It is the caller's responsibility to allocate enough space for it.
> > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > + */
> > > > +__rte_experimental
> > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +__rte_experimental
> > > > +void
> > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > +
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > index defe05ea0..ed7e63fab 100644
> > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > >                 struct rte_cryptodev_asym_session *sess);
> > > >
> > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > +
> > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > +                       struct rte_cryptodev *dev,
> > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > +
> > > >  /** Crypto device operations function pointer table */
> > > >  struct rte_cryptodev_ops {
> > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > >         /**< Clear a Crypto sessions private data. */
> > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > >         /**< Clear a Crypto sessions private data. */
> > > > +
> > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > >  };
> > > >
> > > >
> > > >
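Putting the draft together, the sketch below walks through the usage model from the header comment (size, allocate, get process(), init, process, fini) against a mock PMD. All mock_* names are hypothetical, the XOR "cipher" is a placeholder and NOT real crypto, dev_id is ignored, and errno stands in for rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Mirrors the draft rte_crypto_vec: an SGL of iovecs. */
struct mock_crypto_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Hypothetical xform: a one-byte "key" is all the mock needs. */
struct mock_xform {
	uint8_t key;
};

/* Opaque in the proposal; defined here only because we play the PMD. */
struct mock_cpu_sess {
	uint8_t key;
};

typedef void (*mock_process_t)(struct mock_cpu_sess *sess,
		struct mock_crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num);

/* XOR every byte of every segment: a placeholder, NOT real crypto. */
static void mock_xor_process(struct mock_cpu_sess *sess,
		struct mock_crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num)
{
	uint32_t i, j;
	size_t k;

	(void)iv;
	(void)aad;
	(void)digest;
	for (i = 0; i < num; i++) {
		for (j = 0; j < buf[i].num; j++) {
			uint8_t *p = buf[i].vec[j].iov_base;

			for (k = 0; k < buf[i].vec[j].iov_len; k++)
				p[k] ^= sess->key;
		}
		status[i] = 0;
	}
}

/* Control-path helpers shaped like the draft API (dev_id unused here). */
static int mock_session_size(uint8_t dev_id, const struct mock_xform *xf)
{
	(void)dev_id;
	(void)xf;
	return (int)sizeof(struct mock_cpu_sess);
}

static mock_process_t mock_session_func(uint8_t dev_id,
		const struct mock_xform *xf)
{
	(void)dev_id;
	if (xf == NULL) {
		errno = EINVAL;
		return NULL;
	}
	return mock_xor_process;
}

static int mock_session_init(uint8_t dev_id, struct mock_cpu_sess *sess,
		const struct mock_xform *xf)
{
	(void)dev_id;
	sess->key = xf->key;
	return 0;
}

static void mock_session_fini(uint8_t dev_id, struct mock_cpu_sess *sess)
{
	(void)dev_id;
	memset(sess, 0, sizeof(*sess));   /* wipe key material */
}
```

Because process() is fetched once on the control path and then called directly, the data path never touches dev_id.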



* Re: [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API
  2019-09-30  9:43         ` Hemant Agrawal
@ 2019-10-01 15:27           ` Ananyev, Konstantin
  2019-10-02  2:47             ` Hemant Agrawal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-01 15:27 UTC (permalink / raw)
  To: Hemant Agrawal, Zhang, Roy Fan, dev; +Cc: Doherty, Declan, Akhil Goyal


Hi Hemant,

> >>> This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type to
> >>> the security library. The type represents performing crypto operations with CPU
> >>> cycles. The patch also includes a new API to process crypto operations in
> >>> bulk and the function pointers for PMDs.
> >>>
> >>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> >>> ---
> >>>    lib/librte_security/rte_security.c           | 16 +++++++++
> >>>    lib/librte_security/rte_security.h           | 51 +++++++++++++++++++++++++++-
> >>>    lib/librte_security/rte_security_driver.h    | 19 +++++++++++
> >>>    lib/librte_security/rte_security_version.map |  1 +
> >>>    4 files changed, 86 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
> >>> index bc81ce15d..0f85c1b59 100644
> >>> --- a/lib/librte_security/rte_security.c
> >>> +++ b/lib/librte_security/rte_security.c
> >>> @@ -141,3 +141,19 @@ rte_security_capability_get(struct rte_security_ctx *instance,
> >>>
> >>>    	return NULL;
> >>>    }
> >>> +
> >>> +void
> >>> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> >>> +		struct rte_security_session *sess,
> >>> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> >>> +		void *digest[], int status[], uint32_t num)
> >>> +{
> >>> +	uint32_t i;
> >>> +
> >>> +	for (i = 0; i < num; i++)
> >>> +		status[i] = -1;
> >>> +
> >>> +	RTE_FUNC_PTR_OR_RET(*instance->ops->process_cpu_crypto_bulk);
> >>> +	instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
> >>> +			aad, digest, status, num);
> >>> +}
> >>> diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
> >>> index 96806e3a2..5a0f8901b 100644
> >>> --- a/lib/librte_security/rte_security.h
> >>> +++ b/lib/librte_security/rte_security.h
> >>> @@ -18,6 +18,7 @@ extern "C" {
> >>>    #endif
> >>>
> >>>    #include <sys/types.h>
> >>> +#include <sys/uio.h>
> >>>
> >>>    #include <netinet/in.h>
> >>>    #include <netinet/ip.h>
> >>> @@ -272,6 +273,20 @@ struct rte_security_pdcp_xform {
> >>>    	uint32_t hfn_threshold;
> >>>    };
> >>>
> >>> +struct rte_security_cpu_crypto_xform {
> >>> +	/** For cipher/authentication crypto operation the authentication may
> >>> +	 * cover more content than the cipher. E.g., for IPsec ESP encryption
> >>> +	 * with AES-CBC and SHA1-HMAC, the encryption happens after the ESP
> >>> +	 * header but the whole packet (apart from MAC header) is authenticated.
> >>> +	 * The cipher_offset field is used to deduce the cipher data pointer
> >>> +	 * from the buffer to be processed.
> >>> +	 *
> >>> +	 * NOTE this parameter shall be ignored by AEAD algorithms, since they
> >>> +	 * use the same offset for cipher and authentication.
> >>> +	 */
> >>> +	int32_t cipher_offset;
> >>> +};
> >>> +
> >>>    /**
> >>>     * Security session action type.
> >>>     */
> >>> @@ -286,10 +301,14 @@ enum rte_security_session_action_type {
> >>>    	/**< All security protocol processing is performed inline during
> >>>    	 * transmission
> >>>    	 */
> >>> -	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> >>> +	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
> >>>    	/**< All security protocol processing including crypto is performed
> >>>    	 * on a lookaside accelerator
> >>>    	 */
> >>> +	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> >>> +	/**< Crypto processing for security protocol is processed by CPU
> >>> +	 * synchronously
> >>> +	 */
> >> Though you are naming it CPU crypto, it is more like raw packet
> >> crypto, where you want to skip mbuf/crypto ops and directly
> >> work on a raw buffer.
> > Yes, but we do want to do that (skip mbuf/crypto ops and use raw buffer),
> > because this API is destined for SW-backed implementations.
> > For that case crypto-ops, mbuf, enqueue/dequeue are just unnecessary overhead.
> I agree, we are also planning to take advantage of it for some specific
> use-cases in future.
> >>>    };
> >>>
> >>>    /** Security session protocol definition */
> >>> @@ -315,6 +334,7 @@ struct rte_security_session_conf {
> >>>    		struct rte_security_ipsec_xform ipsec;
> >>>    		struct rte_security_macsec_xform macsec;
> >>>    		struct rte_security_pdcp_xform pdcp;
> >>> +		struct rte_security_cpu_crypto_xform cpucrypto;
> >>>    	};
> >>>    	/**< Configuration parameters for security session */
> >>>    	struct rte_crypto_sym_xform *crypto_xform;
> >>> @@ -639,6 +659,35 @@ const struct rte_security_capability *
> >>>    rte_security_capability_get(struct rte_security_ctx *instance,
> >>>    			    struct rte_security_capability_idx *idx);
> >>>
> >>> +/**
> >>> + * Security vector structure, contains pointer to vector array and the length
> >>> + * of the array
> >>> + */
> >>> +struct rte_security_vec {
> >>> +	struct iovec *vec;
> >>> +	uint32_t num;
> >>> +};
> >>> +
> >> Just wondering if you want to change it to *in_vec and *out_vec, that
> >> will be helpful in future, if out-of-place processing is required
> >> for the CPU use case as well?
> > I suppose this is doable, though right now we don't plan to support such a model.
> They will come in handy in future. I plan to use them in future and we can
> skip the API/ABI breakage if the placeholders are present.
> >
> >>> +/**
> >>> + * Processing bulk crypto workload with CPU
> >>> + *
> >>> + * @param	instance	security instance.
> >>> + * @param	sess		security session
> >>> + * @param	buf		array of buffer SGL vectors
> >>> + * @param	iv		array of IV pointers
> >>> + * @param	aad		array of AAD pointers
> >>> + * @param	digest		array of digest pointers
> >>> + * @param	status		array of status for the function to return
> >>> + * @param	num		number of elements in each array
> >>> + *
> >>> + */
> >>> +__rte_experimental
> >>> +void
> >>> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> >>> +		struct rte_security_session *sess,
> >>> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> >>> +		void *digest[], int status[], uint32_t num);
> >>> +
> >> Why not make the return type int, to indicate whether this API completely
> >> failed, or processed, or has some valid status to look into?
> > Good point, will change as suggested.
> 
> I have another suggestion w.r.t. iv, aad, digest etc. Why not put them
> in a structure, so that you will be able to add/remove variables without
> breaking the API prototype.


Just to confirm, you are talking about something like:

struct rte_security_vec {
   struct iovec *vec;
   uint32_t num;
};

struct rte_security_sym_vec {
      struct rte_security_vec buf;
      void *iv;
      void *aad;
      void *digest;
};

rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
	struct rte_security_session *sess, struct rte_security_sym_vec buf[],
               int status[], uint32_t num);

?
We thought about such a way, though for the PMD it would be
more plausible to have the same type of params grouped together,
i.e. void *in[], void *out[], void *digest[], ...
Another thing - the above grouping wouldn't help to avoid ABI breakage
in case we need to add a new field into rte_security_sym_vec
(though it might help to avoid API breakage).

In theory other way is also possible:
struct rte_security_sym_vec {
      struct rte_security_vec *buf;
      void **iv;
      void **aad;
      void **digest;
};

rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
	struct rte_security_session *sess, struct rte_security_sym_vec *buf,
               int status[], uint32_t num);

And that might help for both ABI and API stability,
but it looks really weird that way (at least to me).
Also this API is experimental, and I suppose it needs to stay experimental for
a few releases before we are sure nothing important is missing,
so API/ABI stability is probably not that high a concern for it right now.
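For comparison, the struct-of-arrays variant above could be consumed by a PMD roughly like this. The types (sec_vec, sec_sym_vec) and sketch_process_bulk() are hypothetical; the body only validates inputs instead of doing crypto, and the int return (count of successful elements, or -EINVAL when the call cannot run at all) is one possible reading of the int-return suggestion, not an agreed API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* As in the draft rte_security_vec. */
struct sec_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Struct-of-arrays: each member points to an array of num entries. */
struct sec_sym_vec {
	struct sec_vec *buf;
	void **iv;
	void **aad;
	void **digest;
};

/*
 * Returns the number of elements that succeeded, or -EINVAL if the call
 * could not run at all. Per-element results still go to status[].
 */
static int sketch_process_bulk(const struct sec_sym_vec *sv, int status[],
		uint32_t num)
{
	uint32_t i;
	int ok = 0;

	if (sv == NULL || sv->buf == NULL || sv->iv == NULL || status == NULL)
		return -EINVAL;

	for (i = 0; i < num; i++) {
		/* A real PMD would run cipher/auth here; we only validate. */
		if (sv->buf[i].num == 0 || sv->iv[i] == NULL) {
			status[i] = -EINVAL;
			continue;
		}
		status[i] = 0;
		ok++;
	}
	return ok;
}
```

Each parameter type still reaches the PMD as its own array, while the prototype takes a single struct pointer.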

Konstantin

 


* Re: [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API
  2019-10-01 15:27           ` Ananyev, Konstantin
@ 2019-10-02  2:47             ` Hemant Agrawal
  0 siblings, 0 replies; 87+ messages in thread
From: Hemant Agrawal @ 2019-10-02  2:47 UTC (permalink / raw)
  To: Ananyev, Konstantin, Zhang, Roy Fan, dev; +Cc: Doherty, Declan, Akhil Goyal

Hi Konstantin,

> > >>> This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > >>> action type to the security library. The type represents performing
> > >>> crypto operations with CPU cycles. The patch also includes a new
> > >>> API to process crypto operations in bulk and the function pointers for PMDs.
> > >>>
> > >>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > >>> ---
> > >>>    lib/librte_security/rte_security.c           | 16 +++++++++
> > >>>    lib/librte_security/rte_security.h           | 51 +++++++++++++++++++++++++++-
> > >>>    lib/librte_security/rte_security_driver.h    | 19 +++++++++++
> > >>>    lib/librte_security/rte_security_version.map |  1 +
> > >>>    4 files changed, 86 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
> > >>> index bc81ce15d..0f85c1b59 100644
> > >>> --- a/lib/librte_security/rte_security.c
> > >>> +++ b/lib/librte_security/rte_security.c
> > >>> @@ -141,3 +141,19 @@ rte_security_capability_get(struct rte_security_ctx *instance,
> > >>>
> > >>>    	return NULL;
> > >>>    }
> > >>> +
> > >>> +void
> > >>> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> > >>> +		struct rte_security_session *sess,
> > >>> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> > >>> +		void *digest[], int status[], uint32_t num)
> > >>> +{
> > >>> +	uint32_t i;
> > >>> +
> > >>> +	for (i = 0; i < num; i++)
> > >>> +		status[i] = -1;
> > >>> +
> > >>> +	RTE_FUNC_PTR_OR_RET(*instance->ops->process_cpu_crypto_bulk);
> > >>> +	instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
> > >>> +			aad, digest, status, num);
> > >>> +}
> > >>> diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
> > >>> index 96806e3a2..5a0f8901b 100644
> > >>> --- a/lib/librte_security/rte_security.h
> > >>> +++ b/lib/librte_security/rte_security.h
> > >>> @@ -18,6 +18,7 @@ extern "C" {
> > >>>    #endif
> > >>>
> > >>>    #include <sys/types.h>
> > >>> +#include <sys/uio.h>
> > >>>
> > >>>    #include <netinet/in.h>
> > >>>    #include <netinet/ip.h>
> > >>> @@ -272,6 +273,20 @@ struct rte_security_pdcp_xform {
> > >>>    	uint32_t hfn_threshold;
> > >>>    };
> > >>>
> > >>> +struct rte_security_cpu_crypto_xform {
> > >>> +	/** For cipher/authentication crypto operation the authentication may
> > >>> +	 * cover more content than the cipher. E.g., for IPsec ESP encryption
> > >>> +	 * with AES-CBC and SHA1-HMAC, the encryption happens after the ESP
> > >>> +	 * header but the whole packet (apart from MAC header) is authenticated.
> > >>> +	 * The cipher_offset field is used to deduce the cipher data pointer
> > >>> +	 * from the buffer to be processed.
> > >>> +	 *
> > >>> +	 * NOTE this parameter shall be ignored by AEAD algorithms, since they
> > >>> +	 * use the same offset for cipher and authentication.
> > >>> +	 */
> > >>> +	int32_t cipher_offset;
> > >>> +};
> > >>> +
> > >>>    /**
> > >>>     * Security session action type.
> > >>>     */
> > >>> @@ -286,10 +301,14 @@ enum rte_security_session_action_type {
> > >>>    	/**< All security protocol processing is performed inline during
> > >>>    	 * transmission
> > >>>    	 */
> > >>> -	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> > >>> +	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
> > >>>    	/**< All security protocol processing including crypto is performed
> > >>>    	 * on a lookaside accelerator
> > >>>    	 */
> > >>> +	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> > >>> +	/**< Crypto processing for security protocol is processed by CPU
> > >>> +	 * synchronously
> > >>> +	 */
> > >> Though you are naming it CPU crypto, it is more like raw packet
> > >> crypto, where you want to skip mbuf/crypto ops and directly
> > >> work on a raw buffer.
> > > Yes, but we do want to do that (skip mbuf/crypto ops and use raw
> > > buffer), because this API is destined for SW-backed implementations.
> > > For that case crypto-ops, mbuf, enqueue/dequeue are just unnecessary overhead.
> > I agree, we are also planning to take advantage of it for some
> > specific use-cases in future.
> > >>>    };
> > >>>
> > >>>    /** Security session protocol definition */
> > >>> @@ -315,6 +334,7 @@ struct rte_security_session_conf {
> > >>>    		struct rte_security_ipsec_xform ipsec;
> > >>>    		struct rte_security_macsec_xform macsec;
> > >>>    		struct rte_security_pdcp_xform pdcp;
> > >>> +		struct rte_security_cpu_crypto_xform cpucrypto;
> > >>>    	};
> > >>>    	/**< Configuration parameters for security session */
> > >>>    	struct rte_crypto_sym_xform *crypto_xform;
> > >>> @@ -639,6 +659,35 @@ const struct rte_security_capability *
> > >>>    rte_security_capability_get(struct rte_security_ctx *instance,
> > >>>    			    struct rte_security_capability_idx *idx);
> > >>>
> > >>> +/**
> > >>> + * Security vector structure, contains pointer to vector array and the length
> > >>> + * of the array
> > >>> + */
> > >>> +struct rte_security_vec {
> > >>> +	struct iovec *vec;
> > >>> +	uint32_t num;
> > >>> +};
> > >>> +
> > >> Just wondering if you want to change it to *in_vec and *out_vec,
> > >> that will be helpful in future, if out-of-place processing is
> > >> required for the CPU use case as well?
> > > I suppose this is doable, though right now we don't plan to support such a model.
> > They will come in handy in future. I plan to use them in future and we can
> > skip the API/ABI breakage if the placeholders are present.
> > >
> > >>> +/**
> > >>> + * Processing bulk crypto workload with CPU
> > >>> + *
> > >>> + * @param	instance	security instance.
> > >>> + * @param	sess		security session
> > >>> + * @param	buf		array of buffer SGL vectors
> > >>> + * @param	iv		array of IV pointers
> > >>> + * @param	aad		array of AAD pointers
> > >>> + * @param	digest		array of digest pointers
> > >>> + * @param	status		array of status for the function to return
> > >>> + * @param	num		number of elements in each array
> > >>> + *
> > >>> + */
> > >>> +__rte_experimental
> > >>> +void
> > > >>> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> > >>> +		struct rte_security_session *sess,
> > >>> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> > >>> +		void *digest[], int status[], uint32_t num);
> > >>> +
> > >> Why not make the return as int, to indicate whether this API
> > >> completely failed or processed or have some valid status to look into?
> > > Good point, will change as suggested.
> >
> > I have another suggestion w.r.t. iv, aad, digest etc. Why not put them
> > in a structure, so that you will be able to add/remove variables without
> > breaking the API prototype.
> 
> 
> Just to confirm, you are talking about something like:
> 
> struct rte_security_vec {
>    struct iovec *vec;
>    uint32_t num;
> };

[Hemant] My idea is:
struct rte_security_vec {
    struct iovec *vec;
    struct iovec *out_vec;
    uint32_t num_in;
    uint32_t num_out;
};
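A minimal sketch of how a process() call could consume such an in/out pair, assuming a hypothetical struct sec_vec_io that mirrors the fields above (it is not a DPDK type) and a toy XOR transform standing in for a real cipher. A NULL out_vec selects in-place processing, and the output SGL may be segmented differently from the input:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h> /* struct iovec */

/* Hypothetical mirror of the proposed layout; not a DPDK type. */
struct sec_vec_io {
	struct iovec *vec;     /* input SGL */
	struct iovec *out_vec; /* output SGL, NULL means in-place */
	uint32_t num_in;
	uint32_t num_out;
};

/* Total number of bytes described by an SGL. */
static size_t
sgl_len(const struct iovec *v, uint32_t n)
{
	size_t len = 0;

	for (uint32_t i = 0; i < n; i++)
		len += v[i].iov_len;
	return len;
}

/*
 * Toy transform: XOR every input byte with 'key' and write it to the
 * output SGL, whose segment boundaries may differ from the input's.
 * Placeholder for a real cipher, it provides no security.
 * Returns 0 on success, -1 if the output SGL is too small.
 */
static int
toy_xor_process(const struct sec_vec_io *b, uint8_t key)
{
	const struct iovec *out;
	uint32_t nout, oi = 0;
	size_t ooff = 0;

	assert(b != NULL);
	out = (b->out_vec != NULL) ? b->out_vec : b->vec;
	nout = (b->out_vec != NULL) ? b->num_out : b->num_in;
	if (sgl_len(out, nout) < sgl_len(b->vec, b->num_in))
		return -1;

	for (uint32_t i = 0; i < b->num_in; i++) {
		const uint8_t *src = b->vec[i].iov_base;

		for (size_t j = 0; j < b->vec[i].iov_len; j++) {
			/* advance to the next output segment with room left */
			while (ooff == out[oi].iov_len) {
				oi++;
				ooff = 0;
			}
			((uint8_t *)out[oi].iov_base)[ooff++] =
				(uint8_t)(src[j] ^ key);
		}
	}
	return 0;
}
```

Carrying both counts lets the backend check destination capacity up front, which any real out-of-place path would also need.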

> 
> struct rte_security_sym_vec {
>       struct rte_security_vec buf;
>       void *iv;
>       void *aad;
>       void *digest;
> };
> 
[Hemant] Or drop rte_security_vec altogether and make it part of rte_security_sym_vec itself.

> rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> 	struct rte_security_session *sess, struct rte_security_sym_vec buf[],
>                int status[], uint32_t num);
> 
> ?
> We thought about such way, though for PMD it would be more plausible to
> have same type of params grouped together, i.e. void *in[], void *out[], void
> *digest[], ...
> Another thing - above grouping wouldn't help to avoid ABI breakage, in case
> we'll need to add new field into rte_security_sym_vec (though it might help
> to avoid API breakage).
> 
> In theory other way is also possible:
> struct rte_security_sym_vec {
>       struct rte_security_vec *buf;
>       void **iv;
>       void **aad;
>       void **digest;
> };
> 
> rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> 	struct rte_security_session *sess, struct rte_security_sym_vec *buf,
>                int status[], uint32_t num);
> 
> And that might help for both ABI and API stability, but it looks really weird
> that way (at least to me).

[Hemant] I am fine either way. 
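To make the layout trade-off above concrete, here is a small sketch (all names hypothetical, not DPDK API) of an adapter that accepts an array of per-buffer descriptors but feeds a backend expecting one array per field. The gather loop is the extra per-call work that the grouped-by-type layout avoids; the adapter also returns a failure count, in the spirit of the int return type agreed on earlier:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical per-buffer descriptor (array-of-structs layout). */
struct sym_vec_elem {
	struct iovec *vec;
	uint32_t num;
	void *iv;
	void *aad;
	void *digest;
};

/* Hypothetical backend taking one array per field (struct-of-arrays). */
typedef void (*soa_process_t)(struct iovec *vec[], const uint32_t nseg[],
		void *iv[], void *aad[], void *digest[], int status[],
		uint32_t num);

#define SOA_BURST 32

/*
 * Gather AoS entries into per-field arrays, SOA_BURST at a time, call
 * the SoA backend, and return the number of entries with status != 0.
 */
static uint32_t
aos_process(soa_process_t fn, const struct sym_vec_elem e[], int status[],
		uint32_t num)
{
	struct iovec *vec[SOA_BURST];
	uint32_t nseg[SOA_BURST];
	void *iv[SOA_BURST], *aad[SOA_BURST], *dg[SOA_BURST];
	uint32_t failed = 0;

	assert(fn != NULL);
	for (uint32_t base = 0; base < num; base += SOA_BURST) {
		uint32_t n = (num - base < SOA_BURST) ? num - base : SOA_BURST;

		for (uint32_t i = 0; i < n; i++) {
			vec[i] = e[base + i].vec;
			nseg[i] = e[base + i].num;
			iv[i] = e[base + i].iv;
			aad[i] = e[base + i].aad;
			dg[i] = e[base + i].digest;
		}
		fn(vec, nseg, iv, aad, dg, status + base, n);
	}
	for (uint32_t i = 0; i < num; i++)
		failed += (status[i] != 0);
	return failed;
}

/* Toy backend for illustration: fails any entry that has no IV. */
static void
toy_soa_check_iv(struct iovec *vec[], const uint32_t nseg[], void *iv[],
		void *aad[], void *digest[], int status[], uint32_t num)
{
	(void)vec;
	(void)nseg;
	(void)aad;
	(void)digest;
	for (uint32_t i = 0; i < num; i++)
		status[i] = (iv[i] != NULL) ? 0 : -1;
}
```

Either layout can be layered on top of the other this way, so the choice mostly decides who pays for the copy: the application or the PMD.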

> Also this API is experimental and I suppose needs to stay experimental for
> few releases before we are sure nothing important is missing, so probably
> API/ABI stability is not that high concern for it right now.
> 
> Konstantin
> 
> 


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-01 14:49                               ` Ananyev, Konstantin
@ 2019-10-03 13:24                                 ` Akhil Goyal
  2019-10-07 12:53                                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-10-03 13:24 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon'
  Cc: Zhang, Roy Fan, Doherty, Declan, 'Anoob Joseph'


Hi Konstantin,
> 
> Hi Akhil,
> 
> > > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > > > > > > > > > > > same algorithm, key, and direction being processed by CPU cycles
> > > > > > > > > > > > > > synchronously.
> > > > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > > > > > > > > > > corresponding dequeue burst.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > >
> > > > > > > > > > > > > It would be a new API something like process_packets and it will have the
> > > > > > > > > > > > > crypto processed packets while returning from the API?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib
> > > > > > > > > > > > > only.
> > > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any
> > > > > > > > > > > > > value add to the crypto processing. IMO, you just need a synchronous crypto
> > > > > > > > > > > > > processing API which can be defined in cryptodev, you don't need to re-create
> > > > > > > > > > > > > a crypto session in the name of security session in the driver just to do a
> > > > > > > > > > > > > synchronous processing.
> > > > > > > > > > > >
> > > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > The main reason is that would require disruptive changes in existing cryptodev
> > > > > > > > > > > > API (would cause ABI/API breakage).
> > > > > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra information
> > > > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > > > > > > >
> > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > >
> > > > > > > > > > fill/read (+ alloc/free) is one of the main things that slow down current
> > > > > > > > > > crypto-op approach.
> > > > > > > > > > That's why the general idea - have all data that wouldn't change from packet to
> > > > > > > > > > packet included into the session and set it up once at session_init().
> > > > > > > > >
> > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > You can have the new API in crypto.
> > > > > > > > > As per the current patch, you only need cipher_offset which you can have
> > > > > > > > > as a parameter until you get it approved in the crypto xform. I believe it
> > > > > > > > > will be beneficial in case of other crypto cases as well.
> > > > > > > > > We can have cipher offset at both places (crypto-op and cipher_xform). It
> > > > > > > > > will give flexibility to the user to override it.
> > > > > > > >
> > > > > > > > After having another thought on your proposal:
> > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> > > > > > > > stuff here?
> > > > > > >
> > > > > > > I also thought of adding new xforms, but that won't serve the purpose for
> > > > > > > maybe all the cases.
> > > > > > > You would be needing all information currently available in the current
> > > > > > > xforms.
> > > > > > > So if you are adding new fields in the new xform, the size will be more than
> > > > > > > that of the union of xforms.
> > > > > > > ABI breakage would still be there.
> > > > > > >
> > > > > > > If you think a valid compression of the AEAD xform can be done, then that
> > > > > > > can be done for each of the xforms and we can have a solution to this issue.
> > > > > >
> > > > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > > If in future we would need to add some extra information it might
> > > > > > require ABI breakage, though by now I don't envision anything particular to
> > > > > > add.
> > > > > > Anyway, if there is no objection to go that way, we can try to make
> > > > > > these changes for v2.
> > > > > >
> > > > >
> > > > > Actually, after looking at it more deeply it appears not that easy as I thought
> > > > > it would be :)
> > > > > Below is a very draft version of proposed API additions.
> > > > > I think it avoids ABI breakages right now and provides enough flexibility for
> > > > > future extensions (if any).
> > > > > For now, it doesn't address your comments about naming conventions (_CPU_
> > > > > vs _SYNC_), etc.,
> > > > > but I suppose it is comprehensive enough to convey the main idea behind it.
> > > > > Akhil and other interested parties, please try to review and provide feedback
> > > > > ASAP, as related changes would take some time and we still would like to hit
> > > > > the 19.11 deadline.
> > > > > Konstantin
> > > > >
> > > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > index bc8da2466..c03069e23 100644
> > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > >   *
> > > > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > > > >   *  use to create a session.
> > > > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > > > + * Making key struct packed (see below) allow us to regain 6B that could be
> > > > > + * used for future extensions.
> > > > >   */
> > > > >  struct rte_crypto_cipher_xform {
> > > > >         enum rte_crypto_cipher_operation op;
> > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > >         struct {
> > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > -       } key;
> > > > > +       } __attribute__((__packed__)) key;
> > > > > +
> > > > > +       /**
> > > > > +         * offset for cipher to start within user provided data buffer.
> > > > > +        * Fan suggested another (and less space consuming way) -
> > > > > +         * reuse iv.offset space below, by changing:
> > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > +        * to unnamed union:
> > > > > +        * union {
> > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > > > +        * };
> > > > > +        * Both approaches seem ok to me in general.
> > > >
> > > > No strong opinions here. OK with this one.
> > > >
> > > > > +        * Comments/suggestions are welcome.
> > > > > +         */
> > > > > +       uint16_t offset;
> > >
> > > After another thought - it is probably a bit better to have offset as a separate
> > > field.
> > > In that case we can use the same xforms to create both type of sessions.
> > ok
> > >
> > > > > +
> > > > > +       uint8_t reserved1[4];
> > > > > +
> > > > >         /**< Cipher key
> > > > >          *
> > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > >         struct {
> > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > -       } key;
> > > > > +       } __attribute__((__packed__)) key;
> > > > >         /**< Authentication key data.
> > > > >          * The authentication key length MUST be less than or equal to the
> > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > >          * (for example RFC 2104, FIPS 198a).
> > > > >          */
> > > > >
> > > > > +       uint8_t reserved1[6];
> > > > > +
> > > > >         struct {
> > > > >                 uint16_t offset;
> > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > >         struct {
> > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > -       } key;
> > > > > +       } __attribute__((__packed__)) key;
> > > > > +
> > > > > +       /** offset for cipher to start within data buffer */
> > > > > +       uint16_t cipher_offset;
> > > > > +
> > > > > +       uint8_t reserved1[4];
> > > > >
> > > > >         struct {
> > > > >                 uint16_t offset;
> > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > index e175b838c..c0c7bfed7 100644
> > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > @@ -1272,6 +1272,101 @@ void *
> > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > >
> > > > > +/*
> > > > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > > > + * introduce an extension to it via new fully opaque
> > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > >
> > > >
> > > > What all things do we need to squeeze?
> > > > In this proposal I do not see the new struct cpu_sym_session  defined here.
> > >
> > > The plan is to have it totally opaque to the user, i.e. just:
> > > struct rte_crypto_cpu_sym_session;
> > > in public header files.
> > >
> > > > I believe you will have same lib API/struct for cpu_sym_session and sym_session.
> > >
> > > I thought about such way, but there are few things that looks clumsy to me:
> > > 1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
> > > so it is not possible to easily distinguish what session you have: lksd_sym or
> > > cpu_sym.
> > > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add
> > > some extra field here, but in that case we wouldn't be able to use the same
> > > xform for both lksd_sym and cpu_sym (which seems a really plausible thing to me).
> > > 2. The majority of rte_cryptodev_sym_session fields I think are unnecessary for
> > > rte_crypto_cpu_sym_session:
> > > sess_data[], opaque_data, user_data, nb_drivers.
> > > All that consumes space, that could be used somewhere else instead.
> > > 3. I am a bit reluctant to touch existing rte_cryptodev API - to avoid any
> > > breakages I can't foresee right now.
> > > From the other side - if we add new functions/structs for cpu_sym_session we can
> > > mark them and keep them for some time as experimental, so further changes (if
> > > needed) would still be possible.
> > >
> >
> > OK let us assume that you have a separate structure. But I have a few queries:
> > 1. how can multiple drivers use a same session
> 
> As a short answer: they can't.
> It is pretty much the same approach as with rte_security - each device needs to
> create/init its own session.
> So upper layer would need to maintain its own array (or so) for such case.
> Though the question is why would you like to have same session over multiple
> SW backed devices?
> As it would be anyway just a synchronous function call that will be executed on
> the same cpu.

I may have a single FAT tunnel which may be distributed over multiple
cores, and each core is affined to a different SW device.
So a single session may be accessed by multiple devices.

One more example: depending on packet sizes, I may switch between
HW/SW PMDs with the same session.

> 
> > 2. Can somebody use the scheduler pmd for scheduling the different type of
> > payloads for the same session?
> 
> In theory yes.
> Though for that, the scheduler PMD should have inside its
> rte_crypto_cpu_sym_session an array of pointers to
> the underlying devices' sessions.
> 
> >
> > With your proposal the APIs would be very specific to your use case only.
> 
> Yes in some way.
> I consider that API specific for SW backed crypto PMDs.
> I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit
> from it.
> Current crypto-op API is very much HW oriented.
> Which is ok, that's what it was intended for, but I think we also need one that
> would be designed with SW backed implementation in mind.

We may re-use your API for HW PMDs as well, which do not have a requirement for
crypto-op/mbuf etc.
The return type of your new process API may have a status which says 'processed'
or, say, 'enqueued'. So if it is 'enqueued', we may have a new API for raw
bufs dequeue as well.

This requirement can be for any hardware PMDs like QAT as well.
That is why a dev-ops would be a better option.
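A rough sketch of the 'processed'/'enqueued' status idea, assuming hypothetical names throughout (raw_op_status, toy_dev, toy_process and toy_dequeue are not DPDK API): small buffers complete inline, larger ones are handed off and later returned by a raw-buffer dequeue:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-buffer outcome codes for a raw-buffer process() call. */
enum raw_op_status {
	RAW_OP_PROCESSED = 0, /* completed synchronously, result is ready */
	RAW_OP_ENQUEUED = 1,  /* handed to an async engine, dequeue later */
	RAW_OP_ERROR = -1,
};

#define TOY_QSZ 16 /* power of two, so index wrap-around stays correct */

/* Toy device: "processes" small buffers inline and queues large ones,
 * mimicking a PMD that can complete only some work synchronously. */
struct toy_dev {
	size_t sync_max;     /* buffers up to this size run inline */
	void *q[TOY_QSZ];    /* pending (enqueued) buffer cookies */
	uint32_t head, tail; /* ring indices, tail - head = pending count */
};

static void
toy_process(struct toy_dev *d, void *buf[], const size_t len[],
		int status[], uint32_t num)
{
	for (uint32_t i = 0; i < num; i++) {
		if (len[i] <= d->sync_max) {
			status[i] = RAW_OP_PROCESSED;
		} else if (d->tail - d->head < TOY_QSZ) {
			d->q[d->tail++ % TOY_QSZ] = buf[i];
			status[i] = RAW_OP_ENQUEUED;
		} else {
			status[i] = RAW_OP_ERROR; /* queue full */
		}
	}
}

/* Companion raw dequeue: returns completed cookies in FIFO order. */
static uint32_t
toy_dequeue(struct toy_dev *d, void *out[], uint32_t max)
{
	uint32_t n = 0;

	while (n < max && d->head != d->tail)
		out[n++] = d->q[d->head++ % TOY_QSZ];
	return n;
}
```

A pure SW PMD would simply never report RAW_OP_ENQUEUED, so the same prototype could serve both kinds of device.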

> 
> > When you would add more functionality to this sync API/struct, it will end up
> > being the same API/struct.
> >
> > Let us see how close/far we are from the existing APIs when the actual
> > implementation is done.
> >
> > > > I am not sure if that would be needed.
> > > > It would be internal to the driver that if synchronous processing is
> > > > supported (from feature flag) and have relevant fields in xform (the newly
> > > > added ones which are packed as per your suggestions) set,
> > > > it will create that type of session.
> > > >
> > > >
> > > > > + * Main points:
> > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > + *   affecting existing one.
> > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > + * - process() function per set of xforms
> > > > > + *   allows to expose different process() functions for different
> > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > + *   push all supported algorithms into one process() function,
> > > > > + *   or spread it across several ones.
> > > > > + *   I.E. More flexibility for PMD writer.
> > > >
> > > > Which process function should be chosen is internal to PMD, how would that
> > > > info be visible to the application or the library? These will get stored in
> > > > the session private data. It would be up to the PMD writer to store the per
> > > > session process function in the session private data.
> > > >
> > > > Process function would be a dev ops just like enq/deq operations and it should
> > > > call the respective process API stored in the session private data.
> > >
> > > That model (via devops) is possible, but has several drawbacks from my
> > > perspective:
> > >
> > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > Though in fact dev_id is not a relevant information for us here
> > > (all we need is pointer to the session and pointer to the function to call)
> > > and I tried to avoid using it in data-path functions for that API.
> >
> > You have a single vdev, but someone may have multiple vdevs for each thread, or may
> > have same dev with multiple queues for each core.
> 
> That's fine. As I said above it is a SW backed implementation.
> Each session has to be a separate entity that contains all necessary information
> (keys, alg/mode info,  etc.)  to process input buffers.
> Plus we need the actual function pointer to call.
> I just don't see what for we need a dev_id in that situation.

To iterate the session private data in the session.

> Again, here we don't need care about queues and their pinning to cores.
> If let say someone would like to process buffers from the same IPsec SA on 2
> different cores in parallel, he can just create 2 sessions for the same xform,
> give one to thread #1  and second to thread #2.
> After that both threads are free to call process(this_thread_ses, ...) at will.

Say you have a 16-core device to handle 100G of traffic on a single tunnel.
Will we make 16 sessions with the same parameters?

> 
> >
> > > 2. As you pointed out, in that case it will be just one process() function per device.
> > > So if PMD would like to have several process() functions for different types of
> > > sessions (let's say one per alg), the first thing it has to do inside its process()
> > > is read session data and, based on that, do a jump/call to a particular internal
> > > sub-routine.
> > > Something like:
> > > driver_id = get_pmd_driver_id();
> > > priv_ses = ses->sess_data[driver_id];
> > > Then either:
> > > switch(priv_sess->alg) {case XXX: process_XXX(priv_sess, ...);break;...}
> > > OR
> > > priv_ses->process(priv_sess, ...);
> > >
> > > to select and call the proper function.
> > > Looks like totally unnecessary overhead to me.
> > > Though if we have the ability to query/extract some sort of session_ops based on
> > > the xform - we can avoid this extra de-reference+jump/call thing.
> >
> > What is the issue in the priv_ses->process(); approach?
> 
> Nothing at all.
> What I am saying that schema with dev_ops
> dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
>    |
>    |-> priv_ses->process(...)
> 
> Has bigger overhead than just:
> process(ses,...);
> 
> So what for to introduce extra-level of indirection here?

Explained above.

> 
> > I don't understand what you are saving by not doing this.
> > In any case you would need to identify which session corresponds to which
> > process().
> 
> Yes, sure, but I think we can let the user store that relationship information
> in a way he likes: store process() pointer for each session, or group sessions
> that share the same process() somehow, or...

So whatever relationship the user makes and stores will make their life complicated.
If we can hide that information in the driver, then what is the issue with that? The user
will not need to worry. He would just call process() and the driver will choose which
process needs to be called.

I think we should have a POC around this and see the difference in the cycle count.
IMO it would be negligible and we would end up making a generic API set which
can be used by others as well.

> 
> > For that you would be doing it somewhere in your data path.
> 
> Why at data-path?
> Only once at session creation/initialization time.
> Or might be even once per group of sessions.
> 
> >
> > >
> > > >
> > > > I am not sure if you would need a new session init API for this as nothing would
> > > > be visible to the app or lib.
> > > >
> > > > > + * - Not storing process() pointer inside the session -
> > > > > + *   Allows user to choose does he want to store a process() pointer
> > > > > + *   per session, or per group of sessions for that device that share
> > > > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > > > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> > > >
> > > > If multiple sessions need to be processed via the same process function,
> > > > PMD would save the same process in all the sessions, I don't think there would
> > > > be any perf overhead with that.
> > >
> > > I think it would, see above.
> > >
> > > >
> > > > > + * Sketched usage model:
> > > > > + * ....
> > > > > + * /* control path, alloc/init session */
> > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > + * ...
> > > > > + * /* data-path*/
> > > > > + * process(ses, ....);
> > > > > + * ....
> > > > > + * /* control path, terminate/free session */
> > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > + */
> > > > > +
> > > > > +/**
> > > > > + * vector structure, contains pointer to vector array and the length
> > > > > + * of the array
> > > > > + */
> > > > > +struct rte_crypto_vec {
> > > > > +       struct iovec *vec;
> > > > > +       uint32_t num;
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * Data-path bulk process crypto function.
> > > > > + */
> > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > +               void *digest[], int status[], uint32_t num);
> > > > > +/*
> > > > > + * for given device return process function specific to input xforms
> > > > > + * on error - return NULL and set rte_errno value.
> > > > > + * Note that for same input xforms for the same device should return
> > > > > + * the same process function.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +rte_crypto_cpu_sym_process_t
> > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +/*
> > > > > + * Return required session size in bytes for given set of xforms.
> > > > > + * if xforms == NULL, then return the max possible session size,
> > > > > + * that would fit session for any supported by the device algorithm.
> > > > > + * if CPU mode is not supported at all, or requested in xform
> > > > > + * algorithm is not supported, then return -ENOTSUP.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int
> > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +/*
> > > > > + * Initialize session.
> > > > > + * It is caller responsibility to allocate enough space for it.
> > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +__rte_experimental
> > > > > +void
> > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > +
> > > > > +
> > > > >  #ifdef __cplusplus
> > > > >  }
> > > > >  #endif
> > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > index defe05ea0..ed7e63fab 100644
> > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > >
> > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > +
> > > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > > +                       struct rte_cryptodev *dev,
> > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > +
> > > > >  /** Crypto device operations function pointer table */
> > > > >  struct rte_cryptodev_ops {
> > > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > >         /**< Clear a Crypto sessions private data. */
> > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > >         /**< Clear a Crypto sessions private data. */
> > > > > +
> > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > >  };
> > > > >
> > > > >
> > > > >
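The 6B claim in the rte_crypto_sym.h hunk above can be checked with simplified stand-ins for rte_crypto_cipher_xform (enums replaced by int, names hypothetical). This sketch assumes an LP64 target and relies on the GCC/Clang __packed__ extension; on 32-bit targets the padding, and therefore the spare space, differs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for rte_crypto_cipher_xform before and after the
 * proposed change (enums replaced by int); names are hypothetical. */
struct cipher_xform_before {
	int op;
	int algo;
	struct {
		const uint8_t *data;
		uint16_t length;
	} key;                              /* LP64: 16B, 6B of tail padding */
	struct {
		uint16_t offset;
		uint16_t length;
	} iv;
};

struct cipher_xform_after {
	int op;
	int algo;
	struct {
		const uint8_t *data;
		uint16_t length;
	} __attribute__((__packed__)) key;  /* LP64: 10B, no padding */
	uint16_t offset;                    /* new cipher start offset */
	uint8_t reserved1[4];               /* spare room for extensions */
	struct {
		uint16_t offset;
		uint16_t length;
	} iv;
};

/* Non-zero when key and iv sit at the same offsets in both layouts,
 * i.e. the new fields fit into the old padding (holds on LP64). */
static int
layout_preserved(void)
{
	return offsetof(struct cipher_xform_before, key) ==
			offsetof(struct cipher_xform_after, key) &&
		offsetof(struct cipher_xform_before, iv) ==
			offsetof(struct cipher_xform_after, iv);
}
```

Packing also lowers the stand-in's alignment, so on LP64 sizeof(struct cipher_xform_after) is 28 rather than 32; since the real xforms live inside a union, that size change is worth checking separately.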

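A toy walk-through of the sketched usage model (size, then func, then init, then process, then fini), assuming a hypothetical PMD with a single trivial "algorithm"; none of the toy_* names are DPDK API, and the XOR transform is a placeholder, not cryptography:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy PMD; none of these names are DPDK API. */
enum toy_alg { TOY_ALG_XOR = 1 };

struct toy_xform {
	enum toy_alg alg;
	uint8_t key;
};

struct toy_cpu_sess {            /* would be opaque to the user */
	enum toy_alg alg;
	uint8_t key;
};

typedef void (*toy_process_t)(struct toy_cpu_sess *s, uint8_t *buf[],
		const uint32_t len[], int status[], uint32_t num);

/* Placeholder transform, not cryptography. */
static void
toy_xor(struct toy_cpu_sess *s, uint8_t *buf[], const uint32_t len[],
		int status[], uint32_t num)
{
	for (uint32_t i = 0; i < num; i++) {
		for (uint32_t j = 0; j < len[i]; j++)
			buf[i][j] ^= s->key;
		status[i] = 0;
	}
}

/* xform == NULL: max size for any supported alg; unsupported: -ENOTSUP. */
static int
toy_session_size(const struct toy_xform *x)
{
	if (x != NULL && x->alg != TOY_ALG_XOR)
		return -ENOTSUP;
	return (int)sizeof(struct toy_cpu_sess);
}

/* Same xform -> same function; NULL with errno set on error. */
static toy_process_t
toy_session_func(const struct toy_xform *x)
{
	if (x == NULL) {
		errno = EINVAL;
		return NULL;
	}
	switch (x->alg) {
	case TOY_ALG_XOR:
		return toy_xor;
	default:
		errno = ENOTSUP;
		return NULL;
	}
}

static int
toy_session_init(struct toy_cpu_sess *s, const struct toy_xform *x)
{
	if (x == NULL || toy_session_size(x) < 0)
		return -ENOTSUP;
	s->alg = x->alg;
	s->key = x->key;
	return 0;
}

static void
toy_session_fini(struct toy_cpu_sess *s)
{
	assert(s != NULL);
	/* clear the key copy; a real PMD would use a non-elidable wipe */
	memset(s, 0, sizeof(*s));
}

/* Control path, one data-path call, then teardown. */
static int
toy_run_once(const struct toy_xform *x, uint8_t *buf[],
		const uint32_t len[], int status[], uint32_t num)
{
	int sz = toy_session_size(x);
	struct toy_cpu_sess *ses;
	toy_process_t process;

	if (sz < 0)
		return sz;
	ses = malloc((size_t)sz);            /* user-provided memory */
	if (ses == NULL)
		return -ENOMEM;
	process = toy_session_func(x);
	if (process == NULL || toy_session_init(ses, x) != 0) {
		free(ses);
		return -ENOTSUP;
	}
	process(ses, buf, len, status, num); /* plain call, no dev_id */
	toy_session_fini(ses);
	free(ses);
	return 0;
}
```

In the proposal the session and the process() pointer would be kept across many bursts; creating and destroying them inside toy_run_once() only keeps the example short.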


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-03 13:24                                 ` Akhil Goyal
@ 2019-10-07 12:53                                   ` Ananyev, Konstantin
  2019-10-09  7:20                                     ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-07 12:53 UTC (permalink / raw)
  To: Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon'
  Cc: Zhang, Roy Fan, Doherty, Declan, 'Anoob Joseph'


Hi Akhil,

> > > > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > > > > > > > > > > > > same algorithm, key, and direction being processed by CPU cycles
> > > > > > > > > > > > > > > synchronously.
> > > > > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > > > > > > > > > > > corresponding dequeue burst.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > It would be a new API something like process_packets and it will have the
> > > > > > > > > > > > > > crypto processed packets while returning from the API?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib
> > > > > > > > > > > > > > only.
> > > > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any
> > > > > > > > > > > > > > value add to the crypto processing. IMO, you just need a synchronous crypto
> > > > > > > > > > > > > > processing API which can be defined in cryptodev, you don't need to re-create
> > > > > > > > > > > > > > a crypto session in the name of security session in the driver just to do a
> > > > > > > > > > > > > > synchronous processing.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > > The main reason is that would require disruptive changes in existing cryptodev API
> > > > > > > > > > > > > (would cause ABI/API breakage).
> > > > > > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra information
> > > > > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > > > > > > > >
> > > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > > >
> > > > > > > > > > > fill/read (+ alloc/free) is one of the main things that slow down current crypto-op
> > > > > > > > > > > approach.
> > > > > > > > > > > That's why the general idea - have all data that wouldn't change from packet to packet
> > > > > > > > > > > included into the session and setup it once at session_init().
> > > > > > > > > >
> > > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > > You can have the new API in crypto.
> > > > > > > > > > As per the current patch, you only need cipher_offset which you can have it as a parameter until
> > > > > > > > > > You get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> > > > > > > > > > We can have cipher offset at both places(crypto-op and cipher_xform). It will give flexibility to the user to
> > > > > > > > > > override it.
> > > > > > > > >
> > > > > > > > > After having another thought on your proposal:
> > > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related stuff here?
> > > > > > > >
> > > > > > > > I also thought of adding new xforms, but that won't serve the purpose for may be all the cases.
> > > > > > > > You would be needing all information currently available in the current xforms.
> > > > > > > > So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> > > > > > > > ABI breakage would still be there.
> > > > > > > >
> > > > > > > > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> > > > > > > > Xforms and we can have a solution to this issue.
> > > > > > >
> > > > > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > > > If in future we would need to add some extra information it might
> > > > > > > require ABI breakage, though by now I don't envision anything particular to add.
> > > > > > > Anyway, if there is no objection to go that way, we can try to make
> > > > > > > these changes for v2.
> > > > > > >
> > > > > >
> > > > > > Actually, after looking at it more deeply it appears not that easy as I thought it would be :)
> > > > > > Below is a very draft version of proposed API additions.
> > > > > > I think it avoids ABI breakages right now and provides enough flexibility for future extensions (if any).
> > > > > > For now, it doesn't address your comments about naming conventions (_CPU_ vs _SYNC_) , etc.
> > > > > > but I suppose is comprehensive enough to provide a main idea beyond it.
> > > > > > Akhil and other interested parties, please try to review and provide feedback ASAP,
> > > > > > as related changes would take some time and we still like to hit 19.11 deadline.
> > > > > > Konstantin
> > > > > >
> > > > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > index bc8da2466..c03069e23 100644
> > > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > > >   *
> > > > > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > > > > >   *  use to create a session.
> > > > > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > > > > + * Making key struct packed (see below) allow us to regain 6B that could be
> > > > > > + * used for future extensions.
> > > > > >   */
> > > > > >  struct rte_crypto_cipher_xform {
> > > > > >         enum rte_crypto_cipher_operation op;
> > > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > > >         struct {
> > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > -       } key;
> > > > > > +       } __attribute__((__packed__)) key;
> > > > > > +
> > > > > > +       /**
> > > > > > +         * offset for cipher to start within user provided data buffer.
> > > > > > +        * Fan suggested another (and less space consuming way) -
> > > > > > +         * reuse iv.offset space below, by changing:
> > > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > > +        * to unnamed union:
> > > > > > +        * union {
> > > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > > +        *      struct {uint16_t iv_len, crypto_offset;} cpu_crypto_param;
> > > > > > +        * };
> > > > > > +        * Both approaches seems ok to me in general.
> > > > >
> > > > > No strong opinions here. OK with this one.
> > > > >
> > > > > > +        * Comments/suggestions are welcome.
> > > > > > +         */
> > > > > > +       uint16_t offset;
> > > >
> > > > After another thought - it is probably a bit better to have offset as a separate
> > > > field.
> > > > In that case we can use the same xforms to create both type of sessions.
> > > ok
> > > >
> > > > > > +
> > > > > > +       uint8_t reserved1[4];
> > > > > > +
> > > > > >         /**< Cipher key
> > > > > >          *
> > > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > > >         struct {
> > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > -       } key;
> > > > > > +       } __attribute__((__packed__)) key;
> > > > > >         /**< Authentication key data.
> > > > > >          * The authentication key length MUST be less than or equal to the
> > > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > > >          * (for example RFC 2104, FIPS 198a).
> > > > > >          */
> > > > > >
> > > > > > +       uint8_t reserved1[6];
> > > > > > +
> > > > > >         struct {
> > > > > >                 uint16_t offset;
> > > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > > >         struct {
> > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > -       } key;
> > > > > > +       } __attribute__((__packed__)) key;
> > > > > > +
> > > > > > +       /** offset for cipher to start within data buffer */
> > > > > > +       uint16_t cipher_offset;
> > > > > > +
> > > > > > +       uint8_t reserved1[4];
> > > > > >
> > > > > >         struct {
> > > > > >                 uint16_t offset;
> > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > index e175b838c..c0c7bfed7 100644
> > > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > @@ -1272,6 +1272,101 @@ void *
> > > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > > >
> > > > > > +/*
> > > > > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > > > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > > > > + * introduce an extension to it via new fully opaque
> > > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > > >
> > > > >
> > > > > What all things do we need to squeeze?
> > > > > In this proposal I do not see the new struct cpu_sym_session  defined here.
> > > >
> > > > The plan is to have it totally opaque to the user, i.e. just:
> > > > struct rte_crypto_cpu_sym_session;
> > > > in public header files.
> > > >
> > > > > I believe you will have same lib API/struct for cpu_sym_session and sym_session.
> > > >
> > > > I thought about such way, but there are few things that looks clumsy to me:
> > > > 1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
> > > > so it is not possible to easy distinguish what session do you have: lksd_sym or cpu_sym.
> > > > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add some extra field
> > > > here, but in that case we wouldn't be able to use the same xform for both lksd_sym or cpu_sym
> > > > (which seems really plausible thing for me).
> > > > 2. Majority of rte_cryptodev_sym_session fields I think are unnecessary for rte_crypto_cpu_sym_session:
> > > > sess_data[], opaque_data, user_data, nb_drivers.
> > > > All that consumes space, that could be used somewhere else instead.
> > > > 3. I am a bit reluctant to touch existing rte_cryptodev API - to avoid any
> > > > breakages I can't foresee right now.
> > > > From other side - if we'll add new functions/structs for cpu_sym_session we can mark it
> > > > and keep it for some time as experimental, so further changes (if needed) would
> > > > still be possible.
> > > >
> > >
> > > OK let us assume that you have a separate structure. But I have a few queries:
> > > 1. how can multiple drivers use a same session
> >
> > As a short answer: they can't.
> > It is pretty much the same approach as with rte_security - each device needs to
> > create/init its own session.
> > So upper layer would need to maintain its own array (or so) for such case.
> > Though the question is why would you like to have same session over multiple
> > SW backed devices?
> > As it would be anyway just a synchronous function call that will be executed on
> > the same cpu.
> 
> I may have single FAT tunnel which may be distributed over multiple
> Cores, and each core is affined to a different SW device.

If it is pure SW, then we don't need multiple devices for such a scenario.
In that case the device is a pure abstraction that we can skip.

> So a single session may be accessed by multiple devices.
> 
> One more example would be depending on packet sizes, I may switch between
> HW/SW PMDs with the same session.

Sure, but then we'll have multiple sessions.
BTW, we have the same thing now - these private session pointers are just stored
inside the same rte_crypto_sym_session.
And if the user wants to support this model, he would also need to store a <dev_id, queue_id>
pair for each HW device anyway.
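A minimal sketch of the upper-layer bookkeeping described above, under stated assumptions: an application that serves one SA from several devices keeps one opaque session and one process function per device, both filled in when that device's session is created. `struct app_sa`, `APP_MAX_DEVS`, `app_sa_process()`, `cpu_process_t` and `demo_invert()` are hypothetical names invented for illustration; they are not part of the cryptodev or rte_security API, and `demo_invert()` is a placeholder byte transform, not a cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opaque per-device session and data-path function type. */
struct cpu_ses;
typedef int (*cpu_process_t)(struct cpu_ses *ses, uint8_t *buf, size_t len);

#define APP_MAX_DEVS 4

/* Hypothetical upper-layer SA state: one <session, process()> pair per device. */
struct app_sa {
	struct cpu_ses *ses[APP_MAX_DEVS];
	cpu_process_t process[APP_MAX_DEVS];
};

/* Placeholder "algorithm": bitwise inversion, NOT real crypto. */
static int
demo_invert(struct cpu_ses *ses, uint8_t *buf, size_t len)
{
	(void)ses; /* the placeholder needs no per-session state */
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)~buf[i];
	return 0;
}

/*
 * Data path: call the function that was chosen when the session for
 * device `dev` was created. Returns -1 if nothing is installed there.
 */
static int
app_sa_process(const struct app_sa *sa, unsigned int dev, uint8_t *buf,
	       size_t len)
{
	if (dev >= APP_MAX_DEVS || sa->process[dev] == NULL)
		return -1;
	return sa->process[dev](sa->ses[dev], buf, len);
}
```

Each core then only needs to know which slot it uses; no `dev_id` lookup happens inside the process function itself.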

> 
> >
> > > 2. Can somebody use the scheduler pmd for scheduling the different type of payloads for the same session?
> >
> > In theory yes.
> > Though for that scheduler pmd should have inside it's
> > rte_crypto_cpu_sym_session an array of pointers to
> > the underlying devices sessions.
> >
> > >
> > > With your proposal the APIs would be very specific to your use case only.
> >
> > Yes in some way.
> > I consider that API specific for SW backed crypto PMDs.
> > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit
> > from it.
> > Current crypto-op API is very much HW oriented.
> > Which is ok, that's for it was intended for, but I think we also need one that
> > would be designed
> > for SW backed implementation in mind.
> 
> We may re-use your API for HW PMDs as well which do not have requirement of
> Crypto-op/mbuf etc.
> The return type of your new process API may have a status which say 'processed'
> Or can be say 'enqueued'. So if it is  'enqueued', we may have a new API for raw
> Bufs dequeue as well.
> 
> This requirement can be for any hardware PMDs like QAT as well.

I don't think it is a good idea to extend this API for async (lookaside) devices.
You'll need to:
 - provide dev_id and queue_id for each process (enqueue) and dequeue operation.
 - provide IOVA for all buffers passed to that function (data buffers, digest, IV, aad).
 - on dequeue, provide some way to associate dequeued data and digest buffers with
   the crypto session that was used (and probably with the mbuf).
So most likely we'll end up with just another version of our current crypto-op structure.
If you'd like to get rid of the mbuf dependency within the current crypto-op API, that is understandable,
but I don't think we should have the same API for both sync (CPU) and async (lookaside) cases.
It doesn't seem feasible at all and would defeat the whole purpose of this patch.

> That is why a dev-ops would be a better option.
> 
> >
> > > When you would add more functionality to this sync API/struct, it will end up being the same API/struct.
> > >
> > > Let us see how close/ far we are from the existing APIs when the actual implementation is done.
> > >
> > > > > I am not sure if that would be needed.
> > > > > It would be internal to the driver that if synchronous processing is supported(from feature flag) and
> > > > > Have relevant fields in xform(the newly added ones which are packed as per your suggestions) set,
> > > > > It will create that type of session.
> > > > >
> > > > >
> > > > > > + * Main points:
> > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > + *   affecting existing one.
> > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > + * - process() function per set of xforms
> > > > > > + *   allows to expose different process() functions for different
> > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > + *   push all supported algorithms into one process() function,
> > > > > > + *   or spread it across several ones.
> > > > > > + *   I.E. More flexibility for PMD writer.
> > > > >
> > > > > Which process function should be chosen is internal to PMD, how would that info
> > > > > be visible to the application or the library. These will get stored in the session private
> > > > > data. It would be up to the PMD writer, to store the per session process function in
> > > > > the session private data.
> > > > >
> > > > > Process function would be a dev ops just like enc/deq operations and it should call
> > > > > The respective process API stored in the session private data.
> > > >
> > > > That model (via devops) is possible, but has several drawbacks from my
> > > > perspective:
> > > >
> > > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > > Though in fact dev_id is not a relevant information for us here
> > > > (all we need is pointer to the session and pointer to the function to call)
> > > > and I tried to avoid using it in data-path functions for that API.
> > >
> > > You have a single vdev, but someone may have multiple vdevs for each thread, or may
> > > Have same dev with multiple queues for each core.
> >
> > That's fine. As I said above it is a SW backed implementation.
> > Each session has to be a separate entity that contains all necessary information
> > (keys, alg/mode info,  etc.)  to process input buffers.
> > Plus we need the actual function pointer to call.
> > I just don't see what for we need a dev_id in that situation.
> 
> To iterate the session private data in the session.
> 
> > Again, here we don't need care about queues and their pinning to cores.
> > If let say someone would like to process buffers from the same IPsec SA on 2
> > different cores in parallel, he can just create 2 sessions for the same xform,
> > give one to thread #1  and second to thread #2.
> > After that both threads are free to call process(this_thread_ses, ...) at will.
> 
> Say you have a 16core device to handle 100G of traffic on a single tunnel.
> Will we make 16 sessions with same parameters?

We can ask exactly the same question about the current crypto-op API.
You have a lookaside crypto-dev with 16 HW queues, each queue serviced by a different CPU.
For the same SA, do you need a separate session per queue, or is it ok to reuse the current one?
AFAIK, right now this is a grey area that is not clearly defined.
For the crypto-devs I am aware of, the user can reuse the same session (as the PMD uses it read-only).
But again, right now I think it is not clearly defined and is implementation specific.

> 
> >
> > >
> > > > 2. As you pointed in that case it will be just one process() function per device.
> > > > So if PMD would like to have several process() functions for different type of sessions
> > > > (let say one per alg) first thing it has to do inside its process() - read session data and
> > > > based on that, do a jump/call to particular internal sub-routine.
> > > > Something like:
> > > > driver_id = get_pmd_driver_id();
> > > > priv_ses = ses->sess_data[driver_id];
> > > > Then either:
> > > > switch(priv_ses->alg) {case XXX: process_XXX(priv_ses, ...);break;...}
> > > > OR
> > > > priv_ses->process(priv_ses, ...);
> > > >
> > > > to select and call the proper function.
> > > > Looks like totally unnecessary overhead to me.
> > > > Though if we'll have ability to query/extract some sort session_ops based on the xform -
> > > > we can avoid  this extra de-reference+jump/call thing.
> > >
> > > What is the issue in the priv_ses->process(); approach?
> >
> > Nothing at all.
> > What I am saying that schema with dev_ops
> > dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
> >    |
> >    |-> priv_ses->process(...)
> >
> > Has bigger overhead then just:
> > process(ses,...);
> >
> > So what for to introduce extra-level of indirection here?
> 
> Explained above.
> 
> >
> > > I don't understand what are you saving by not doing this.
> > > In any case you would need to identify which session correspond to which process().
> >
> > Yes, sure, but I think we can make user to store information that relationship,
> > in a way he likes: store process() pointer for each session, or group sessions
> > that share the same process() somehow, or...
> 
> So whatever relationship that user will make and store will make its life complicated.
> If we can hide that information in the driver, then what is the issue in that and user
> Will not need to worry. He would just call the process() and driver will choose which
> Process need to be called.

The driver can do that at config/init time.
Then at run time we can skip that choice entirely and call the already chosen function.
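A minimal, self-contained mock of the control-path/data-path split discussed here (and of the sketched usage model in the quoted draft). It is not the proposed DPDK API: `enum mock_alg`, `struct mock_session`, `mock_session_func()`, `mock_session_init()` and the two routines are hypothetical placeholders, and the routines are trivial byte transforms, not ciphers. The per-algorithm choice happens once, in `mock_session_func()`; the data path then calls the returned pointer directly, with no per-call switch and no `dev_id`, and reports failures per buffer through `status[]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm selector standing in for an xform. */
enum mock_alg { MOCK_ALG_XOR_KEY, MOCK_ALG_INVERT };

/* Hypothetical session holding only what the data path needs. */
struct mock_session {
	uint8_t key;
};

/* Data-path signature: process num buffers, report per-buffer status. */
typedef void (*mock_process_t)(const struct mock_session *ses,
			       uint8_t *buf[], const size_t len[],
			       int status[], uint32_t num);

/* Placeholder routine: XOR with a one-byte key (NOT real crypto). */
static void
process_xor_key(const struct mock_session *ses, uint8_t *buf[],
		const size_t len[], int status[], uint32_t num)
{
	for (uint32_t i = 0; i < num; i++) {
		if (buf[i] == NULL) {
			status[i] = -1; /* per-buffer error, burst continues */
			continue;
		}
		for (size_t j = 0; j < len[i]; j++)
			buf[i][j] ^= ses->key;
		status[i] = 0;
	}
}

/* Placeholder routine: bitwise inversion, ignores the key. */
static void
process_invert(const struct mock_session *ses, uint8_t *buf[],
	       const size_t len[], int status[], uint32_t num)
{
	(void)ses;
	for (uint32_t i = 0; i < num; i++) {
		if (buf[i] == NULL) {
			status[i] = -1;
			continue;
		}
		for (size_t j = 0; j < len[i]; j++)
			buf[i][j] = (uint8_t)~buf[i][j];
		status[i] = 0;
	}
}

/* Control path: the per-algorithm choice is made once, here. */
static mock_process_t
mock_session_func(enum mock_alg alg)
{
	switch (alg) {
	case MOCK_ALG_XOR_KEY:
		return process_xor_key;
	case MOCK_ALG_INVERT:
		return process_invert;
	}
	return NULL;
}

/* Control path: fill in a caller-allocated session. */
static void
mock_session_init(struct mock_session *ses, uint8_t key)
{
	ses->key = key;
}
```

Whether such a pre-selected pointer is measurably cheaper than dispatching through `dev_ops` is the kind of thing the cycle-count POC proposed in the reply would have to show.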

> 
> I think we should have a POC around this and see the difference in the cycle count.
> IMO it would be negligible and we would end up making a generic API set which
> can be used by others as well.
> 
> >
> > > For that you would be doing it somewhere in your data path.
> >
> > Why at data-path?
> > Only once at session creation/initialization time.
> > Or might be even once per group of sessions.
> >
> > >
> > > >
> > > > >
> > > > > I am not sure if you would need a new session init API for this as nothing would be visible to
> > > > > the app or lib.
> > > > >
> > > > > > + * - Not storing process() pointer inside the session -
> > > > > > + *   Allows user to choose does he want to store a process() pointer
> > > > > > + *   per session, or per group of sessions for that device that share
> > > > > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > > > > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> > > > >
> > > > > If multiple sessions need to be processed via the same process function,
> > > > > PMD would save the same process in all the sessions, I don't think there would
> > > > > be any perf overhead with that.
> > > >
> > > > I think it would, see above.
> > > >
> > > > >
> > > > > > + * Sketched usage model:
> > > > > > + * ....
> > > > > > + * /* control path, alloc/init session */
> > > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > > + * ...
> > > > > > + * /* data-path*/
> > > > > > + * process(ses, ....);
> > > > > > + * ....
> > > > > > + * /* control path, terminate/free session */
> > > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > > + */
> > > > > > +
> > > > > > +/**
> > > > > > + * vector structure, contains pointer to vector array and the length
> > > > > > + * of the array
> > > > > > + */
> > > > > > +struct rte_crypto_vec {
> > > > > > +       struct iovec *vec;
> > > > > > +       uint32_t num;
> > > > > > +};
> > > > > > +
> > > > > > +/*
> > > > > > + * Data-path bulk process crypto function.
> > > > > > + */
> > > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > > +               void *digest[], int status[], uint32_t num);
> > > > > > +/*
> > > > > > + * for given device return process function specific to input xforms
> > > > > > + * on error - return NULL and set rte_errno value.
> > > > > > + * Note that for same input xforms for the same device should return
> > > > > > + * the same process function.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +rte_crypto_cpu_sym_process_t
> > > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +/*
> > > > > > + * Return required session size in bytes for given set of xforms.
> > > > > > + * if xforms == NULL, then return the max possible session size,
> > > > > > + * that would fit session for any supported by the device algorithm.
> > > > > > + * if CPU mode is not supported at all, or requested in xform
> > > > > > + * algorithm is not supported, then return -ENOTSUP.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int
> > > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +/*
> > > > > > + * Initialize session.
> > > > > > + * It is caller responsibility to allocate enough space for it.
> > > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > > + */
> > > > > > +__rte_experimental
> > > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +__rte_experimental
> > > > > > +void
> > > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > +
> > > > > > +
> > > > > >  #ifdef __cplusplus
> > > > > >  }
> > > > > >  #endif
> > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > index defe05ea0..ed7e63fab 100644
> > > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > > >
> > > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > +
> > > > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t)(
> > > > > > +                       struct rte_cryptodev *dev,
> > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > +
> > > > > >  /** Crypto device operations function pointer table */
> > > > > >  struct rte_cryptodev_ops {
> > > > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > +
> > > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > > >  };
> > > > > >
> > > > > >
> > > > > >
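The draft's claim that packing `key` regains 6 bytes without moving the existing fields can be checked by comparing layouts. A sketch under stated assumptions: the two structs below are simplified stand-ins for `struct rte_crypto_cipher_xform` before and after the quoted hunk (enums replaced by `uint32_t`), not the DPDK definitions, and `__attribute__((__packed__))` is the GCC/Clang extension the draft itself uses. On an LP64 target such as x86-64 the key shrinks from 16 to 10 bytes, the new `offset` lands at byte 18, and `iv` stays at offset 24.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified mirrors of the cipher xform before and after the draft
 * change. Enums are replaced by uint32_t; NOT the real DPDK headers.
 */
struct cipher_xform_old {
	uint32_t op;
	uint32_t algo;
	struct {
		const uint8_t *data;
		uint16_t length;
	} key;                  /* 16 bytes on LP64: 8 + 2 + 6 padding */
	struct {
		uint16_t offset;
		uint16_t length;
	} iv;
};

struct cipher_xform_new {
	uint32_t op;
	uint32_t algo;
	struct {
		const uint8_t *data;
		uint16_t length;
	} __attribute__((__packed__)) key; /* 10 bytes, alignment 1 */
	uint16_t offset;                   /* new field from the draft */
	uint8_t reserved1[4];              /* spare bytes from the draft */
	struct {
		uint16_t offset;
		uint16_t length;
	} iv;
};

/* Nonzero when the draft leaves iv at the same byte offset. */
static int
iv_offset_preserved(void)
{
	return offsetof(struct cipher_xform_old, iv) ==
	       offsetof(struct cipher_xform_new, iv);
}
```

One side effect shows up in the same mirror: because the packed member has alignment 1, the enclosing struct's alignment drops from 8 to 4 on LP64, so its `sizeof` goes from 32 to 28 bytes; whether that matters depends on how the real struct is embedded (for example in the xform union) and would need checking against the actual headers. On a 32-bit target the unpacked key is only 8 bytes, so by the same arithmetic packing frees 2 bytes rather than 6 and the added fields would move `iv`.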



* [dpdk-dev] [PATCH v2 00/10] security: add software synchronous crypto process
  2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
                     ` (10 preceding siblings ...)
  2019-09-09 12:43   ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Aaron Conole
@ 2019-10-07 16:28   ` Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 01/10] security: introduce CPU Crypto action type and API Fan Zhang
                       ` (9 more replies)
  11 siblings, 10 replies; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This RFC patch adds to rte_security a way to process symmetric crypto
workloads synchronously, in bulk, for SW crypto devices.

Originally, both SW and HW crypto PMDs work under rte_cryptodev to
process the crypto workload asynchronously. This provides uniformity
for both PMD types, but it also introduces an unnecessary performance
penalty for SW PMDs, such as the extra SW ring enqueue/dequeue steps
needed to "simulate" asynchronous operation and unnecessary HW address
computation.

We introduce a new way for SW crypto devices to perform crypto operations
synchronously, taking as input only the fields required for the computation.

In rte_security, a new action type "RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO"
is introduced. This action type allows a burst of symmetric crypto
workload that uses the same algorithm, key, and direction to be processed
synchronously with CPU cycles. This flexible action type does not require
external hardware involvement.

This patch also includes the announcement of a new API,
"rte_security_process_cpu_crypto_bulk". With this API the packet is sent to
the crypto device for symmetric crypto processing. The device encrypts
or decrypts each buffer based on the session data specified and preprocessed
in the security session. Unlike the inline or lookaside modes, when the
function returns, the user can expect each buffer either to have been
processed successfully or to have an error number assigned at the
corresponding index of the status array.

The proof-of-concept AESNI-GCM and AESNI-MB SW PMDs are updated to
support this new method. To demonstrate the performance gain of this
method, two simple performance evaluation apps are added under unit-test:
"app/test: security_aesni_gcm_perftest/security_aesni_mb_perftest". Users
can compare their results against the crypto perf application results.

Finally, the ipsec library and the ipsec-secgw sample application are
updated to support this feature. Several test scripts are added to the
ipsec-secgw test suite to verify the correctness of the implementation.

v2:
- changed API return type from "void" to "int".
- reworked the ipsec library implementation.
- fixed bugs in the aesni-mb PMD.
- fixed bugs in the ipsec-secgw application.

Fan Zhang (10):
  security: introduce CPU Crypto action type and API
  crypto/aesni_gcm: add rte_security handler
  app/test: add security cpu crypto autotest
  app/test: add security cpu crypto perftest
  crypto/aesni_mb: add rte_security handler
  app/test: add aesni_mb security cpu crypto autotest
  app/test: add aesni_mb security cpu crypto perftest
  ipsec: add rte_security cpu_crypto action support
  examples/ipsec-secgw: add security cpu_crypto action support
  doc: update security cpu process description

 app/test/Makefile                                  |    1 +
 app/test/meson.build                               |    1 +
 app/test/test_security_cpu_crypto.c                | 1326 ++++++++++++++++++++
 doc/guides/cryptodevs/aesni_gcm.rst                |    6 +
 doc/guides/cryptodevs/aesni_mb.rst                 |    7 +
 doc/guides/prog_guide/rte_security.rst             |  112 +-
 doc/guides/rel_notes/release_19_11.rst             |    7 +
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c           |   97 +-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c       |   95 ++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h   |   23 +
 drivers/crypto/aesni_gcm/meson.build               |    2 +-
 drivers/crypto/aesni_mb/meson.build                |    2 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         |  368 +++++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |   92 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |   21 +-
 examples/ipsec-secgw/ipsec.c                       |   35 +
 examples/ipsec-secgw/ipsec_process.c               |    7 +-
 examples/ipsec-secgw/sa.c                          |   13 +-
 examples/ipsec-secgw/test/run_test.sh              |   10 +
 .../test/trs_3descbc_sha1_common_defs.sh           |    8 +-
 .../test/trs_3descbc_sha1_cpu_crypto_defs.sh       |    5 +
 .../test/trs_aescbc_sha1_common_defs.sh            |    8 +-
 .../test/trs_aescbc_sha1_cpu_crypto_defs.sh        |    5 +
 .../test/trs_aesctr_sha1_common_defs.sh            |    8 +-
 .../test/trs_aesctr_sha1_cpu_crypto_defs.sh        |    5 +
 .../ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh |    5 +
 .../test/trs_aesgcm_mb_cpu_crypto_defs.sh          |    7 +
 .../test/tun_3descbc_sha1_common_defs.sh           |    8 +-
 .../test/tun_3descbc_sha1_cpu_crypto_defs.sh       |    5 +
 .../test/tun_aescbc_sha1_common_defs.sh            |    8 +-
 .../test/tun_aescbc_sha1_cpu_crypto_defs.sh        |    5 +
 .../test/tun_aesctr_sha1_common_defs.sh            |    8 +-
 .../test/tun_aesctr_sha1_cpu_crypto_defs.sh        |    5 +
 .../ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh |    5 +
 .../test/tun_aesgcm_mb_cpu_crypto_defs.sh          |    7 +
 lib/librte_ipsec/crypto.h                          |   24 +
 lib/librte_ipsec/esp_inb.c                         |  200 ++-
 lib/librte_ipsec/esp_outb.c                        |  369 +++++-
 lib/librte_ipsec/sa.c                              |   53 +-
 lib/librte_ipsec/sa.h                              |   29 +
 lib/librte_ipsec/ses.c                             |    4 +-
 lib/librte_security/rte_security.c                 |   11 +
 lib/librte_security/rte_security.h                 |   53 +-
 lib/librte_security/rte_security_driver.h          |   22 +
 lib/librte_security/rte_security_version.map       |    1 +
 45 files changed, 2994 insertions(+), 99 deletions(-)
 create mode 100644 app/test/test_security_cpu_crypto.c
 create mode 100644 examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh

-- 
2.14.5



* [dpdk-dev] [PATCH v2 01/10] security: introduce CPU Crypto action type and API
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-08 13:42       ` Ananyev, Konstantin
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
                       ` (8 subsequent siblings)
  9 siblings, 1 reply; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch introduces a new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type
to the security library. The type represents crypto operations performed
synchronously with CPU cycles. The patch also adds a new API to process
crypto operations in bulk, along with the corresponding function pointer
for PMDs.
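The library side of the new API is a thin dispatch through the driver's ops table. The sketch below models that pattern with hypothetical, trimmed-down types (`struct sec_ops`, `struct sec_ctx`, `sec_process_cpu_crypto_bulk()`); it is not the rte_security API itself. As in the patch, the wrapper returns -1 when the driver leaves the callback NULL; unlike the patch, it also checks the context and ops pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced driver callback type (no buffers, IV, AAD or
 * digest arrays) to keep the dispatch pattern in focus. */
typedef int (*process_bulk_t)(void *sess, int status[], uint32_t num);

/* Driver-filled table: unsupported operations are left NULL. */
struct sec_ops {
	process_bulk_t process_cpu_crypto_bulk;
};

struct sec_ctx {
	const struct sec_ops *ops;
};

/* Library-side wrapper: -1 if the callback is missing (mirroring the
 * RTE_FUNC_PTR_OR_ERR_RET(..., -1) check), otherwise forward the call
 * and return whatever the driver returns. */
static int
sec_process_cpu_crypto_bulk(struct sec_ctx *ctx, void *sess,
		int status[], uint32_t num)
{
	assert(num == 0 || status != NULL);
	if (ctx == NULL || ctx->ops == NULL ||
			ctx->ops->process_cpu_crypto_bulk == NULL)
		return -1;
	return ctx->ops->process_cpu_crypto_bulk(sess, status, num);
}

/* Example driver callback: marks every element as successful. */
static int
dummy_process_bulk(void *sess, int status[], uint32_t num)
{
	(void)sess;
	for (uint32_t i = 0; i < num; i++)
		status[i] = 0;
	return 0;
}
```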

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 lib/librte_security/rte_security.c           | 11 ++++++
 lib/librte_security/rte_security.h           | 53 +++++++++++++++++++++++++++-
 lib/librte_security/rte_security_driver.h    | 22 ++++++++++++
 lib/librte_security/rte_security_version.map |  1 +
 4 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
index bc81ce15d..cdd1ee6af 100644
--- a/lib/librte_security/rte_security.c
+++ b/lib/librte_security/rte_security.c
@@ -141,3 +141,14 @@ rte_security_capability_get(struct rte_security_ctx *instance,
 
 	return NULL;
 }
+
+int
+rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->process_cpu_crypto_bulk, -1);
+	return instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
+			aad, digest, status, num);
+}
diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
index aaafdfcd7..0caf5d697 100644
--- a/lib/librte_security/rte_security.h
+++ b/lib/librte_security/rte_security.h
@@ -18,6 +18,7 @@ extern "C" {
 #endif
 
 #include <sys/types.h>
+#include <sys/uio.h>
 
 #include <netinet/in.h>
 #include <netinet/ip.h>
@@ -289,6 +290,20 @@ struct rte_security_pdcp_xform {
 	uint32_t hfn_ovrd;
 };
 
+struct rte_security_cpu_crypto_xform {
+	/** For cipher/authentication crypto operations the authentication may
+	 * cover more content than the cipher. E.g., for IPsec ESP encryption
+	 * with AES-CBC and HMAC-SHA1, encryption starts after the ESP header
+	 * and IV, while authentication also covers the ESP header and IV.
+	 * The cipher_offset field is used to deduce the cipher data pointer
+	 * from the start of the buffer to be processed.
+	 *
+	 * NOTE: this parameter shall be ignored by AEAD algorithms, since they
+	 * use the same offset for cipher and authentication.
+	 */
+	int32_t cipher_offset;
+};
+
 /**
  * Security session action type.
  */
@@ -303,10 +318,14 @@ enum rte_security_session_action_type {
 	/**< All security protocol processing is performed inline during
 	 * transmission
 	 */
-	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
+	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
 	/**< All security protocol processing including crypto is performed
 	 * on a lookaside accelerator
 	 */
+	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
+	/**< Crypto processing for the security protocol is performed
+	 * synchronously by the CPU
+	 */
 };
 
 /** Security session protocol definition */
@@ -332,6 +351,7 @@ struct rte_security_session_conf {
 		struct rte_security_ipsec_xform ipsec;
 		struct rte_security_macsec_xform macsec;
 		struct rte_security_pdcp_xform pdcp;
+		struct rte_security_cpu_crypto_xform cpucrypto;
 	};
 	/**< Configuration parameters for security session */
 	struct rte_crypto_sym_xform *crypto_xform;
@@ -665,6 +685,37 @@ const struct rte_security_capability *
 rte_security_capability_get(struct rte_security_ctx *instance,
 			    struct rte_security_capability_idx *idx);
 
+/**
+ * Security vector structure, containing a pointer to an iovec array and
+ * the number of elements in that array
+ */
+struct rte_security_vec {
+	struct iovec *vec;
+	uint32_t num;
+};
+
+/**
+ * Process a bulk crypto workload using the CPU
+ *
+ * @param	instance	security instance.
+ * @param	sess		security session
+ * @param	buf		array of buffer SGL vectors
+ * @param	iv		array of IV pointers
+ * @param	aad		array of AAD pointers
+ * @param	digest		array of digest pointers
+ * @param	status		array of per-element status values set by the function
+ * @param	num		number of elements in each array
+ * @return
+ *  - On success, 0
+ *  - On any failure, -1
+ */
+__rte_experimental
+int
+rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
index 1b561f852..fe940fffa 100644
--- a/lib/librte_security/rte_security_driver.h
+++ b/lib/librte_security/rte_security_driver.h
@@ -132,6 +132,26 @@ typedef int (*security_get_userdata_t)(void *device,
 typedef const struct rte_security_capability *(*security_capabilities_get_t)(
 		void *device);
 
+/**
+ * Process security operations in bulk using CPU accelerated method.
+ *
+ * @param	sess		Security session structure.
+ * @param	buf		Buffer to the vectors to be processed.
+ * @param	iv		IV pointers.
+ * @param	aad		AAD pointers.
+ * @param	digest		Digest pointers.
+ * @param	status		Array of status value.
+ * @param	num		Number of elements in each array.
+ * @return
+ *  - On success, 0
+ *  - On any failure, -1
+ */
+
+typedef int (*security_process_cpu_crypto_bulk_t)(
+		struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 /** Security operations function pointer table */
 struct rte_security_ops {
 	security_session_create_t session_create;
@@ -150,6 +170,8 @@ struct rte_security_ops {
 	/**< Get userdata associated with session which processed the packet. */
 	security_capabilities_get_t capabilities_get;
 	/**< Get security capabilities. */
+	security_process_cpu_crypto_bulk_t process_cpu_crypto_bulk;
+	/**< Process data in bulk. */
 };
 
 #ifdef __cplusplus
diff --git a/lib/librte_security/rte_security_version.map b/lib/librte_security/rte_security_version.map
index 53267bf3c..2132e7a00 100644
--- a/lib/librte_security/rte_security_version.map
+++ b/lib/librte_security/rte_security_version.map
@@ -18,4 +18,5 @@ EXPERIMENTAL {
 	rte_security_get_userdata;
 	rte_security_session_stats_get;
 	rte_security_session_update;
+	rte_security_process_cpu_crypto_bulk;
 };
-- 
2.14.5



* [dpdk-dev] [PATCH v2 02/10] crypto/aesni_gcm: add rte_security handler
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 01/10] security: introduce CPU Crypto action type and API Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-08 13:44       ` Ananyev, Konstantin
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 03/10] app/test: add security cpu crypto autotest Fan Zhang
                       ` (7 subsequent siblings)
  9 siblings, 1 reply; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch adds rte_security support to the AESNI-GCM PMD. The PMD now
initializes a security context instance, creates/deletes PMD-specific
security sessions, and processes crypto workloads in synchronous mode,
with scatter-gather list buffers supported.
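The SGL handling in `process_gcm_security_sgl_buf()` follows a streaming pattern: one init call, one update call per iovec segment, and one finalize call that produces the tag. The sketch below reproduces that shape with a toy 64-bit FNV-1a hash standing in for the GCM callbacks (it is not a MAC and provides no security); `sgl_digest()`, `GEN_DIGEST_LEN` and the toy context are hypothetical names. It also shows the digest-length handling: the full tag is produced into a temporary buffer and only the requested number of bytes is copied out (encrypt) or compared (decrypt). The patch writes directly into the caller's digest when the requested and generated lengths match; the sketch always goes through the temporary buffer for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define GEN_DIGEST_LEN 8 /* toy "full" tag length in bytes */

/* Toy streaming digest (64-bit FNV-1a) standing in for the GCM
 * init/update/finalize callbacks. NOT a MAC. */
struct toy_ctx {
	uint64_t h;
};

static void
toy_init(struct toy_ctx *c)
{
	c->h = 0xcbf29ce484222325ULL;
}

static void
toy_update(struct toy_ctx *c, const uint8_t *p, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		c->h ^= p[i];
		c->h *= 0x100000001b3ULL;
	}
}

static void
toy_finalize(const struct toy_ctx *c, uint8_t tag[GEN_DIGEST_LEN])
{
	for (int i = 0; i < GEN_DIGEST_LEN; i++)
		tag[i] = (uint8_t)(c->h >> (8 * i));
}

/* One init, one update per segment, one finalize. With verify == 0 the
 * first req_len tag bytes are written to digest; with verify != 0 they
 * are compared against digest (0 on match, -1 on mismatch). */
static int
sgl_digest(const struct iovec *vec, uint32_t nseg, uint8_t *digest,
		uint32_t req_len, int verify)
{
	struct toy_ctx ctx;
	uint8_t temp[GEN_DIGEST_LEN];

	assert(req_len <= GEN_DIGEST_LEN);
	toy_init(&ctx);
	for (uint32_t i = 0; i < nseg; i++)
		toy_update(&ctx, vec[i].iov_base, vec[i].iov_len);
	toy_finalize(&ctx, temp);

	if (verify)
		return memcmp(temp, digest, req_len) == 0 ? 0 : -1;
	memcpy(digest, temp, req_len);
	return 0;
}
```

Because the state is carried across update calls, splitting the data into segments yields the same tag as processing it contiguously, which is what lets the PMD walk the SGL without first linearizing it.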

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c         | 97 +++++++++++++++++++++++-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c     | 95 +++++++++++++++++++++++
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h | 23 ++++++
 drivers/crypto/aesni_gcm/meson.build             |  2 +-
 4 files changed, 215 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index 1006a5c4d..2e91bf149 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
@@ -6,6 +6,7 @@
 #include <rte_hexdump.h>
 #include <rte_cryptodev.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 #include <rte_bus_vdev.h>
 #include <rte_malloc.h>
 #include <rte_cpuflags.h>
@@ -174,6 +175,56 @@ aesni_gcm_get_session(struct aesni_gcm_qp *qp, struct rte_crypto_op *op)
 	return sess;
 }
 
+static __rte_always_inline int
+process_gcm_security_sgl_buf(struct aesni_gcm_security_session *sess,
+		struct rte_security_vec *buf, uint8_t *iv,
+		uint8_t *aad, uint8_t *digest)
+{
+	struct aesni_gcm_session *session = &sess->sess;
+	uint8_t *tag;
+	uint32_t i;
+
+	sess->init(&session->gdata_key, &sess->gdata_ctx, iv, aad,
+			(uint64_t)session->aad_length);
+
+	for (i = 0; i < buf->num; i++) {
+		struct iovec *vec = &buf->vec[i];
+
+		sess->update(&session->gdata_key, &sess->gdata_ctx,
+				vec->iov_base, vec->iov_base, vec->iov_len);
+	}
+
+	switch (session->op) {
+	case AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION:
+		if (session->req_digest_length != session->gen_digest_length)
+			tag = sess->temp_digest;
+		else
+			tag = digest;
+
+		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
+				session->gen_digest_length);
+
+		if (session->req_digest_length != session->gen_digest_length)
+			memcpy(digest, sess->temp_digest,
+					session->req_digest_length);
+		break;
+
+	case AESNI_GCM_OP_AUTHENTICATED_DECRYPTION:
+		tag = sess->temp_digest;
+
+		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
+				session->gen_digest_length);
+
+		if (memcmp(tag, digest,	session->req_digest_length) != 0)
+			return -1;
+		break;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
 /**
  * Process a crypto operation, calling
  * the GCM API from the multi buffer library.
@@ -488,8 +539,10 @@ aesni_gcm_create(const char *name,
 {
 	struct rte_cryptodev *dev;
 	struct aesni_gcm_private *internals;
+	struct rte_security_ctx *sec_ctx;
 	enum aesni_gcm_vector_mode vector_mode;
 	MB_MGR *mb_mgr;
+	char sec_name[RTE_DEV_NAME_MAX_LEN];
 
 	/* Check CPU for support for AES instruction set */
 	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
@@ -524,7 +577,8 @@ aesni_gcm_create(const char *name,
 			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
 			RTE_CRYPTODEV_FF_CPU_AESNI |
 			RTE_CRYPTODEV_FF_OOP_SGL_IN_LB_OUT |
-			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
+			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
+			RTE_CRYPTODEV_FF_SECURITY;
 
 	mb_mgr = alloc_mb_mgr(0);
 	if (mb_mgr == NULL)
@@ -587,6 +641,21 @@ aesni_gcm_create(const char *name,
 
 	internals->max_nb_queue_pairs = init_params->max_nb_queue_pairs;
 
+	/* setup security operations */
+	snprintf(sec_name, sizeof(sec_name) - 1, "aes_gcm_sec_%u",
+			dev->driver_id);
+	sec_ctx = rte_zmalloc_socket(sec_name,
+			sizeof(struct rte_security_ctx),
+			RTE_CACHE_LINE_SIZE, init_params->socket_id);
+	if (sec_ctx == NULL) {
+		AESNI_GCM_LOG(ERR, "memory allocation failed\n");
+		goto error_exit;
+	}
+
+	sec_ctx->device = (void *)dev;
+	sec_ctx->ops = rte_aesni_gcm_pmd_security_ops;
+	dev->security_ctx = sec_ctx;
+
 #if IMB_VERSION_NUM >= IMB_VERSION(0, 50, 0)
 	AESNI_GCM_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
 			imb_get_version_str());
@@ -641,6 +710,8 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
 	if (cryptodev == NULL)
 		return -ENODEV;
 
+	rte_free(cryptodev->security_ctx);
+
 	internals = cryptodev->data->dev_private;
 
 	free_mb_mgr(internals->mb_mgr);
@@ -648,6 +719,30 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
 	return rte_cryptodev_pmd_destroy(cryptodev);
 }
 
+int
+aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	struct aesni_gcm_security_session *session =
+			get_sec_session_private_data(sess);
+	uint32_t i;
+	int errcnt = 0;
+
+	if (unlikely(!session))
+		return -num;
+
+	for (i = 0; i < num; i++) {
+		status[i] = process_gcm_security_sgl_buf(session, &buf[i],
+				(uint8_t *)iv[i], (uint8_t *)aad[i],
+				(uint8_t *)digest[i]);
+		if (unlikely(status[i]))
+			errcnt -= 1;
+	}
+
+	return errcnt;
+}
+
 static struct rte_vdev_driver aesni_gcm_pmd_drv = {
 	.probe = aesni_gcm_probe,
 	.remove = aesni_gcm_remove
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
index 2f66c7c58..cc71dbd60 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
@@ -7,6 +7,7 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 
 #include "aesni_gcm_pmd_private.h"
 
@@ -316,6 +317,85 @@ aesni_gcm_pmd_sym_session_clear(struct rte_cryptodev *dev,
 	}
 }
 
+static int
+aesni_gcm_security_session_create(void *dev,
+		struct rte_security_session_conf *conf,
+		struct rte_security_session *sess,
+		struct rte_mempool *mempool)
+{
+	struct rte_cryptodev *cdev = dev;
+	struct aesni_gcm_private *internals = cdev->data->dev_private;
+	struct aesni_gcm_security_session *sess_priv;
+	int ret;
+
+	if (!conf->crypto_xform) {
+		AESNI_GCM_LOG(ERR, "Invalid security session conf");
+		return -EINVAL;
+	}
+
+	if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+		AESNI_GCM_LOG(ERR, "GMAC is not supported in security session");
+		return -EINVAL;
+	}
+
+
+	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
+		AESNI_GCM_LOG(ERR,
+				"Couldn't get object from session mempool");
+		return -ENOMEM;
+	}
+
+	ret = aesni_gcm_set_session_parameters(internals->ops,
+				&sess_priv->sess, conf->crypto_xform);
+	if (ret != 0) {
+		AESNI_GCM_LOG(ERR, "Failed configure session parameters");
+
+		/* Return session to mempool */
+		rte_mempool_put(mempool, (void *)sess_priv);
+		return ret;
+	}
+
+	sess_priv->pre = internals->ops[sess_priv->sess.key].pre;
+	sess_priv->init = internals->ops[sess_priv->sess.key].init;
+	if (sess_priv->sess.op == AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION) {
+		sess_priv->update =
+			internals->ops[sess_priv->sess.key].update_enc;
+		sess_priv->finalize =
+			internals->ops[sess_priv->sess.key].finalize_enc;
+	} else {
+		sess_priv->update =
+			internals->ops[sess_priv->sess.key].update_dec;
+		sess_priv->finalize =
+			internals->ops[sess_priv->sess.key].finalize_dec;
+	}
+
+	sess->sess_private_data = sess_priv;
+
+	return 0;
+}
+
+static int
+aesni_gcm_security_session_destroy(void *dev __rte_unused,
+		struct rte_security_session *sess)
+{
+	void *sess_priv = get_sec_session_private_data(sess);
+
+	if (sess_priv) {
+		struct rte_mempool *sess_mp = rte_mempool_from_obj(sess_priv);
+
+		memset(sess_priv, 0, sizeof(struct aesni_gcm_security_session));
+		set_sec_session_private_data(sess, NULL);
+		rte_mempool_put(sess_mp, sess_priv);
+	}
+	return 0;
+}
+
+static unsigned int
+aesni_gcm_sec_session_get_size(__rte_unused void *device)
+{
+	return sizeof(struct aesni_gcm_security_session);
+}
+
 struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
 		.dev_configure		= aesni_gcm_pmd_config,
 		.dev_start		= aesni_gcm_pmd_start,
@@ -336,4 +416,19 @@ struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
 		.sym_session_clear	= aesni_gcm_pmd_sym_session_clear
 };
 
+static struct rte_security_ops aesni_gcm_security_ops = {
+		.session_create = aesni_gcm_security_session_create,
+		.session_get_size = aesni_gcm_sec_session_get_size,
+		.session_update = NULL,
+		.session_stats_get = NULL,
+		.session_destroy = aesni_gcm_security_session_destroy,
+		.set_pkt_metadata = NULL,
+		.capabilities_get = NULL,
+		.process_cpu_crypto_bulk =
+				aesni_gcm_sec_crypto_process_bulk,
+};
+
 struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops = &aesni_gcm_pmd_ops;
+
+struct rte_security_ops *rte_aesni_gcm_pmd_security_ops =
+		&aesni_gcm_security_ops;
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
index 56b29e013..ed3f6eb2e 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
@@ -114,5 +114,28 @@ aesni_gcm_set_session_parameters(const struct aesni_gcm_ops *ops,
  * Device specific operations function pointer structure */
 extern struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops;
 
+/**
+ * Security session structure.
+ */
+struct aesni_gcm_security_session {
+	/** Temp digest for decryption */
+	uint8_t temp_digest[DIGEST_LENGTH_MAX];
+	/** GCM operations */
+	aesni_gcm_pre_t pre;
+	aesni_gcm_init_t init;
+	aesni_gcm_update_t update;
+	aesni_gcm_finalize_t finalize;
+	/** AESNI-GCM session */
+	struct aesni_gcm_session sess;
+	/** AESNI-GCM context */
+	struct gcm_context_data gdata_ctx;
+};
+
+extern int
+aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
+extern struct rte_security_ops *rte_aesni_gcm_pmd_security_ops;
 
 #endif /* _RTE_AESNI_GCM_PMD_PRIVATE_H_ */
diff --git a/drivers/crypto/aesni_gcm/meson.build b/drivers/crypto/aesni_gcm/meson.build
index 3a6e332dc..f6e160bb3 100644
--- a/drivers/crypto/aesni_gcm/meson.build
+++ b/drivers/crypto/aesni_gcm/meson.build
@@ -22,4 +22,4 @@ endif
 
 allow_experimental_apis = true
 sources = files('aesni_gcm_pmd.c', 'aesni_gcm_pmd_ops.c')
-deps += ['bus_vdev']
+deps += ['bus_vdev', 'security']
-- 
2.14.5



* [dpdk-dev] [PATCH v2 03/10] app/test: add security cpu crypto autotest
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 01/10] security: introduce CPU Crypto action type and API Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 04/10] app/test: add security cpu crypto perftest Fan Zhang
                       ` (6 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch adds a CPU crypto unit test for the AESNI_GCM PMD.
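The `SGL_MAX_SEG` option in `assemble_aead_buf()` splits the test vector into at most `MAX_NB_SIGMENTS` segments of `len / MAX_NB_SIGMENTS + 1` bytes each. The sketch below isolates that rule in a hypothetical helper, `split_sgl()`, which points the iovec entries into the source buffer instead of copying each chunk into its own segment buffer as the test does. Because `(len / n + 1) * n` is always greater than `len`, at most `n` segments are needed, so a `vec` array of `n` entries never overflows.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>

/* Split src[0..len) into at most max_segs iovec entries of
 * len / max_segs + 1 bytes each; the last entry takes the remainder.
 * Entries point into src (no copy). Returns the number of entries. */
static uint32_t
split_sgl(uint8_t *src, uint32_t len, struct iovec *vec, uint32_t max_segs)
{
	uint32_t per_seg, left = len, n = 0;

	assert(max_segs > 0);
	per_seg = len / max_segs + 1;
	while (left) {
		uint32_t cp = left < per_seg ? left : per_seg;

		vec[n].iov_base = src + (len - left);
		vec[n].iov_len = cp;
		left -= cp;
		n++;
	}
	return n;
}
```

For a 10-byte buffer and 4 segments this produces chunks of 3, 3, 3 and 1 bytes; for a 3-byte buffer it uses only 3 one-byte segments.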

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/Makefile                   |   1 +
 app/test/meson.build                |   1 +
 app/test/test_security_cpu_crypto.c | 564 ++++++++++++++++++++++++++++++++++++
 3 files changed, 566 insertions(+)
 create mode 100644 app/test/test_security_cpu_crypto.c

diff --git a/app/test/Makefile b/app/test/Makefile
index df7f77f44..0caff561c 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -197,6 +197,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_blockcipher.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_asym.c
 SRCS-$(CONFIG_RTE_LIBRTE_SECURITY) += test_cryptodev_security_pdcp.c
+SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_security_cpu_crypto.c
 
 SRCS-$(CONFIG_RTE_LIBRTE_METRICS) += test_metrics.c
 
diff --git a/app/test/meson.build b/app/test/meson.build
index 2c23c6347..0d096c564 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -104,6 +104,7 @@ test_sources = files('commands.c',
 	'test_ring_perf.c',
 	'test_rwlock.c',
 	'test_sched.c',
+	'test_security_cpu_crypto.c',
 	'test_service_cores.c',
 	'test_spinlock.c',
 	'test_stack.c',
diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
new file mode 100644
index 000000000..d345922b2
--- /dev/null
+++ b/app/test/test_security_cpu_crypto.c
@@ -0,0 +1,564 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ */
+
+#include <rte_common.h>
+#include <rte_hexdump.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_pause.h>
+#include <rte_bus_vdev.h>
+#include <rte_random.h>
+
+#include <rte_security.h>
+
+#include <rte_crypto.h>
+#include <rte_cryptodev.h>
+#include <rte_cryptodev_pmd.h>
+
+#include "test.h"
+#include "test_cryptodev.h"
+#include "test_cryptodev_aead_test_vectors.h"
+
+#define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
+#define MAX_NB_SIGMENTS			4
+
+enum buffer_assemble_option {
+	SGL_MAX_SEG,
+	SGL_ONE_SEG,
+};
+
+struct cpu_crypto_test_case {
+	struct {
+		uint8_t seg[MBUF_DATAPAYLOAD_SIZE];
+		uint32_t seg_len;
+	} seg_buf[MAX_NB_SIGMENTS];
+	uint8_t iv[MAXIMUM_IV_LENGTH];
+	uint8_t aad[CPU_CRYPTO_TEST_MAX_AAD_LENGTH];
+	uint8_t digest[DIGEST_BYTE_LENGTH_SHA512];
+} __rte_cache_aligned;
+
+struct cpu_crypto_test_obj {
+	struct iovec vec[MAX_NUM_OPS_INFLIGHT][MAX_NB_SIGMENTS];
+	struct rte_security_vec sec_buf[MAX_NUM_OPS_INFLIGHT];
+	void *iv[MAX_NUM_OPS_INFLIGHT];
+	void *digest[MAX_NUM_OPS_INFLIGHT];
+	void *aad[MAX_NUM_OPS_INFLIGHT];
+	int status[MAX_NUM_OPS_INFLIGHT];
+};
+
+struct cpu_crypto_testsuite_params {
+	struct rte_mempool *buf_pool;
+	struct rte_mempool *session_priv_mpool;
+	struct rte_security_ctx *ctx;
+};
+
+struct cpu_crypto_unittest_params {
+	struct rte_security_session *sess;
+	void *test_datas[MAX_NUM_OPS_INFLIGHT];
+	struct cpu_crypto_test_obj test_obj;
+	uint32_t nb_bufs;
+};
+
+static struct cpu_crypto_testsuite_params testsuite_params = { NULL };
+static struct cpu_crypto_unittest_params unittest_params;
+
+static int gbl_driver_id;
+
+static int
+testsuite_setup(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct rte_cryptodev_info info;
+	uint32_t i;
+	uint32_t nb_devs;
+	uint32_t sess_sz;
+	int ret;
+
+	memset(ts_params, 0, sizeof(*ts_params));
+
+	ts_params->buf_pool = rte_mempool_lookup("CPU_CRYPTO_MBUFPOOL");
+	if (ts_params->buf_pool == NULL) {
+		/* Not already created so create */
+		ts_params->buf_pool = rte_pktmbuf_pool_create(
+				"CPU_CRYPTO_MBUFPOOL",
+				NUM_MBUFS, MBUF_CACHE_SIZE, 0,
+				sizeof(struct cpu_crypto_test_case),
+				rte_socket_id());
+		if (ts_params->buf_pool == NULL) {
+			RTE_LOG(ERR, USER1, "Can't create CPU_CRYPTO_MBUFPOOL\n");
+			return TEST_FAILED;
+		}
+	}
+
+	/* Create an AESNI MB device if required */
+	if (gbl_driver_id == rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD))) {
+		nb_devs = rte_cryptodev_device_count_by_driver(
+				rte_cryptodev_driver_id_get(
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD)));
+		if (nb_devs < 1) {
+			ret = rte_vdev_init(
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD), NULL);
+
+			TEST_ASSERT(ret == 0,
+				"Failed to create instance of"
+				" pmd : %s",
+				RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+		}
+	}
+
+	/* Create an AESNI GCM device if required */
+	if (gbl_driver_id == rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD))) {
+		nb_devs = rte_cryptodev_device_count_by_driver(
+				rte_cryptodev_driver_id_get(
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD)));
+		if (nb_devs < 1) {
+			TEST_ASSERT_SUCCESS(rte_vdev_init(
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD), NULL),
+				"Failed to create instance of"
+				" pmd : %s",
+				RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+		}
+	}
+
+	nb_devs = rte_cryptodev_count();
+	if (nb_devs < 1) {
+		RTE_LOG(ERR, USER1, "No crypto devices found?\n");
+		return TEST_FAILED;
+	}
+
+	/* Get security context */
+	for (i = 0; i < nb_devs; i++) {
+		rte_cryptodev_info_get(i, &info);
+		if (info.driver_id != gbl_driver_id)
+			continue;
+
+		ts_params->ctx = rte_cryptodev_get_sec_ctx(i);
+		if (!ts_params->ctx) {
+			RTE_LOG(ERR, USER1, "rte_security is not supported\n");
+			return TEST_FAILED;
+		}
+	}
+
+	sess_sz = rte_security_session_get_size(ts_params->ctx);
+	ts_params->session_priv_mpool = rte_mempool_create(
+			"cpu_crypto_test_sess_mp", 2, sess_sz, 0, 0,
+			NULL, NULL, NULL, NULL,
+			SOCKET_ID_ANY, 0);
+	if (!ts_params->session_priv_mpool) {
+		RTE_LOG(ERR, USER1, "Not enough memory\n");
+		return TEST_FAILED;
+	}
+
+	return TEST_SUCCESS;
+}
+
+static void
+testsuite_teardown(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+
+	if (ts_params->buf_pool)
+		rte_mempool_free(ts_params->buf_pool);
+
+	if (ts_params->session_priv_mpool)
+		rte_mempool_free(ts_params->session_priv_mpool);
+}
+
+static int
+ut_setup(void)
+{
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+
+	memset(ut_params, 0, sizeof(*ut_params));
+	return TEST_SUCCESS;
+}
+
+static void
+ut_teardown(void)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+
+	if (ut_params->sess)
+		rte_security_session_destroy(ts_params->ctx, ut_params->sess);
+
+	if (ut_params->nb_bufs) {
+		uint32_t i;
+
+		for (i = 0; i < ut_params->nb_bufs; i++)
+			memset(ut_params->test_datas[i], 0,
+				sizeof(struct cpu_crypto_test_case));
+
+		rte_mempool_put_bulk(ts_params->buf_pool, ut_params->test_datas,
+				ut_params->nb_bufs);
+	}
+}
+
+static int
+allocate_buf(uint32_t n)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	int ret;
+
+	ret = rte_mempool_get_bulk(ts_params->buf_pool, ut_params->test_datas,
+			n);
+
+	if (ret == 0)
+		ut_params->nb_bufs = n;
+
+	return ret;
+}
+
+static int
+check_status(struct cpu_crypto_test_obj *obj, uint32_t n)
+{
+	uint32_t i;
+
+	for (i = 0; i < n; i++)
+		if (obj->status[i] < 0)
+			return -1;
+
+	return 0;
+}
+
+static struct rte_security_session *
+create_aead_session(struct rte_security_ctx *ctx,
+		struct rte_mempool *sess_mp,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	struct rte_security_session_conf sess_conf = {0};
+	struct rte_crypto_sym_xform xform = {0};
+
+	if (is_unit_test)
+		debug_hexdump(stdout, "key:", test_data->key.data,
+				test_data->key.len);
+
+	/* Setup AEAD Parameters */
+	xform.type = RTE_CRYPTO_SYM_XFORM_AEAD;
+	xform.next = NULL;
+	xform.aead.algo = test_data->algo;
+	xform.aead.op = op;
+	xform.aead.key.data = test_data->key.data;
+	xform.aead.key.length = test_data->key.len;
+	xform.aead.iv.offset = 0;
+	xform.aead.iv.length = test_data->iv.len;
+	xform.aead.digest_length = test_data->auth_tag.len;
+	xform.aead.aad_length = test_data->aad.len;
+
+	sess_conf.action_type = RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
+	sess_conf.crypto_xform = &xform;
+
+	return rte_security_session_create(ctx, &sess_conf, sess_mp);
+}
+
+static inline int
+assemble_aead_buf(struct cpu_crypto_test_case *data,
+		struct cpu_crypto_test_obj *obj,
+		uint32_t obj_idx,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *test_data,
+		enum buffer_assemble_option sgl_option,
+		uint32_t is_unit_test)
+{
+	const uint8_t *src;
+	uint32_t src_len;
+	uint32_t seg_idx;
+	uint32_t bytes_per_seg;
+	uint32_t left;
+
+	if (op == RTE_CRYPTO_AEAD_OP_ENCRYPT) {
+		src = test_data->plaintext.data;
+		src_len = test_data->plaintext.len;
+		if (is_unit_test)
+			debug_hexdump(stdout, "plaintext:", src, src_len);
+	} else {
+		src = test_data->ciphertext.data;
+		src_len = test_data->ciphertext.len;
+		memcpy(data->digest, test_data->auth_tag.data,
+				test_data->auth_tag.len);
+		if (is_unit_test) {
+			debug_hexdump(stdout, "ciphertext:", src, src_len);
+			debug_hexdump(stdout, "digest:",
+					test_data->auth_tag.data,
+					test_data->auth_tag.len);
+		}
+	}
+
+	if (src_len > MBUF_DATAPAYLOAD_SIZE)
+		return -ENOMEM;
+
+	switch (sgl_option) {
+	case SGL_MAX_SEG:
+		seg_idx = 0;
+		bytes_per_seg = src_len / MAX_NB_SIGMENTS + 1;
+		left = src_len;
+
+		if (bytes_per_seg > (MBUF_DATAPAYLOAD_SIZE / MAX_NB_SIGMENTS))
+			return -ENOMEM;
+
+		while (left) {
+			uint32_t cp_len = RTE_MIN(left, bytes_per_seg);
+			memcpy(data->seg_buf[seg_idx].seg, src, cp_len);
+			data->seg_buf[seg_idx].seg_len = cp_len;
+			obj->vec[obj_idx][seg_idx].iov_base =
+					(void *)data->seg_buf[seg_idx].seg;
+			obj->vec[obj_idx][seg_idx].iov_len = cp_len;
+			src += cp_len;
+			left -= cp_len;
+			seg_idx++;
+		}
+
+		if (left)
+			return -ENOMEM;
+
+		obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+		obj->sec_buf[obj_idx].num = seg_idx;
+
+		break;
+	case SGL_ONE_SEG:
+		memcpy(data->seg_buf[0].seg, src, src_len);
+		data->seg_buf[0].seg_len = src_len;
+		obj->vec[obj_idx][0].iov_base =
+				(void *)data->seg_buf[0].seg;
+		obj->vec[obj_idx][0].iov_len = src_len;
+
+		obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+		obj->sec_buf[obj_idx].num = 1;
+		break;
+	default:
+		return -1;
+	}
+
+	if (test_data->algo == RTE_CRYPTO_AEAD_AES_CCM) {
+		memcpy(data->iv + 1, test_data->iv.data, test_data->iv.len);
+		memcpy(data->aad + 18, test_data->aad.data, test_data->aad.len);
+	} else {
+		memcpy(data->iv, test_data->iv.data, test_data->iv.len);
+		memcpy(data->aad, test_data->aad.data, test_data->aad.len);
+	}
+
+	if (is_unit_test) {
+		debug_hexdump(stdout, "iv:", test_data->iv.data,
+				test_data->iv.len);
+		debug_hexdump(stdout, "aad:", test_data->aad.data,
+				test_data->aad.len);
+	}
+
+	obj->iv[obj_idx] = (void *)data->iv;
+	obj->digest[obj_idx] = (void *)data->digest;
+	obj->aad[obj_idx] = (void *)data->aad;
+
+	return 0;
+}
+
+#define CPU_CRYPTO_ERR_EXP_CT	"expect ciphertext:"
+#define CPU_CRYPTO_ERR_GEN_CT	"gen ciphertext:"
+#define CPU_CRYPTO_ERR_EXP_PT	"expect plaintext:"
+#define CPU_CRYPTO_ERR_GEN_PT	"gen plaintext:"
+
+static int
+check_aead_result(struct cpu_crypto_test_case *tcase,
+		enum rte_crypto_aead_operation op,
+		const struct aead_test_data *tdata)
+{
+	const char *err_msg1, *err_msg2;
+	const uint8_t *src_pt_ct;
+	const uint8_t *tmp_src;
+	uint32_t src_len;
+	uint32_t left;
+	uint32_t i = 0;
+	int ret;
+
+	if (op == RTE_CRYPTO_AEAD_OP_ENCRYPT) {
+		err_msg1 = CPU_CRYPTO_ERR_EXP_CT;
+		err_msg2 = CPU_CRYPTO_ERR_GEN_CT;
+
+		src_pt_ct = tdata->ciphertext.data;
+		src_len = tdata->ciphertext.len;
+
+		ret = memcmp(tcase->digest, tdata->auth_tag.data,
+				tdata->auth_tag.len);
+		if (ret != 0) {
+			debug_hexdump(stdout, "expect digest:",
+					tdata->auth_tag.data,
+					tdata->auth_tag.len);
+			debug_hexdump(stdout, "gen digest:",
+					tcase->digest,
+					tdata->auth_tag.len);
+			return -1;
+		}
+	} else {
+		src_pt_ct = tdata->plaintext.data;
+		src_len = tdata->plaintext.len;
+		err_msg1 = CPU_CRYPTO_ERR_EXP_PT;
+		err_msg2 = CPU_CRYPTO_ERR_GEN_PT;
+	}
+
+	tmp_src = src_pt_ct;
+	left = src_len;
+
+	while (left && i < MAX_NB_SIGMENTS) {
+		ret = memcmp(tcase->seg_buf[i].seg, tmp_src,
+				tcase->seg_buf[i].seg_len);
+		if (ret != 0)
+			goto sgl_err_dump;
+		tmp_src += tcase->seg_buf[i].seg_len;
+		left -= tcase->seg_buf[i].seg_len;
+		i++;
+	}
+
+	if (left) {
+		ret = -ENOMEM;
+		goto sgl_err_dump;
+	}
+
+	return 0;
+
+sgl_err_dump:
+	left = src_len;
+	i = 0;
+
+	debug_hexdump(stdout, err_msg1,
+			src_pt_ct,
+			src_len);
+
+	while (left && i < MAX_NB_SIGMENTS) {
+		debug_hexdump(stdout, err_msg2,
+				tcase->seg_buf[i].seg,
+				tcase->seg_buf[i].seg_len);
+		left -= tcase->seg_buf[i].seg_len;
+		i++;
+	}
+	return ret;
+}
+
+static inline void
+run_test(struct rte_security_ctx *ctx, struct rte_security_session *sess,
+		struct cpu_crypto_test_obj *obj, uint32_t n)
+{
+	rte_security_process_cpu_crypto_bulk(ctx, sess, obj->sec_buf,
+			obj->iv, obj->aad, obj->digest, obj->status, n);
+}
+
+static int
+cpu_crypto_test_aead(const struct aead_test_data *tdata,
+		enum rte_crypto_aead_operation dir,
+		enum buffer_assemble_option sgl_option)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	int ret;
+
+	ut_params->sess = create_aead_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			dir,
+			tdata,
+			1);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(1);
+	if (ret)
+		return ret;
+
+	tcase = ut_params->test_datas[0];
+	ret = assemble_aead_buf(tcase, obj, 0, dir, tdata, sgl_option, 1);
+	if (ret < 0) {
+		printf("Test is not supported by the driver\n");
+		return ret;
+	}
+
+	run_test(ts_params->ctx, ut_params->sess, obj, 1);
+
+	ret = check_status(obj, 1);
+	if (ret < 0)
+		return ret;
+
+	ret = check_aead_result(tcase, dir, tdata);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+/* test-vector/sgl-option */
+#define all_gcm_unit_test_cases(type)		\
+	TEST_EXPAND(gcm_test_case_1, type)	\
+	TEST_EXPAND(gcm_test_case_2, type)	\
+	TEST_EXPAND(gcm_test_case_3, type)	\
+	TEST_EXPAND(gcm_test_case_4, type)	\
+	TEST_EXPAND(gcm_test_case_5, type)	\
+	TEST_EXPAND(gcm_test_case_6, type)	\
+	TEST_EXPAND(gcm_test_case_7, type)	\
+	TEST_EXPAND(gcm_test_case_8, type)	\
+	TEST_EXPAND(gcm_test_case_192_1, type)	\
+	TEST_EXPAND(gcm_test_case_192_2, type)	\
+	TEST_EXPAND(gcm_test_case_192_3, type)	\
+	TEST_EXPAND(gcm_test_case_192_4, type)	\
+	TEST_EXPAND(gcm_test_case_192_5, type)	\
+	TEST_EXPAND(gcm_test_case_192_6, type)	\
+	TEST_EXPAND(gcm_test_case_192_7, type)	\
+	TEST_EXPAND(gcm_test_case_256_1, type)	\
+	TEST_EXPAND(gcm_test_case_256_2, type)	\
+	TEST_EXPAND(gcm_test_case_256_3, type)	\
+	TEST_EXPAND(gcm_test_case_256_4, type)	\
+	TEST_EXPAND(gcm_test_case_256_5, type)	\
+	TEST_EXPAND(gcm_test_case_256_6, type)	\
+	TEST_EXPAND(gcm_test_case_256_7, type)
+
+
+#define TEST_EXPAND(t, o)						\
+static int								\
+cpu_crypto_aead_enc_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_aead(&t, RTE_CRYPTO_AEAD_OP_ENCRYPT, o);	\
+}									\
+static int								\
+cpu_crypto_aead_dec_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_aead(&t, RTE_CRYPTO_AEAD_OP_DECRYPT, o);	\
+}									\
+
+all_gcm_unit_test_cases(SGL_ONE_SEG)
+all_gcm_unit_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesgcm_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-GCM Unit Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_enc_test_##t##_##o),		\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_dec_test_##t##_##o),		\
+
+	all_gcm_unit_test_cases(SGL_ONE_SEG)
+	all_gcm_unit_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_gcm(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+
+	return unit_test_suite_runner(&security_cpu_crypto_aesgcm_testsuite);
+}
+
+REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
+		test_security_cpu_crypto_aesni_gcm);
-- 
2.14.5


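The SGL_MAX_SEG option in assemble_aead_buf() cuts the input into pieces of src_len / MAX_NB_SIGMENTS + 1 bytes, and the perf test later concatenates the segments back into one buffer. Here is a minimal standalone sketch of that split and the matching reassembly. It assumes a fixed, hypothetical per-segment capacity (SEG_CAP) in place of the mbuf-derived limit; the type and function names are illustrative and are not part of the patch. Because MAX_NB_SEGS * (len / MAX_NB_SEGS + 1) > len, four segments are always enough. The reassembly must advance the destination after every segment.

```c
#include <stdint.h>
#include <string.h>

#define MAX_NB_SEGS	4	/* plays the role of MAX_NB_SIGMENTS */
#define SEG_CAP		512	/* hypothetical per-segment capacity */

struct seg {
	uint8_t data[SEG_CAP];
	uint32_t len;
};

/* Split src into segments of len / MAX_NB_SEGS + 1 bytes (the last one
 * may be shorter). Since MAX_NB_SEGS * (len / MAX_NB_SEGS + 1) > len,
 * at most MAX_NB_SEGS segments are ever filled. Returns the number of
 * segments used, or -1 if a segment would exceed SEG_CAP.
 */
static int
split_into_segs(const uint8_t *src, uint32_t len,
		struct seg segs[MAX_NB_SEGS])
{
	uint32_t per_seg = len / MAX_NB_SEGS + 1;
	uint32_t left = len;
	int n = 0;

	if (per_seg > SEG_CAP)
		return -1;

	while (left > 0) {
		uint32_t cp = left < per_seg ? left : per_seg;

		memcpy(segs[n].data, src, cp);
		segs[n].len = cp;
		src += cp;
		left -= cp;
		n++;
	}
	return n;
}

/* Concatenate n segments into dst, advancing the write offset after
 * each one. Returns the total number of bytes written.
 */
static uint32_t
join_segs(const struct seg segs[], int n, uint8_t *dst)
{
	uint32_t total = 0;
	int i;

	for (i = 0; i < n; i++) {
		memcpy(dst + total, segs[i].data, segs[i].len);
		total += segs[i].len;
	}
	return total;
}
```

For a 64-byte input this yields segments of 17, 17, 17 and 13 bytes.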

* [dpdk-dev] [PATCH v2 04/10] app/test: add security cpu crypto perftest
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
                       ` (2 preceding siblings ...)
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 03/10] app/test: add security cpu crypto autotest Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
                       ` (5 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

Since the crypto perf application does not support rte_security, this
patch adds a simple GCM CPU crypto performance test to the crypto unit
test application. The test covers different key and data sizes, with
both single-buffer and SGL buffer layouts, and displays throughput as
well as cycle-count performance information.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 201 ++++++++++++++++++++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index d345922b2..ca9a8dae6 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -23,6 +23,7 @@
 
 #define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
 #define MAX_NB_SIGMENTS			4
+#define CACHE_WARM_ITER			2048
 
 enum buffer_assemble_option {
 	SGL_MAX_SEG,
@@ -560,5 +561,205 @@ test_security_cpu_crypto_aesni_gcm(void)
 	return unit_test_suite_runner(&security_cpu_crypto_aesgcm_testsuite);
 }
 
+
+static inline void
+gen_rand(uint8_t *data, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; i++)
+		data[i] = (uint8_t)rte_rand();
+}
+
+static inline void
+switch_aead_enc_to_dec(struct aead_test_data *tdata,
+		struct cpu_crypto_test_case *tcase,
+		enum buffer_assemble_option sgl_option)
+{
+	uint32_t i;
+	uint8_t *dst = tdata->ciphertext.data;
+
+	switch (sgl_option) {
+	case SGL_ONE_SEG:
+		memcpy(dst, tcase->seg_buf[0].seg, tcase->seg_buf[0].seg_len);
+		tdata->ciphertext.len = tcase->seg_buf[0].seg_len;
+		break;
+	case SGL_MAX_SEG:
+		tdata->ciphertext.len = 0;
+		for (i = 0; i < MAX_NB_SIGMENTS; i++) {
+			memcpy(dst, tcase->seg_buf[i].seg,
+					tcase->seg_buf[i].seg_len);
+			dst += tcase->seg_buf[i].seg_len;
+			tdata->ciphertext.len += tcase->seg_buf[i].seg_len;
+		}
+		break;
+	}
+
+	memcpy(tdata->auth_tag.data, tcase->digest, tdata->auth_tag.len);
+}
+
+static int
+cpu_crypto_test_aead_perf(enum buffer_assemble_option sgl_option,
+		uint32_t key_sz)
+{
+	struct aead_test_data tdata = {0};
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	uint64_t hz = rte_get_tsc_hz(), time_start, time_now;
+	double rate, cycles_per_buf;
+	uint32_t test_data_szs[] = {64, 128, 256, 512, 1024, 2048};
+	uint32_t i, j;
+	uint8_t aad[16];
+	int ret;
+
+	tdata.key.len = key_sz;
+	gen_rand(tdata.key.data, tdata.key.len);
+	tdata.algo = RTE_CRYPTO_AEAD_AES_GCM;
+	tdata.aad.data = aad;
+
+	ut_params->sess = create_aead_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			RTE_CRYPTO_AEAD_OP_DECRYPT,
+			&tdata,
+			0);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(MAX_NUM_OPS_INFLIGHT);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < RTE_DIM(test_data_szs); i++) {
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tdata.plaintext.len = test_data_szs[i];
+			gen_rand(tdata.plaintext.data,
+					tdata.plaintext.len);
+
+			tdata.aad.len = 12;
+			gen_rand(tdata.aad.data, tdata.aad.len);
+
+			tdata.auth_tag.len = 16;
+
+			tdata.iv.len = 16;
+			gen_rand(tdata.iv.data, tdata.iv.len);
+
+			tcase = ut_params->test_datas[j];
+			ret = assemble_aead_buf(tcase, obj, j,
+					RTE_CRYPTO_AEAD_OP_ENCRYPT,
+					&tdata, sgl_option, 0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		/* warm up cache */
+		for (j = 0; j < CACHE_WARM_ITER; j++)
+			run_test(ts_params->ctx, ut_params->sess, obj,
+					MAX_NUM_OPS_INFLIGHT);
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("AES-GCM-%u(%4uB) Enc %03.3fMpps (%03.3fGbps) ",
+				key_sz * 8, test_data_szs[i], rate,
+				rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tcase = ut_params->test_datas[j];
+
+			switch_aead_enc_to_dec(&tdata, tcase, sgl_option);
+			ret = assemble_aead_buf(tcase, obj, j,
+					RTE_CRYPTO_AEAD_OP_DECRYPT,
+					&tdata, sgl_option, 0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("AES-GCM-%u(%4uB) Dec %03.3fMpps (%03.3fGbps) ",
+				key_sz * 8, test_data_szs[i], rate,
+				rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+	}
+
+	return 0;
+}
+
+/* test-prefix/key-size/sgl-type */
+#define all_gcm_perf_test_cases(type)					\
+	TEST_EXPAND(_128, 16, type)					\
+	TEST_EXPAND(_192, 24, type)					\
+	TEST_EXPAND(_256, 32, type)
+
+#define TEST_EXPAND(a, b, c)						\
+static int								\
+cpu_crypto_gcm_perf##a##_##c(void)					\
+{									\
+	return cpu_crypto_test_aead_perf(c, b);				\
+}									\
+
+all_gcm_perf_test_cases(SGL_ONE_SEG)
+all_gcm_perf_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesgcm_perf_testsuite  = {
+		.suite_name = "Security CPU Crypto AESNI-GCM Perf Test Suite",
+		.setup = testsuite_setup,
+		.teardown = testsuite_teardown,
+		.unit_test_cases = {
+#define TEST_EXPAND(a, b, c)						\
+		TEST_CASE_ST(ut_setup, ut_teardown,			\
+				cpu_crypto_gcm_perf##a##_##c),		\
+
+		all_gcm_perf_test_cases(SGL_ONE_SEG)
+		all_gcm_perf_test_cases(SGL_MAX_SEG)
+#undef TEST_EXPAND
+
+		TEST_CASES_END() /**< NULL terminate unit test array */
+		},
+};
+
+static int
+test_security_cpu_crypto_aesni_gcm_perf(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_GCM_PMD));
+
+	return unit_test_suite_runner(
+			&security_cpu_crypto_aesgcm_perf_testsuite);
+}
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
+
+REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
+		test_security_cpu_crypto_aesni_gcm_perf);
-- 
2.14.5


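The perf test turns a single TSC delta into cycles per buffer, Mpps and Gbit/s. The sketch below reproduces that arithmetic on its own so the printed figures can be sanity-checked; burst_perf_calc() and struct burst_perf are hypothetical names. The frequency passed in must belong to the counter that produced the delta. In DPDK, rte_rdtsc() pairs with rte_get_tsc_hz(), and rte_get_timer_cycles() pairs with rte_get_timer_hz().

```c
#include <stdint.h>

/* Figures derived from one timed burst, as printed by the perf test. */
struct burst_perf {
	double cycles_per_buf;
	double mpps;	/* millions of buffers per second */
	double gbps;	/* gigabits per second */
};

/* Convert a counter delta measured over nb_bufs buffers of buf_sz bytes
 * into per-buffer cycles, Mpps and Gbit/s. tsc_hz must be the frequency
 * of the same counter that produced tsc_delta.
 */
static struct burst_perf
burst_perf_calc(uint64_t tsc_hz, uint64_t tsc_delta, uint32_t nb_bufs,
		uint32_t buf_sz)
{
	struct burst_perf p;

	p.cycles_per_buf = (double)tsc_delta / nb_bufs;
	p.mpps = ((double)tsc_hz / p.cycles_per_buf) / 1e6;
	p.gbps = p.mpps * buf_sz * 8 / 1000;
	return p;
}
```

For example, 32000 cycles for 32 buffers of 1024 bytes on a 2 GHz TSC gives 1000 cycles per buffer, 2 Mpps and about 16.4 Gbit/s.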

* [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
                       ` (3 preceding siblings ...)
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 04/10] app/test: add security cpu crypto perftest Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-08 16:23       ` Ananyev, Konstantin
  2019-10-09  8:29       ` Ananyev, Konstantin
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 06/10] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
                       ` (4 subsequent siblings)
  9 siblings, 2 replies; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch adds rte_security support to the AESNI-MB PMD. The PMD now
initializes a security context instance, creates/deletes PMD-specific
security sessions, and processes crypto workloads in synchronous mode.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 drivers/crypto/aesni_mb/meson.build                |   2 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         | 368 +++++++++++++++++++--
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |  92 +++++-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |  21 +-
 4 files changed, 453 insertions(+), 30 deletions(-)

diff --git a/drivers/crypto/aesni_mb/meson.build b/drivers/crypto/aesni_mb/meson.build
index 3e1687416..e7b585168 100644
--- a/drivers/crypto/aesni_mb/meson.build
+++ b/drivers/crypto/aesni_mb/meson.build
@@ -23,4 +23,4 @@ endif
 
 sources = files('rte_aesni_mb_pmd.c', 'rte_aesni_mb_pmd_ops.c')
 allow_experimental_apis = true
-deps += ['bus_vdev']
+deps += ['bus_vdev', 'security']
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
index ce1144b95..a4cd518b7 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
@@ -8,6 +8,8 @@
 #include <rte_hexdump.h>
 #include <rte_cryptodev.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security.h>
+#include <rte_security_driver.h>
 #include <rte_bus_vdev.h>
 #include <rte_malloc.h>
 #include <rte_cpuflags.h>
@@ -19,6 +21,9 @@
 #define HMAC_MAX_BLOCK_SIZE 128
 static uint8_t cryptodev_driver_id;
 
+/** CPU vector instruction set mode */
+static enum aesni_mb_vector_mode vector_mode;
+
 typedef void (*hash_one_block_t)(const void *data, void *digest);
 typedef void (*aes_keyexp_t)(const void *key, void *enc_exp_keys, void *dec_exp_keys);
 
@@ -808,6 +813,164 @@ auth_start_offset(struct rte_crypto_op *op, struct aesni_mb_session *session,
 			(UINT64_MAX - u_src + u_dst + 1);
 }
 
+union sec_userdata_field {
+	int status;
+	struct {
+		uint16_t is_gen_digest;
+		uint16_t digest_len;
+	};
+};
+
+struct sec_udata_digest_field {
+	uint32_t is_digest_gen;
+	uint32_t digest_len;
+};
+
+static inline int
+set_mb_job_params_sec(JOB_AES_HMAC *job, struct aesni_mb_sec_session *sec_sess,
+		void *buf, uint32_t buf_len, void *iv, void *aad, void *digest,
+		int *status, uint8_t *digest_idx)
+{
+	struct aesni_mb_session *session = &sec_sess->sess;
+	uint32_t cipher_offset = sec_sess->cipher_offset;
+	union sec_userdata_field udata;
+
+	if (unlikely(cipher_offset > buf_len))
+		return -EINVAL;
+
+	/* Set crypto operation */
+	job->chain_order = session->chain_order;
+
+	/* Set cipher parameters */
+	job->cipher_direction = session->cipher.direction;
+	job->cipher_mode = session->cipher.mode;
+
+	job->aes_key_len_in_bytes = session->cipher.key_length_in_bytes;
+
+	/* Set authentication parameters */
+	job->hash_alg = session->auth.algo;
+	job->iv = iv;
+
+	switch (job->hash_alg) {
+	case AES_XCBC:
+		job->u.XCBC._k1_expanded = session->auth.xcbc.k1_expanded;
+		job->u.XCBC._k2 = session->auth.xcbc.k2;
+		job->u.XCBC._k3 = session->auth.xcbc.k3;
+
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		break;
+
+	case AES_CCM:
+		job->u.CCM.aad = (uint8_t *)aad + 18;
+		job->u.CCM.aad_len_in_bytes = session->aead.aad_len;
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		job->iv++;
+		break;
+
+	case AES_CMAC:
+		job->u.CMAC._key_expanded = session->auth.cmac.expkey;
+		job->u.CMAC._skey1 = session->auth.cmac.skey1;
+		job->u.CMAC._skey2 = session->auth.cmac.skey2;
+		job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+		job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		break;
+
+	case AES_GMAC:
+		if (session->cipher.mode == GCM) {
+			job->u.GCM.aad = aad;
+			job->u.GCM.aad_len_in_bytes = session->aead.aad_len;
+		} else {
+			/* For GMAC */
+			job->u.GCM.aad = aad;
+			job->u.GCM.aad_len_in_bytes = buf_len;
+			job->cipher_mode = GCM;
+		}
+		job->aes_enc_key_expanded = &session->cipher.gcm_key;
+		job->aes_dec_key_expanded = &session->cipher.gcm_key;
+		break;
+
+	default:
+		job->u.HMAC._hashed_auth_key_xor_ipad =
+				session->auth.pads.inner;
+		job->u.HMAC._hashed_auth_key_xor_opad =
+				session->auth.pads.outer;
+
+		if (job->cipher_mode == DES3) {
+			job->aes_enc_key_expanded =
+				session->cipher.exp_3des_keys.ks_ptr;
+			job->aes_dec_key_expanded =
+				session->cipher.exp_3des_keys.ks_ptr;
+		} else {
+			job->aes_enc_key_expanded =
+				session->cipher.expanded_aes_keys.encode;
+			job->aes_dec_key_expanded =
+				session->cipher.expanded_aes_keys.decode;
+		}
+	}
+
+	/* Set digest output location */
+	if (job->hash_alg != NULL_HASH &&
+			session->auth.operation == RTE_CRYPTO_AUTH_OP_VERIFY) {
+		job->auth_tag_output = sec_sess->temp_digests[*digest_idx];
+		*digest_idx = (*digest_idx + 1) % MAX_JOBS;
+
+		udata.is_gen_digest = 0;
+		udata.digest_len = session->auth.req_digest_len;
+	} else {
+		udata.is_gen_digest = 1;
+		udata.digest_len = session->auth.req_digest_len;
+
+		if (session->auth.req_digest_len !=
+				session->auth.gen_digest_len) {
+			job->auth_tag_output =
+					sec_sess->temp_digests[*digest_idx];
+			*digest_idx = (*digest_idx + 1) % MAX_JOBS;
+		} else
+			job->auth_tag_output = digest;
+	}
+
+	/* A bit of a hack: the job structure only has 2 user data
+	 * fields, but 4 params must reach the completion handler
+	 * (status, direction, user digest for verification, and
+	 * digest length). The status word therefore temporarily holds
+	 * the digest length and direction, which avoids allocating a
+	 * longer buffer for all 4 params.
+	 */
+	*status = udata.status;
+
+	/*
+	 * The multi-buffer library currently only supports returning a
+	 * truncated digest length, as specified in the relevant IPsec RFCs.
+	 */
+
+	/* Set digest length */
+	job->auth_tag_output_len_in_bytes = session->auth.gen_digest_len;
+
+	/* Set IV parameters */
+	job->iv_len_in_bytes = session->iv.length;
+
+	/* Data Parameters */
+	job->src = buf;
+	job->dst = (uint8_t *)buf + cipher_offset;
+	job->cipher_start_src_offset_in_bytes = cipher_offset;
+	job->msg_len_to_cipher_in_bytes = buf_len - cipher_offset;
+	job->hash_start_src_offset_in_bytes = 0;
+	job->msg_len_to_hash_in_bytes = buf_len;
+
+	job->user_data = (void *)status;
+	job->user_data2 = digest;
+
+	return 0;
+}
+
 /**
  * Process a crypto operation and complete a JOB_AES_HMAC job structure for
  * submission to the multi buffer library for processing.
@@ -1100,6 +1263,35 @@ post_process_mb_job(struct aesni_mb_qp *qp, JOB_AES_HMAC *job)
 	return op;
 }
 
+static inline void
+post_process_mb_sec_job(JOB_AES_HMAC *job)
+{
+	void *user_digest = job->user_data2;
+	int *status = job->user_data;
+
+	switch (job->status) {
+	case STS_COMPLETED:
+		if (user_digest) {
+			union sec_userdata_field udata;
+
+			udata.status = *status;
+			if (udata.is_gen_digest) {
+				*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+				memcpy(user_digest, job->auth_tag_output,
+						udata.digest_len);
+			} else {
+				*status = (memcmp(job->auth_tag_output,
+					user_digest, udata.digest_len) != 0) ?
+						-1 : 0;
+			}
+		} else
+			*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
+		break;
+	default:
+		*status = RTE_CRYPTO_OP_STATUS_ERROR;
+	}
+}
+
 /**
  * Process a completed JOB_AES_HMAC job and keep processing jobs until
  * get_completed_job return NULL
@@ -1136,6 +1328,32 @@ handle_completed_jobs(struct aesni_mb_qp *qp, JOB_AES_HMAC *job,
 	return processed_jobs;
 }
 
+static inline uint32_t
+handle_completed_sec_jobs(JOB_AES_HMAC *job, MB_MGR *mb_mgr)
+{
+	uint32_t processed = 0;
+
+	while (job != NULL) {
+		post_process_mb_sec_job(job);
+		job = IMB_GET_COMPLETED_JOB(mb_mgr);
+		processed++;
+	}
+
+	return processed;
+}
+
+static inline uint32_t
+flush_mb_sec_mgr(MB_MGR *mb_mgr)
+{
+	JOB_AES_HMAC *job = IMB_FLUSH_JOB(mb_mgr);
+	uint32_t processed = 0;
+
+	if (job)
+		processed = handle_completed_sec_jobs(job, mb_mgr);
+
+	return processed;
+}
+
 static inline uint16_t
 flush_mb_mgr(struct aesni_mb_qp *qp, struct rte_crypto_op **ops,
 		uint16_t nb_ops)
@@ -1239,6 +1457,105 @@ aesni_mb_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
 	return processed_jobs;
 }
 
+static MB_MGR *
+alloc_init_mb_mgr(void)
+{
+	MB_MGR *mb_mgr = alloc_mb_mgr(0);
+	if (mb_mgr == NULL)
+		return NULL;
+
+	switch (vector_mode) {
+	case RTE_AESNI_MB_SSE:
+		init_mb_mgr_sse(mb_mgr);
+		break;
+	case RTE_AESNI_MB_AVX:
+		init_mb_mgr_avx(mb_mgr);
+		break;
+	case RTE_AESNI_MB_AVX2:
+		init_mb_mgr_avx2(mb_mgr);
+		break;
+	case RTE_AESNI_MB_AVX512:
+		init_mb_mgr_avx512(mb_mgr);
+		break;
+	default:
+		AESNI_MB_LOG(ERR, "Unsupported vector mode %u\n", vector_mode);
+		free_mb_mgr(mb_mgr);
+		return NULL;
+	}
+
+	return mb_mgr;
+}
+
+static MB_MGR *sec_mb_mgrs[RTE_MAX_LCORE];
+
+int
+aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num)
+{
+	struct aesni_mb_sec_session *sec_sess = sess->sess_private_data;
+	JOB_AES_HMAC *job;
+	MB_MGR *mb_mgr;
+	uint32_t lcore_id = rte_lcore_id();
+	uint8_t digest_idx = sec_sess->digest_idx;
+	uint32_t i, processed = 0;
+	int ret = 0, errcnt = 0;
+
+	if (unlikely(sec_mb_mgrs[lcore_id] == NULL)) {
+		sec_mb_mgrs[lcore_id] = alloc_init_mb_mgr();
+
+		if (sec_mb_mgrs[lcore_id] == NULL) {
+			for (i = 0; i < num; i++)
+				status[i] = -ENOMEM;
+
+			return -(int)num;
+		}
+	}
+
+	mb_mgr = sec_mb_mgrs[lcore_id];
+
+	for (i = 0; i < num; i++) {
+		void *seg_buf = buf[i].vec[0].iov_base;
+		uint32_t buf_len = buf[i].vec[0].iov_len;
+
+		job = IMB_GET_NEXT_JOB(mb_mgr);
+		if (unlikely(job == NULL)) {
+			processed += flush_mb_sec_mgr(mb_mgr);
+
+			job = IMB_GET_NEXT_JOB(mb_mgr);
+			if (!job) {
+				errcnt -= 1;
+				status[i] = -ENOMEM;
+				/* count the failed request as processed */
+				processed++;
+				continue;
+			}
+		}
+
+		ret = set_mb_job_params_sec(job, sec_sess, seg_buf, buf_len,
+				iv[i], aad[i], digest[i], &status[i],
+				&digest_idx);
+		if (ret) {
+			processed++;
+			status[i] = ret;
+			errcnt -= 1;
+			continue;
+		}
+
+		/* Submit job to multi-buffer for processing */
+#ifdef RTE_LIBRTE_PMD_AESNI_MB_DEBUG
+		job = IMB_SUBMIT_JOB(mb_mgr);
+#else
+		job = IMB_SUBMIT_JOB_NOCHECK(mb_mgr);
+#endif
+
+		if (job)
+			processed += handle_completed_sec_jobs(job, mb_mgr);
+	}
+
+	while (processed < num)
+		processed += flush_mb_sec_mgr(mb_mgr);
+
+	return errcnt;
+}
+
 static int cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev);
 
 static int
@@ -1248,8 +1565,9 @@ cryptodev_aesni_mb_create(const char *name,
 {
 	struct rte_cryptodev *dev;
 	struct aesni_mb_private *internals;
-	enum aesni_mb_vector_mode vector_mode;
+	struct rte_security_ctx *sec_ctx = NULL;
 	MB_MGR *mb_mgr;
+	char sec_name[RTE_DEV_NAME_MAX_LEN];
 
 	/* Check CPU for support for AES instruction set */
 	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
@@ -1283,35 +1601,14 @@ cryptodev_aesni_mb_create(const char *name,
 	dev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO |
 			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
 			RTE_CRYPTODEV_FF_CPU_AESNI |
-			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
+			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
+			RTE_CRYPTODEV_FF_SECURITY;
 
 
-	mb_mgr = alloc_mb_mgr(0);
+	mb_mgr = alloc_init_mb_mgr();
 	if (mb_mgr == NULL)
 		return -ENOMEM;
+
+	switch (vector_mode) {
+	case RTE_AESNI_MB_SSE:
+		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_SSE;
+		break;
+	case RTE_AESNI_MB_AVX:
+		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX;
+		break;
+	case RTE_AESNI_MB_AVX2:
+		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX2;
+		break;
+	case RTE_AESNI_MB_AVX512:
+		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX512;
+		break;
+	default:
+		break;
+	}
 
-	switch (vector_mode) {
-	case RTE_AESNI_MB_SSE:
-		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_SSE;
-		init_mb_mgr_sse(mb_mgr);
-		break;
-	case RTE_AESNI_MB_AVX:
-		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX;
-		init_mb_mgr_avx(mb_mgr);
-		break;
-	case RTE_AESNI_MB_AVX2:
-		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX2;
-		init_mb_mgr_avx2(mb_mgr);
-		break;
-	case RTE_AESNI_MB_AVX512:
-		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX512;
-		init_mb_mgr_avx512(mb_mgr);
-		break;
-	default:
-		AESNI_MB_LOG(ERR, "Unsupported vector mode %u\n", vector_mode);
-		goto error_exit;
-	}
-
 	/* Set vector instructions mode supported */
 	internals = dev->data->dev_private;
 
@@ -1322,11 +1619,28 @@ cryptodev_aesni_mb_create(const char *name,
 	AESNI_MB_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
 			imb_get_version_str());
 
+	/* setup security operations */
+	snprintf(sec_name, sizeof(sec_name) - 1, "aes_mb_sec_%u",
+			dev->driver_id);
+	sec_ctx = rte_zmalloc_socket(sec_name,
+			sizeof(struct rte_security_ctx),
+			RTE_CACHE_LINE_SIZE, init_params->socket_id);
+	if (sec_ctx == NULL) {
+		AESNI_MB_LOG(ERR, "memory allocation failed\n");
+		goto error_exit;
+	}
+
+	sec_ctx->device = (void *)dev;
+	sec_ctx->ops = rte_aesni_mb_pmd_security_ops;
+	dev->security_ctx = sec_ctx;
+
 	return 0;
 
 error_exit:
 	if (mb_mgr)
 		free_mb_mgr(mb_mgr);
+	if (sec_ctx)
+		rte_free(sec_ctx);
 
 	rte_cryptodev_pmd_destroy(dev);
 
@@ -1367,6 +1681,7 @@ cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev)
 	struct rte_cryptodev *cryptodev;
 	struct aesni_mb_private *internals;
 	const char *name;
+	uint32_t i;
 
 	name = rte_vdev_device_name(vdev);
 	if (name == NULL)
@@ -1379,6 +1694,9 @@ cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev)
 	internals = cryptodev->data->dev_private;
 
 	free_mb_mgr(internals->mb_mgr);
+	for (i = 0; i < RTE_MAX_LCORE; i++) {
+		if (sec_mb_mgrs[i]) {
+			free_mb_mgr(sec_mb_mgrs[i]);
+			sec_mb_mgrs[i] = NULL;
+		}
+	}
+	rte_free(cryptodev->security_ctx);
 
 	return rte_cryptodev_pmd_destroy(cryptodev);
 }
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
index 8d15b99d4..f47df2d57 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
@@ -8,6 +8,7 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_security_driver.h>
 
 #include "rte_aesni_mb_pmd_private.h"
 
@@ -732,7 +733,8 @@ aesni_mb_pmd_qp_count(struct rte_cryptodev *dev)
 static unsigned
 aesni_mb_pmd_sym_session_get_size(struct rte_cryptodev *dev __rte_unused)
 {
-	return sizeof(struct aesni_mb_session);
+	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_session),
+			RTE_CACHE_LINE_SIZE);
 }
 
 /** Configure a aesni multi-buffer session from a crypto xform chain */
@@ -810,4 +812,92 @@ struct rte_cryptodev_ops aesni_mb_pmd_ops = {
 		.sym_session_clear	= aesni_mb_pmd_sym_session_clear
 };
 
+/** Create an AESNI-MB security session */
+
+static int
+aesni_mb_security_session_create(void *dev,
+		struct rte_security_session_conf *conf,
+		struct rte_security_session *sess,
+		struct rte_mempool *mempool)
+{
+	struct rte_cryptodev *cdev = dev;
+	struct aesni_mb_private *internals = cdev->data->dev_private;
+	struct aesni_mb_sec_session *sess_priv;
+	int ret;
+
+	if (!conf->crypto_xform) {
+		AESNI_MB_LOG(ERR, "Invalid security session conf");
+		return -EINVAL;
+	}
+
+	if (conf->cpucrypto.cipher_offset < 0) {
+		AESNI_MB_LOG(ERR, "Invalid security session conf");
+		return -EINVAL;
+	}
+
+	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
+		AESNI_MB_LOG(ERR,
+				"Couldn't get object from session mempool");
+		return -ENOMEM;
+	}
+
+	sess_priv->cipher_offset = conf->cpucrypto.cipher_offset;
+	sess_priv->digest_idx = 0;
+
+	ret = aesni_mb_set_session_parameters(internals->mb_mgr,
+			&sess_priv->sess, conf->crypto_xform);
+	if (ret != 0) {
+		AESNI_MB_LOG(ERR, "failed to configure session parameters");
+		rte_mempool_put(mempool, sess_priv);
+		return ret;
+	}
+
+	sess->sess_private_data = (void *)sess_priv;
+
+	return 0;
+}
+
+static int
+aesni_mb_security_session_destroy(void *dev __rte_unused,
+		struct rte_security_session *sess)
+{
+	struct aesni_mb_sec_session *sess_priv =
+			get_sec_session_private_data(sess);
+
+	if (sess_priv) {
+		struct rte_mempool *sess_mp = rte_mempool_from_obj(
+				(void *)sess_priv);
+
+		memset(sess_priv, 0, sizeof(struct aesni_mb_sec_session));
+		set_sec_session_private_data(sess, NULL);
+
+		if (sess_mp == NULL) {
+			AESNI_MB_LOG(ERR, "failed fetch session mempool");
+			return -EINVAL;
+		}
+
+		rte_mempool_put(sess_mp, sess_priv);
+	}
+
+	return 0;
+}
+
+static unsigned int
+aesni_mb_sec_session_get_size(__rte_unused void *device)
+{
+	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_sec_session),
+			RTE_CACHE_LINE_SIZE);
+}
+
+static struct rte_security_ops aesni_mb_security_ops = {
+		.session_create = aesni_mb_security_session_create,
+		.session_get_size = aesni_mb_sec_session_get_size,
+		.session_update = NULL,
+		.session_stats_get = NULL,
+		.session_destroy = aesni_mb_security_session_destroy,
+		.set_pkt_metadata = NULL,
+		.capabilities_get = NULL,
+		.process_cpu_crypto_bulk = aesni_mb_sec_crypto_process_bulk,
+};
+
 struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops = &aesni_mb_pmd_ops;
+struct rte_security_ops *rte_aesni_mb_pmd_security_ops = &aesni_mb_security_ops;
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
index b794d4bc1..64b58ca8e 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
@@ -176,7 +176,6 @@ struct aesni_mb_qp {
 	 */
 } __rte_cache_aligned;
 
-/** AES-NI multi-buffer private session structure */
 struct aesni_mb_session {
 	JOB_CHAIN_ORDER chain_order;
 	struct {
@@ -265,16 +264,32 @@ struct aesni_mb_session {
 		/** AAD data length */
 		uint16_t aad_len;
 	} aead;
-} __rte_cache_aligned;
+};
+
+/** AES-NI multi-buffer private security session structure */
+struct aesni_mb_sec_session {
+	/** AESNI-MB crypto session parameters */
+	struct aesni_mb_session sess;
+	uint8_t temp_digests[MAX_JOBS][DIGEST_LENGTH_MAX];
+	uint16_t digest_idx;
+	uint32_t cipher_offset;
+	MB_MGR *mb_mgr;
+};
 
 extern int
 aesni_mb_set_session_parameters(const MB_MGR *mb_mgr,
 		struct aesni_mb_session *sess,
 		const struct rte_crypto_sym_xform *xform);
 
+extern int
+aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
+		struct rte_security_vec buf[], void *iv[], void *aad[],
+		void *digest[], int status[], uint32_t num);
+
 /** device specific operations function pointer structure */
 extern struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops;
 
-
+/** device specific operations function pointer structure for rte_security */
+extern struct rte_security_ops *rte_aesni_mb_pmd_security_ops;
 
 #endif /* _RTE_AESNI_MB_PMD_PRIVATE_H_ */
-- 
2.14.5


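Patch 05 above fills `aesni_mb_security_ops` with only some callbacks (`session_create`, `session_get_size`, `session_destroy`, `process_cpu_crypto_bulk`) and leaves the others `NULL`. Any layer that dispatches through such a table has to treat a `NULL` entry as "not supported". The following is a minimal, self-contained sketch of that pattern. `struct dev_ops`, `dev_session_update()`, `dev_process_bulk()` and the demo driver are hypothetical names; this is not the rte_security implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device ops table: optional callbacks may be left NULL. */
struct dev_ops {
	unsigned int (*session_get_size)(void *dev);
	int (*session_update)(void *dev, void *sess);
	int (*process_bulk)(void *sess, int status[], uint32_t num);
};

struct dev {
	const struct dev_ops *ops;
	void *priv;
};

/* Dispatch helpers: a NULL callback is reported as -ENOTSUP. */
static int
dev_session_update(struct dev *d, void *sess)
{
	assert(d != NULL && d->ops != NULL);
	if (d->ops->session_update == NULL)
		return -ENOTSUP;
	return d->ops->session_update(d->priv, sess);
}

static int
dev_process_bulk(struct dev *d, void *sess, int status[], uint32_t num)
{
	assert(d != NULL && d->ops != NULL);
	if (d->ops->process_bulk == NULL)
		return -ENOTSUP;
	return d->ops->process_bulk(sess, status, num);
}

/* Example driver that implements only some of the ops. */
static unsigned int
demo_get_size(void *dev)
{
	(void)dev;
	return 64;
}

static int
demo_process_bulk(void *sess, int status[], uint32_t num)
{
	uint32_t i;

	(void)sess;
	for (i = 0; i != num; i++)
		status[i] = 0;	/* pretend every element succeeded */
	return (int)num;
}

static const struct dev_ops demo_ops = {
	.session_get_size = demo_get_size,
	.session_update = NULL,		/* not supported, as in the patch */
	.process_bulk = demo_process_bulk,
};
```

Returning `-ENOTSUP` lets callers probe optional features without knowing which driver sits behind the table.
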
* [dpdk-dev] [PATCH v2 06/10] app/test: add aesni_mb security cpu crypto autotest
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
                       ` (4 preceding siblings ...)
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 07/10] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
                       ` (3 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch adds CPU crypto unit tests for the AESNI_MB PMD.
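
The test cases in this patch are generated with an X-macro: one list (`all_blockcipher_test_cases`) is expanded several times with a different `TEST_EXPAND` definition each time, first to emit one wrapper function per entry and then to emit the matching `TEST_CASE_ST()` table entries. A minimal standalone sketch of the same technique follows; `ALL_CASES`, `run_case()`, `test_table` and `find_test()` are illustrative names, not part of the DPDK test framework.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test vectors, standing in for the blockcipher_test_data
 * objects (aes_test_data_1, ...) used by the real test file. */
struct vec {
	int expected;
};
static const struct vec vec_a = { 1 };
static const struct vec vec_b = { 2 };

/* Hypothetical operation codes, standing in for TOP_ENC / TOP_DEC. */
#define OP_ENC 1
#define OP_DEC 2

/* One list of (vector, op) pairs; each use site redefines TEST_EXPAND. */
#define ALL_CASES			\
	TEST_EXPAND(vec_a, OP_ENC)	\
	TEST_EXPAND(vec_a, OP_DEC)	\
	TEST_EXPAND(vec_b, OP_ENC)

/* Shared body that every generated wrapper calls. */
static int
run_case(const struct vec *v, int op)
{
	assert(v != NULL);
	return v->expected * op;
}

/* Expansion 1: one named wrapper per entry, e.g. test_vec_a_OP_ENC(). */
#define TEST_EXPAND(t, o)		\
static int test_##t##_##o(void)		\
{					\
	return run_case(&t, o);		\
}
ALL_CASES
#undef TEST_EXPAND

/* Expansion 2: a NULL-terminated table of the generated functions. */
struct test_entry {
	const char *name;
	int (*fn)(void);
};

static const struct test_entry test_table[] = {
#define TEST_EXPAND(t, o) { #t "_" #o, test_##t##_##o },
	ALL_CASES
#undef TEST_EXPAND
	{ NULL, NULL }	/* terminator, like TEST_CASES_END() */
};

/* Find a generated test by name; returns NULL if it does not exist. */
static const struct test_entry *
find_test(const char *name)
{
	const struct test_entry *e;

	for (e = test_table; e->name != NULL; e++)
		if (strcmp(e->name, name) == 0)
			return e;
	return NULL;
}
```

Because `#` and `##` suppress macro expansion of their operands, the generated names keep the literal op token (`test_vec_a_OP_ENC`), while the function body still receives the expanded value.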

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 371 +++++++++++++++++++++++++++++++++++-
 1 file changed, 369 insertions(+), 2 deletions(-)

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index ca9a8dae6..a9853a0c0 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -19,12 +19,23 @@
 
 #include "test.h"
 #include "test_cryptodev.h"
+#include "test_cryptodev_blockcipher.h"
+#include "test_cryptodev_aes_test_vectors.h"
 #include "test_cryptodev_aead_test_vectors.h"
+#include "test_cryptodev_des_test_vectors.h"
+#include "test_cryptodev_hash_test_vectors.h"
 
 #define CPU_CRYPTO_TEST_MAX_AAD_LENGTH	16
 #define MAX_NB_SIGMENTS			4
 #define CACHE_WARM_ITER			2048
 
+#define TOP_ENC		BLOCKCIPHER_TEST_OP_ENCRYPT
+#define TOP_DEC		BLOCKCIPHER_TEST_OP_DECRYPT
+#define TOP_AUTH_GEN	BLOCKCIPHER_TEST_OP_AUTH_GEN
+#define TOP_AUTH_VER	BLOCKCIPHER_TEST_OP_AUTH_VERIFY
+#define TOP_ENC_AUTH	BLOCKCIPHER_TEST_OP_ENC_AUTH_GEN
+#define TOP_AUTH_DEC	BLOCKCIPHER_TEST_OP_AUTH_VERIFY_DEC
+
 enum buffer_assemble_option {
 	SGL_MAX_SEG,
 	SGL_ONE_SEG,
@@ -35,8 +46,8 @@ struct cpu_crypto_test_case {
 		uint8_t seg[MBUF_DATAPAYLOAD_SIZE];
 		uint32_t seg_len;
 	} seg_buf[MAX_NB_SIGMENTS];
-	uint8_t iv[MAXIMUM_IV_LENGTH];
-	uint8_t aad[CPU_CRYPTO_TEST_MAX_AAD_LENGTH];
+	uint8_t iv[MAXIMUM_IV_LENGTH * 2];
+	uint8_t aad[CPU_CRYPTO_TEST_MAX_AAD_LENGTH * 4];
 	uint8_t digest[DIGEST_BYTE_LENGTH_SHA512];
 } __rte_cache_aligned;
 
@@ -516,6 +527,11 @@ cpu_crypto_test_aead(const struct aead_test_data *tdata,
 	TEST_EXPAND(gcm_test_case_256_6, type)	\
 	TEST_EXPAND(gcm_test_case_256_7, type)
 
+/* test-vector/sgl-option */
+#define all_ccm_unit_test_cases \
+	TEST_EXPAND(ccm_test_case_128_1, SGL_ONE_SEG) \
+	TEST_EXPAND(ccm_test_case_128_2, SGL_ONE_SEG) \
+	TEST_EXPAND(ccm_test_case_128_3, SGL_ONE_SEG)
 
 #define TEST_EXPAND(t, o)						\
 static int								\
@@ -531,6 +547,7 @@ cpu_crypto_aead_dec_test_##t##_##o(void)				\
 
 all_gcm_unit_test_cases(SGL_ONE_SEG)
 all_gcm_unit_test_cases(SGL_MAX_SEG)
+all_ccm_unit_test_cases
 #undef TEST_EXPAND
 
 static struct unit_test_suite security_cpu_crypto_aesgcm_testsuite  = {
@@ -758,8 +775,358 @@ test_security_cpu_crypto_aesni_gcm_perf(void)
 			&security_cpu_crypto_aesgcm_perf_testsuite);
 }
 
+static struct rte_security_session *
+create_blockcipher_session(struct rte_security_ctx *ctx,
+		struct rte_mempool *sess_mp,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	struct rte_security_session_conf sess_conf = {0};
+	struct rte_crypto_sym_xform xforms[2] = { {0} };
+	struct rte_crypto_sym_xform *cipher_xform = NULL;
+	struct rte_crypto_sym_xform *auth_xform = NULL;
+	struct rte_crypto_sym_xform *xform;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER) {
+		cipher_xform = &xforms[0];
+		cipher_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+
+		if (op_mask & TOP_ENC)
+			cipher_xform->cipher.op =
+				RTE_CRYPTO_CIPHER_OP_ENCRYPT;
+		else
+			cipher_xform->cipher.op =
+				RTE_CRYPTO_CIPHER_OP_DECRYPT;
+
+		cipher_xform->cipher.algo = test_data->crypto_algo;
+		cipher_xform->cipher.key.data = test_data->cipher_key.data;
+		cipher_xform->cipher.key.length = test_data->cipher_key.len;
+		cipher_xform->cipher.iv.offset = 0;
+		cipher_xform->cipher.iv.length = test_data->iv.len;
+
+		if (is_unit_test)
+			debug_hexdump(stdout, "cipher key:",
+					test_data->cipher_key.data,
+					test_data->cipher_key.len);
+	}
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_AUTH) {
+		auth_xform = &xforms[1];
+		auth_xform->type = RTE_CRYPTO_SYM_XFORM_AUTH;
+
+		if (op_mask & TOP_AUTH_GEN)
+			auth_xform->auth.op = RTE_CRYPTO_AUTH_OP_GENERATE;
+		else
+			auth_xform->auth.op = RTE_CRYPTO_AUTH_OP_VERIFY;
+
+		auth_xform->auth.algo = test_data->auth_algo;
+		auth_xform->auth.key.length = test_data->auth_key.len;
+		auth_xform->auth.key.data = test_data->auth_key.data;
+		auth_xform->auth.digest_length = test_data->digest.len;
+
+		if (is_unit_test)
+			debug_hexdump(stdout, "auth key:",
+					test_data->auth_key.data,
+					test_data->auth_key.len);
+	}
+
+	if (op_mask == TOP_ENC ||
+			op_mask == TOP_DEC)
+		xform = cipher_xform;
+	else if (op_mask == TOP_AUTH_GEN ||
+			op_mask == TOP_AUTH_VER)
+		xform = auth_xform;
+	else if (op_mask == TOP_ENC_AUTH) {
+		xform = cipher_xform;
+		xform->next = auth_xform;
+	} else if (op_mask == TOP_AUTH_DEC) {
+		xform = auth_xform;
+		xform->next = cipher_xform;
+	} else
+		return NULL;
+
+	if (test_data->cipher_offset < test_data->auth_offset)
+		return NULL;
+
+	sess_conf.action_type = RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
+	sess_conf.crypto_xform = xform;
+	sess_conf.cpucrypto.cipher_offset = test_data->cipher_offset -
+			test_data->auth_offset;
+
+	return rte_security_session_create(ctx, &sess_conf, sess_mp);
+}
+
+static inline int
+assemble_blockcipher_buf(struct cpu_crypto_test_case *data,
+		struct cpu_crypto_test_obj *obj,
+		uint32_t obj_idx,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data,
+		uint32_t is_unit_test)
+{
+	const uint8_t *src;
+	uint32_t src_len;
+	uint32_t offset;
+
+	if (op_mask == TOP_ENC_AUTH ||
+			op_mask == TOP_AUTH_GEN ||
+			op_mask == BLOCKCIPHER_TEST_OP_AUTH_VERIFY)
+		offset = test_data->auth_offset;
+	else
+		offset = test_data->cipher_offset;
+
+	if (op_mask & TOP_ENC_AUTH) {
+		src = test_data->plaintext.data;
+		src_len = test_data->plaintext.len;
+		if (is_unit_test)
+			debug_hexdump(stdout, "plaintext:", src, src_len);
+	} else {
+		src = test_data->ciphertext.data;
+		src_len = test_data->ciphertext.len;
+		memcpy(data->digest, test_data->digest.data,
+				test_data->digest.len);
+		if (is_unit_test) {
+			debug_hexdump(stdout, "ciphertext:", src, src_len);
+			debug_hexdump(stdout, "digest:", test_data->digest.data,
+					test_data->digest.len);
+		}
+	}
+
+	if (src_len > MBUF_DATAPAYLOAD_SIZE)
+		return -ENOMEM;
+
+	memcpy(data->seg_buf[0].seg, src, src_len);
+	data->seg_buf[0].seg_len = src_len;
+	obj->vec[obj_idx][0].iov_base =
+			(void *)(data->seg_buf[0].seg + offset);
+	obj->vec[obj_idx][0].iov_len = src_len - offset;
+
+	obj->sec_buf[obj_idx].vec = obj->vec[obj_idx];
+	obj->sec_buf[obj_idx].num = 1;
+
+	memcpy(data->iv, test_data->iv.data, test_data->iv.len);
+	if (is_unit_test)
+		debug_hexdump(stdout, "iv:", test_data->iv.data,
+				test_data->iv.len);
+
+	obj->iv[obj_idx] = (void *)data->iv;
+	obj->digest[obj_idx] = (void *)data->digest;
+
+	return 0;
+}
+
+static int
+check_blockcipher_result(struct cpu_crypto_test_case *tcase,
+		uint32_t op_mask,
+		const struct blockcipher_test_data *test_data)
+{
+	int ret;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER) {
+		const char *err_msg1, *err_msg2;
+		const uint8_t *src_pt_ct;
+		uint32_t src_len;
+
+		if (op_mask & TOP_ENC) {
+			src_pt_ct = test_data->ciphertext.data;
+			src_len = test_data->ciphertext.len;
+			err_msg1 = CPU_CRYPTO_ERR_EXP_CT;
+			err_msg2 = CPU_CRYPTO_ERR_GEN_CT;
+		} else {
+			src_pt_ct = test_data->plaintext.data;
+			src_len = test_data->plaintext.len;
+			err_msg1 = CPU_CRYPTO_ERR_EXP_PT;
+			err_msg2 = CPU_CRYPTO_ERR_GEN_PT;
+		}
+
+		ret = memcmp(tcase->seg_buf[0].seg, src_pt_ct, src_len);
+		if (ret != 0) {
+			debug_hexdump(stdout, err_msg1, src_pt_ct, src_len);
+			debug_hexdump(stdout, err_msg2,
+					tcase->seg_buf[0].seg,
+					src_len);
+			return -1;
+		}
+	}
+
+	if (op_mask & TOP_AUTH_GEN) {
+		ret = memcmp(tcase->digest, test_data->digest.data,
+				test_data->digest.len);
+		if (ret != 0) {
+			debug_hexdump(stdout, "expect digest:",
+					test_data->digest.data,
+					test_data->digest.len);
+			debug_hexdump(stdout, "gen digest:",
+					tcase->digest,
+					test_data->digest.len);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int
+cpu_crypto_test_blockcipher(const struct blockcipher_test_data *tdata,
+		uint32_t op_mask)
+{
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	int ret;
+
+	ut_params->sess = create_blockcipher_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			op_mask,
+			tdata,
+			1);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(1);
+	if (ret)
+		return ret;
+
+	tcase = ut_params->test_datas[0];
+	ret = assemble_blockcipher_buf(tcase, obj, 0, op_mask, tdata, 1);
+	if (ret < 0) {
+		printf("Test is not supported by the driver\n");
+		return ret;
+	}
+
+	run_test(ts_params->ctx, ut_params->sess, obj, 1);
+
+	ret = check_status(obj, 1);
+	if (ret < 0)
+		return ret;
+
+	ret = check_blockcipher_result(tcase, op_mask, tdata);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
+
+/* Macro to save code for defining BlockCipher test cases */
+/* test-vector-name/op */
+#define all_blockcipher_test_cases \
+	TEST_EXPAND(aes_test_data_1, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_1, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_1, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_1, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_2, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_2, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_2, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_2, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_3, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_3, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_3, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_3, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_4, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_4, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_4, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_4, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_5, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_5, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_5, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_5, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_6, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_6, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_6, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_6, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_7, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_7, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_7, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_7, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_8, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_8, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_8, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_8, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_9, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_9, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_9, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_9, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_10, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_10, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_11, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_11, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_12, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_12, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_12, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_12, TOP_AUTH_DEC) \
+	TEST_EXPAND(aes_test_data_13, TOP_ENC) \
+	TEST_EXPAND(aes_test_data_13, TOP_DEC) \
+	TEST_EXPAND(aes_test_data_13, TOP_ENC_AUTH) \
+	TEST_EXPAND(aes_test_data_13, TOP_AUTH_DEC) \
+	TEST_EXPAND(des_test_data_1, TOP_ENC) \
+	TEST_EXPAND(des_test_data_1, TOP_DEC) \
+	TEST_EXPAND(des_test_data_2, TOP_ENC) \
+	TEST_EXPAND(des_test_data_2, TOP_DEC) \
+	TEST_EXPAND(des_test_data_3, TOP_ENC) \
+	TEST_EXPAND(des_test_data_3, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_ENC_AUTH) \
+	TEST_EXPAND(triple_des128cbc_hmac_sha1_test_vector, TOP_AUTH_DEC) \
+	TEST_EXPAND(triple_des64cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des64cbc_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des128cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des128cbc_test_vector, TOP_DEC) \
+	TEST_EXPAND(triple_des192cbc_test_vector, TOP_ENC) \
+	TEST_EXPAND(triple_des192cbc_test_vector, TOP_DEC) \
+
+#define TEST_EXPAND(t, o)						\
+static int								\
+cpu_crypto_blockcipher_test_##t##_##o(void)				\
+{									\
+	return cpu_crypto_test_blockcipher(&t, o);			\
+}
+
+all_blockcipher_test_cases
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesni_mb_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-MB Unit Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_enc_test_##t##_##o),		\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_aead_dec_test_##t##_##o),		\
+
+	all_gcm_unit_test_cases(SGL_ONE_SEG)
+	all_ccm_unit_test_cases
+#undef TEST_EXPAND
+
+#define TEST_EXPAND(t, o)						\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+			cpu_crypto_blockcipher_test_##t##_##o),		\
+
+	all_blockcipher_test_cases
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_mb(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+
+	return unit_test_suite_runner(&security_cpu_crypto_aesni_mb_testsuite);
+}
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
 
 REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
 		test_security_cpu_crypto_aesni_gcm_perf);
+
+REGISTER_TEST_COMMAND(security_aesni_mb_autotest,
+		test_security_cpu_crypto_aesni_mb);
-- 
2.14.5


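In `create_blockcipher_session()` the op mask selects both which transforms are populated and the order in which they are chained: cipher then auth for `TOP_ENC_AUTH`, auth then cipher for `TOP_AUTH_DEC`, and a single transform otherwise. A minimal sketch of that selection, assuming hypothetical flag values and a simplified `struct xform` in place of `struct rte_crypto_sym_xform`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit flags shaped like the BLOCKCIPHER_TEST_OP_* values;
 * the real constants live in test_cryptodev_blockcipher.h. */
#define OP_ENCRYPT     0x01u
#define OP_DECRYPT     0x02u
#define OP_AUTH_GEN    0x04u
#define OP_AUTH_VERIFY 0x08u
#define OP_ENC_AUTH    (OP_ENCRYPT | OP_AUTH_GEN)
#define OP_AUTH_DEC    (OP_AUTH_VERIFY | OP_DECRYPT)

/* Simplified stand-in for a chained symmetric transform. */
enum xform_type { XFORM_CIPHER, XFORM_AUTH };
struct xform {
	enum xform_type type;
	struct xform *next;
};

/* Return the head of the chain for op_mask, or NULL for unsupported masks.
 * xforms[0] is the cipher transform and xforms[1] the auth transform. */
static struct xform *
build_chain(uint32_t op_mask, struct xform xforms[2])
{
	struct xform *cipher, *auth;

	assert(xforms != NULL);
	cipher = &xforms[0];
	auth = &xforms[1];
	cipher->type = XFORM_CIPHER;
	cipher->next = NULL;
	auth->type = XFORM_AUTH;
	auth->next = NULL;

	switch (op_mask) {
	case OP_ENCRYPT:
	case OP_DECRYPT:
		return cipher;
	case OP_AUTH_GEN:
	case OP_AUTH_VERIFY:
		return auth;
	case OP_ENC_AUTH:	/* encrypt first, then MAC the ciphertext */
		cipher->next = auth;
		return cipher;
	case OP_AUTH_DEC:	/* verify the MAC first, then decrypt */
		auth->next = cipher;
		return auth;
	default:
		return NULL;
	}
}
```
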
* [dpdk-dev] [PATCH v2 07/10] app/test: add aesni_mb security cpu crypto perftest
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
                       ` (5 preceding siblings ...)
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 06/10] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
                       ` (2 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

Since the crypto perf application does not support rte_security, this patch
adds a simple AES-CBC/HMAC-SHA1 CPU crypto performance test to the crypto
unit test application. The test covers AES-128-CBC, HMAC-SHA1 and chained
AES-128-CBC + HMAC-SHA1 on single-segment buffers of 64 to 2048 bytes, and
reports throughput (Mpps and Gbps) as well as cycles per buffer and per byte.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 app/test/test_security_cpu_crypto.c | 194 ++++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)

diff --git a/app/test/test_security_cpu_crypto.c b/app/test/test_security_cpu_crypto.c
index a9853a0c0..c3689d138 100644
--- a/app/test/test_security_cpu_crypto.c
+++ b/app/test/test_security_cpu_crypto.c
@@ -1122,6 +1122,197 @@ test_security_cpu_crypto_aesni_mb(void)
 	return unit_test_suite_runner(&security_cpu_crypto_aesni_mb_testsuite);
 }
 
+static inline void
+switch_blockcipher_enc_to_dec(struct blockcipher_test_data *tdata,
+		struct cpu_crypto_test_case *tcase, uint8_t *dst)
+{
+	memcpy(dst, tcase->seg_buf[0].seg, tcase->seg_buf[0].seg_len);
+	tdata->ciphertext.len = tcase->seg_buf[0].seg_len;
+	memcpy(tdata->digest.data, tcase->digest, tdata->digest.len);
+}
+
+static int
+cpu_crypto_test_blockcipher_perf(
+		const enum rte_crypto_cipher_algorithm cipher_algo,
+		uint32_t cipher_key_sz,
+		const enum rte_crypto_auth_algorithm auth_algo,
+		uint32_t auth_key_sz, uint32_t digest_sz,
+		uint32_t op_mask)
+{
+	struct blockcipher_test_data tdata = {0};
+	uint8_t plaintext[3000], ciphertext[3000];
+	struct cpu_crypto_testsuite_params *ts_params = &testsuite_params;
+	struct cpu_crypto_unittest_params *ut_params = &unittest_params;
+	struct cpu_crypto_test_obj *obj = &ut_params->test_obj;
+	struct cpu_crypto_test_case *tcase;
+	uint64_t hz = rte_get_tsc_hz(), time_start, time_now;
+	double rate, cycles_per_buf;
+	uint32_t test_data_szs[] = {64, 128, 256, 512, 1024, 2048};
+	uint32_t i, j;
+	uint32_t op_mask_opp = 0;
+	int ret;
+
+	if (op_mask & BLOCKCIPHER_TEST_OP_CIPHER)
+		op_mask_opp |= (~op_mask & BLOCKCIPHER_TEST_OP_CIPHER);
+	if (op_mask & BLOCKCIPHER_TEST_OP_AUTH)
+		op_mask_opp |= (~op_mask & BLOCKCIPHER_TEST_OP_AUTH);
+
+	tdata.plaintext.data = plaintext;
+	tdata.ciphertext.data = ciphertext;
+
+	tdata.cipher_key.len = cipher_key_sz;
+	tdata.auth_key.len = auth_key_sz;
+
+	gen_rand(tdata.cipher_key.data, cipher_key_sz);
+	gen_rand(tdata.auth_key.data, auth_key_sz);
+
+	tdata.crypto_algo = cipher_algo;
+	tdata.auth_algo = auth_algo;
+
+	tdata.digest.len = digest_sz;
+
+	ut_params->sess = create_blockcipher_session(ts_params->ctx,
+			ts_params->session_priv_mpool,
+			op_mask,
+			&tdata,
+			0);
+	if (!ut_params->sess)
+		return -1;
+
+	ret = allocate_buf(MAX_NUM_OPS_INFLIGHT);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < RTE_DIM(test_data_szs); i++) {
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tdata.plaintext.len = test_data_szs[i];
+			gen_rand(plaintext, tdata.plaintext.len);
+
+			tdata.iv.len = 16;
+			gen_rand(tdata.iv.data, tdata.iv.len);
+
+			tcase = ut_params->test_datas[j];
+			ret = assemble_blockcipher_buf(tcase, obj, j,
+					op_mask,
+					&tdata,
+					0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		/* warm up cache */
+		for (j = 0; j < CACHE_WARM_ITER; j++)
+			run_test(ts_params->ctx, ut_params->sess, obj,
+					MAX_NUM_OPS_INFLIGHT);
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("%s-%u-%s(%4uB) Enc %03.3fMpps (%03.3fGbps) ",
+			rte_crypto_cipher_algorithm_strings[cipher_algo],
+			cipher_key_sz * 8,
+			rte_crypto_auth_algorithm_strings[auth_algo],
+			test_data_szs[i],
+			rate, rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+			cycles_per_buf, cycles_per_buf / test_data_szs[i]);
+
+		for (j = 0; j < MAX_NUM_OPS_INFLIGHT; j++) {
+			tcase = ut_params->test_datas[j];
+
+			switch_blockcipher_enc_to_dec(&tdata, tcase,
+					ciphertext);
+			ret = assemble_blockcipher_buf(tcase, obj, j,
+					op_mask_opp,
+					&tdata,
+					0);
+			if (ret < 0) {
+				printf("Test is not supported by the driver\n");
+				return ret;
+			}
+		}
+
+		time_start = rte_rdtsc();
+
+		run_test(ts_params->ctx, ut_params->sess, obj,
+				MAX_NUM_OPS_INFLIGHT);
+
+		time_now = rte_rdtsc();
+
+		rate = time_now - time_start;
+		cycles_per_buf = rate / MAX_NUM_OPS_INFLIGHT;
+
+		rate = ((hz / cycles_per_buf)) / 1000000;
+
+		printf("%s-%u-%s(%4uB) Dec %03.3fMpps (%03.3fGbps) ",
+			rte_crypto_cipher_algorithm_strings[cipher_algo],
+			cipher_key_sz * 8,
+			rte_crypto_auth_algorithm_strings[auth_algo],
+			test_data_szs[i],
+			rate, rate  * test_data_szs[i] * 8 / 1000);
+		printf("cycles per buf %03.3f per byte %03.3f\n",
+				cycles_per_buf,
+				cycles_per_buf / test_data_szs[i]);
+	}
+
+	return 0;
+}
+
+/* cipher-algo/cipher-key-len/auth-algo/auth-key-len/digest-len/op */
+#define all_block_cipher_perf_test_cases				\
+	TEST_EXPAND(_AES_CBC, 128, _NULL, 0, 0, TOP_ENC)		\
+	TEST_EXPAND(_NULL, 0, _SHA1_HMAC, 160, 20, TOP_AUTH_GEN)	\
+	TEST_EXPAND(_AES_CBC, 128, _SHA1_HMAC, 160, 20, TOP_ENC_AUTH)
+
+#define TEST_EXPAND(a, b, c, d, e, f)					\
+static int								\
+cpu_crypto_blockcipher_perf##a##_##b##c##_##f(void)			\
+{									\
+	return cpu_crypto_test_blockcipher_perf(RTE_CRYPTO_CIPHER##a,	\
+			b / 8, RTE_CRYPTO_AUTH##c, d / 8, e, f);	\
+}									\
+
+all_block_cipher_perf_test_cases
+#undef TEST_EXPAND
+
+static struct unit_test_suite security_cpu_crypto_aesni_mb_perf_testsuite  = {
+	.suite_name = "Security CPU Crypto AESNI-MB Perf Test Suite",
+	.setup = testsuite_setup,
+	.teardown = testsuite_teardown,
+	.unit_test_cases = {
+#define TEST_EXPAND(a, b, c, d, e, f)					\
+	TEST_CASE_ST(ut_setup, ut_teardown,				\
+		cpu_crypto_blockcipher_perf##a##_##b##c##_##f),	\
+
+	all_block_cipher_perf_test_cases
+#undef TEST_EXPAND
+
+	TEST_CASES_END() /**< NULL terminate unit test array */
+	},
+};
+
+static int
+test_security_cpu_crypto_aesni_mb_perf(void)
+{
+	gbl_driver_id =	rte_cryptodev_driver_id_get(
+			RTE_STR(CRYPTODEV_NAME_AESNI_MB_PMD));
+
+	return unit_test_suite_runner(
+			&security_cpu_crypto_aesni_mb_perf_testsuite);
+}
+
+
 REGISTER_TEST_COMMAND(security_aesni_gcm_autotest,
 		test_security_cpu_crypto_aesni_gcm);
 
@@ -1130,3 +1321,6 @@ REGISTER_TEST_COMMAND(security_aesni_gcm_perftest,
 
 REGISTER_TEST_COMMAND(security_aesni_mb_autotest,
 		test_security_cpu_crypto_aesni_mb);
+
+REGISTER_TEST_COMMAND(security_aesni_mb_perftest,
+		test_security_cpu_crypto_aesni_mb_perf);
-- 
2.14.5


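The perf test above times one direction (`op_mask`), then feeds the generated output back with `op_mask_opp`, computed by flipping each operation group that `op_mask` uses: encrypt becomes decrypt and digest generation becomes verification. The session is created once with `op_mask`; between the two timed passes only the buffers assembled by `assemble_blockcipher_buf()` change. A sketch of the mask flip with hypothetical flag values (the real ones are the `BLOCKCIPHER_TEST_OP_*` constants):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout: each group holds a pair of opposite operations. */
#define OP_ENCRYPT     0x01u
#define OP_DECRYPT     0x02u
#define OP_CIPHER      (OP_ENCRYPT | OP_DECRYPT)
#define OP_AUTH_GEN    0x04u
#define OP_AUTH_VERIFY 0x08u
#define OP_AUTH        (OP_AUTH_GEN | OP_AUTH_VERIFY)

/* For every group that op_mask uses, select the other member of the pair:
 * masking the complement with the group leaves exactly the opposite bit
 * when op_mask sets one bit per group. Unused groups stay clear. */
static uint32_t
opposite_op(uint32_t op_mask)
{
	uint32_t opp = 0;

	assert((op_mask & OP_CIPHER) != OP_CIPHER &&
			(op_mask & OP_AUTH) != OP_AUTH);
	if (op_mask & OP_CIPHER)
		opp |= ~op_mask & OP_CIPHER;
	if (op_mask & OP_AUTH)
		opp |= ~op_mask & OP_AUTH;
	return opp;
}
```
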
* [dpdk-dev] [PATCH v2 08/10] ipsec: add rte_security cpu_crypto action support
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
                       ` (6 preceding siblings ...)
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 07/10] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-08 23:28       ` Ananyev, Konstantin
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 09/10] examples/ipsec-secgw: add security " Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 10/10] doc: update security cpu process description Fan Zhang
  9 siblings, 1 reply; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch updates the ipsec library to handle the newly introduced
RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action.
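
For the new action the payload is no longer described by a crypto op. Instead, `inb_cpu_crypto_proc_prepare()` walks the mbuf chain from the algorithm-specific offset and records each contiguous piece as a `struct iovec`, which becomes the `rte_security_vec` passed to `rte_security_process_cpu_crypto_bulk()`. A minimal sketch of that gather step over a hypothetical `struct seg` chain (standing in for `struct rte_mbuf`; no DPDK calls are used):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for a chained packet buffer (rte_mbuf). */
struct seg {
	uint8_t *data;
	uint32_t data_len;
	struct seg *next;
};

/*
 * Describe `len` bytes starting at byte `off` of the chain as an iovec list.
 * Returns the number of entries written, or -1 if the chain is too short or
 * more than max_vec entries would be needed.
 */
static int
build_iovec(struct seg *s, uint32_t off, uint32_t len,
		struct iovec *vec, uint32_t max_vec)
{
	uint32_t n = 0;

	assert(vec != NULL);

	/* skip whole segments that lie entirely before the offset */
	while (s != NULL && off >= s->data_len) {
		off -= s->data_len;
		s = s->next;
	}

	while (s != NULL && len != 0 && n < max_vec) {
		uint32_t chunk = s->data_len - off;

		if (chunk > len)
			chunk = len;
		vec[n].iov_base = s->data + off;
		vec[n].iov_len = chunk;
		len -= chunk;
		n++;
		off = 0;	/* only the first segment starts mid-way */
		s = s->next;
	}

	/* bytes left over means the chain or the vector array ran out */
	return len == 0 ? (int)n : -1;
}
```

As in the patch, the caller bounds the number of entries (`RTE_LIBRTE_IP_FRAG_MAX_FRAG` there) and treats leftover bytes as an error.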

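After the bulk call fills `status[]`, `esp_inb_cpu_crypto_pkt_process()` collects the indices of failed packets and uses `move_bad_mbufs()` to place them behind the good ones, so that only the first `k - n` packets go on to ESP post-processing. A sketch of the same stable partition driven directly by the status array; `partition_by_status()` and `MAX_BURST` are hypothetical, and this is not the library's `move_bad_mbufs()` signature.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_BURST 64	/* hypothetical upper bound on burst size */

/*
 * Stable partition of pkts[] by status[]: entries with status 0 keep their
 * relative order at the front, failed entries (non-zero status) keep their
 * relative order at the back. Returns the number of good entries.
 */
static uint32_t
partition_by_status(void *pkts[], const int status[], uint32_t num)
{
	void *bad[MAX_BURST];
	uint32_t i, good = 0, nb_bad = 0;

	assert(num <= MAX_BURST);
	for (i = 0; i != num; i++) {
		/* good <= i, so compacting in place never overwrites unread slots */
		if (status[i] == 0)
			pkts[good++] = pkts[i];
		else
			bad[nb_bad++] = pkts[i];
	}
	memcpy(&pkts[good], bad, nb_bad * sizeof(bad[0]));
	return good;
}
```
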
Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 lib/librte_ipsec/crypto.h   |  24 +++
 lib/librte_ipsec/esp_inb.c  | 200 ++++++++++++++++++++++--
 lib/librte_ipsec/esp_outb.c | 369 +++++++++++++++++++++++++++++++++++++++++---
 lib/librte_ipsec/sa.c       |  53 ++++++-
 lib/librte_ipsec/sa.h       |  29 ++++
 lib/librte_ipsec/ses.c      |   4 +-
 6 files changed, 643 insertions(+), 36 deletions(-)

diff --git a/lib/librte_ipsec/crypto.h b/lib/librte_ipsec/crypto.h
index f8fbf8d4f..901c8c7de 100644
--- a/lib/librte_ipsec/crypto.h
+++ b/lib/librte_ipsec/crypto.h
@@ -179,4 +179,28 @@ lksd_none_cop_prepare(struct rte_crypto_op *cop,
 	__rte_crypto_sym_op_attach_sym_session(sop, cs);
 }
 
+typedef void* (*_set_icv_f)(void *val, struct rte_mbuf *ml, uint32_t icv_off);
+
+static inline void *
+set_icv_va_pa(void *val, struct rte_mbuf *ml, uint32_t icv_off)
+{
+	union sym_op_data *icv = val;
+
+	icv->va = rte_pktmbuf_mtod_offset(ml, void *, icv_off);
+	icv->pa = rte_pktmbuf_iova_offset(ml, icv_off);
+
+	return icv->va;
+}
+
+static inline void *
+set_icv_va(void *val, struct rte_mbuf *ml,
+		uint32_t icv_off)
+{
+	void **icv_va = val;
+
+	*icv_va = rte_pktmbuf_mtod_offset(ml, void *, icv_off);
+
+	return *icv_va;
+}
+
 #endif /* _CRYPTO_H_ */
diff --git a/lib/librte_ipsec/esp_inb.c b/lib/librte_ipsec/esp_inb.c
index 8e3ecbc64..c4476e819 100644
--- a/lib/librte_ipsec/esp_inb.c
+++ b/lib/librte_ipsec/esp_inb.c
@@ -105,6 +105,78 @@ inb_cop_prepare(struct rte_crypto_op *cop,
 	}
 }
 
+static inline int
+inb_cpu_crypto_proc_prepare(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
+	uint32_t pofs, uint32_t plen,
+	struct rte_security_vec *buf, struct iovec *cur_vec,
+	void *iv)
+{
+	struct rte_mbuf *ms;
+	struct iovec *vec = cur_vec;
+	struct aead_gcm_iv *gcm;
+	struct aesctr_cnt_blk *ctr;
+	uint64_t *ivp;
+	uint32_t algo;
+	uint32_t left;
+	uint32_t off = 0, n_seg = 0;
+
+	ivp = rte_pktmbuf_mtod_offset(mb, uint64_t *,
+		pofs + sizeof(struct rte_esp_hdr));
+	algo = sa->algo_type;
+
+	switch (algo) {
+	case ALGO_TYPE_AES_GCM:
+		gcm = (struct aead_gcm_iv *)iv;
+		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
+		off = sa->ctp.cipher.offset + pofs;
+		left = plen - sa->ctp.cipher.length;
+		break;
+	case ALGO_TYPE_AES_CBC:
+	case ALGO_TYPE_3DES_CBC:
+		copy_iv(iv, ivp, sa->iv_len);
+		off = sa->ctp.auth.offset + pofs;
+		left = plen - sa->ctp.auth.length;
+		break;
+	case ALGO_TYPE_AES_CTR:
+		copy_iv(iv, ivp, sa->iv_len);
+		off = sa->ctp.auth.offset + pofs;
+		left = plen - sa->ctp.auth.length;
+		ctr = (struct aesctr_cnt_blk *)iv;
+		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
+		break;
+	case ALGO_TYPE_NULL:
+		left = plen - sa->ctp.cipher.length;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ms = mbuf_get_seg_ofs(mb, &off);
+	if (!ms)
+		return -1;
+
+	while (n_seg < RTE_LIBRTE_IP_FRAG_MAX_FRAG && left && ms) {
+		uint32_t len = RTE_MIN(left, ms->data_len - off);
+
+		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
+		vec->iov_len = len;
+
+		left -= len;
+		vec++;
+		n_seg++;
+		ms = ms->next;
+		off = 0;
+	}
+
+	if (left)
+		return -1;
+
+	buf->vec = cur_vec;
+	buf->num = n_seg;
+
+	return n_seg;
+}
+
 /*
  * Helper function for prepare() to deal with situation when
  * ICV is spread by two segments. Tries to move ICV completely into the
@@ -139,20 +211,21 @@ move_icv(struct rte_mbuf *ml, uint32_t ofs)
  */
 static inline void
 inb_pkt_xprepare(const struct rte_ipsec_sa *sa, rte_be64_t sqc,
-	const union sym_op_data *icv)
+	uint8_t *icv_va, void *aad_buf, uint32_t aad_off)
 {
 	struct aead_gcm_aad *aad;
 
 	/* insert SQN.hi between ESP trailer and ICV */
 	if (sa->sqh_len != 0)
-		insert_sqh(sqn_hi32(sqc), icv->va, sa->icv_len);
+		insert_sqh(sqn_hi32(sqc), icv_va, sa->icv_len);
 
 	/*
 	 * fill AAD fields, if any (aad fields are placed after icv),
 	 * right now we support only one AEAD algorithm: AES-GCM.
 	 */
 	if (sa->aad_len != 0) {
-		aad = (struct aead_gcm_aad *)(icv->va + sa->icv_len);
+		aad = aad_buf ? aad_buf :
+				(struct aead_gcm_aad *)(icv_va + aad_off);
 		aead_gcm_aad_fill(aad, sa->spi, sqc, IS_ESN(sa));
 	}
 }
@@ -162,13 +235,15 @@ inb_pkt_xprepare(const struct rte_ipsec_sa *sa, rte_be64_t sqc,
  */
 static inline int32_t
 inb_pkt_prepare(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
-	struct rte_mbuf *mb, uint32_t hlen, union sym_op_data *icv)
+	struct rte_mbuf *mb, uint32_t hlen, _set_icv_f set_icv, void *icv_val,
+	void *aad_buf)
 {
 	int32_t rc;
 	uint64_t sqn;
 	uint32_t clen, icv_len, icv_ofs, plen;
 	struct rte_mbuf *ml;
 	struct rte_esp_hdr *esph;
+	void *icv_va;
 
 	esph = rte_pktmbuf_mtod_offset(mb, struct rte_esp_hdr *, hlen);
 
@@ -226,8 +301,8 @@ inb_pkt_prepare(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
 	if (sa->aad_len + sa->sqh_len > rte_pktmbuf_tailroom(ml))
 		return -ENOSPC;
 
-	icv->va = rte_pktmbuf_mtod_offset(ml, void *, icv_ofs);
-	icv->pa = rte_pktmbuf_iova_offset(ml, icv_ofs);
+	icv_va = set_icv(icv_val, ml, icv_ofs);
+	inb_pkt_xprepare(sa, sqn, icv_va, aad_buf, sa->icv_len);
 
 	/*
 	 * if esn is used then high-order 32 bits are also used in ICV
@@ -238,7 +313,6 @@ inb_pkt_prepare(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
 	mb->pkt_len += sa->sqh_len;
 	ml->data_len += sa->sqh_len;
 
-	inb_pkt_xprepare(sa, sqn, icv);
 	return plen;
 }
 
@@ -265,7 +339,8 @@ esp_inb_pkt_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 	for (i = 0; i != num; i++) {
 
 		hl = mb[i]->l2_len + mb[i]->l3_len;
-		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, &icv);
+		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, set_icv_va_pa,
+				(void *)&icv, NULL);
 		if (rc >= 0) {
 			lksd_none_cop_prepare(cop[k], cs, mb[i]);
 			inb_cop_prepare(cop[k], sa, mb[i], &icv, hl, rc);
@@ -512,7 +587,6 @@ tun_process(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
 	return k;
 }
 
-
 /*
  * *process* function for tunnel packets
  */
@@ -625,6 +699,114 @@ esp_inb_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
 	return n;
 }
 
+/*
+ * process packets using sync crypto engine
+ */
+static uint16_t
+esp_inb_cpu_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num,
+		esp_inb_process_t process)
+{
+	int32_t rc;
+	uint32_t i, hl, n, p;
+	struct rte_ipsec_sa *sa;
+	struct replay_sqn *rsn;
+	void *icv_va;
+	uint32_t sqn[num];
+	uint32_t dr[num];
+	uint8_t sqh_len;
+
+	/* cpu crypto specific variables */
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	uint64_t iv_buf[num][IPSEC_MAX_IV_QWORD];
+	void *iv[num];
+	int status[num];
+	uint8_t aad_buf[num][sizeof(struct aead_gcm_aad)];
+	void *aad[num];
+	void *digest[num];
+	uint32_t k;
+
+	sa = ss->sa;
+	rsn = rsn_acquire(sa);
+	sqh_len = sa->sqh_len;
+
+	k = 0;
+	for (i = 0; i != num; i++) {
+		hl = mb[i]->l2_len + mb[i]->l3_len;
+		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, set_icv_va,
+				(void *)&icv_va, (void *)aad_buf[k]);
+		if (rc >= 0) {
+			iv[k] = (void *)iv_buf[k];
+			aad[k] = (void *)aad_buf[k];
+			digest[k] = (void *)icv_va;
+
+			rc = inb_cpu_crypto_proc_prepare(sa, mb[i], hl,
+					rc, &buf[k], &vec[vec_idx], iv[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		} else
+			dr[i - k] = i;
+	}
+
+	/* move mbufs that failed preparation behind the good ones */
+	if (k != num) {
+		rte_errno = EBADMSG;
+
+		if (unlikely(k == 0))
+			return 0;
+
+		move_bad_mbufs(mb, dr, num, num - k);
+	}
+
+	/* run synchronous crypto on all prepared packets in one call */
+	n = 0;
+	rc = rte_security_process_cpu_crypto_bulk(ss->security.ctx,
+			ss->security.ses, buf, iv, aad, digest, status, k);
+	/* move failed process packets to dr */
+	for (i = 0; i < k; i++) {
+		if (status[i]) {
+			dr[n++] = i;
+			rte_errno = EBADMSG;
+		}
+	}
+
+	/* move bad packets to the back */
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	/* process packets */
+	p = process(sa, mb, sqn, dr, k - n, sqh_len);
+
+	if (p != k - n && p != 0)
+		move_bad_mbufs(mb, dr, k - n, k - n - p);
+
+	if (p != num)
+		rte_errno = EBADMSG;
+
+	return p;
+}
+
+uint16_t
+esp_inb_tun_cpu_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_inb_cpu_crypto_pkt_process(ss, mb, num, tun_process);
+}
+
+uint16_t
+esp_inb_trs_cpu_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_inb_cpu_crypto_pkt_process(ss, mb, num, trs_process);
+}
+
 /*
  * process group of ESP inbound tunnel packets.
  */
diff --git a/lib/librte_ipsec/esp_outb.c b/lib/librte_ipsec/esp_outb.c
index 55799a867..ecfc4cd3f 100644
--- a/lib/librte_ipsec/esp_outb.c
+++ b/lib/librte_ipsec/esp_outb.c
@@ -104,7 +104,7 @@ outb_cop_prepare(struct rte_crypto_op *cop,
 static inline int32_t
 outb_tun_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
 	const uint64_t ivp[IPSEC_MAX_IV_QWORD], struct rte_mbuf *mb,
-	union sym_op_data *icv, uint8_t sqh_len)
+	_set_icv_f set_icv, void *icv_val, uint8_t sqh_len)
 {
 	uint32_t clen, hlen, l2len, pdlen, pdofs, plen, tlen;
 	struct rte_mbuf *ml;
@@ -177,8 +177,8 @@ outb_tun_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
 	espt->pad_len = pdlen;
 	espt->next_proto = sa->proto;
 
-	icv->va = rte_pktmbuf_mtod_offset(ml, void *, pdofs);
-	icv->pa = rte_pktmbuf_iova_offset(ml, pdofs);
+	/* set icv va/pa value(s) */
+	set_icv(icv_val, ml, pdofs);
 
 	return clen;
 }
@@ -189,14 +189,14 @@ outb_tun_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
  */
 static inline void
 outb_pkt_xprepare(const struct rte_ipsec_sa *sa, rte_be64_t sqc,
-	const union sym_op_data *icv)
+	uint8_t *icv_va, void *aad_buf)
 {
 	uint32_t *psqh;
 	struct aead_gcm_aad *aad;
 
 	/* insert SQN.hi between ESP trailer and ICV */
 	if (sa->sqh_len != 0) {
-		psqh = (uint32_t *)(icv->va - sa->sqh_len);
+		psqh = (uint32_t *)(icv_va - sa->sqh_len);
 		psqh[0] = sqn_hi32(sqc);
 	}
 
@@ -205,7 +205,7 @@ outb_pkt_xprepare(const struct rte_ipsec_sa *sa, rte_be64_t sqc,
 	 * right now we support only one AEAD algorithm: AES-GCM .
 	 */
 	if (sa->aad_len != 0) {
-		aad = (struct aead_gcm_aad *)(icv->va + sa->icv_len);
+		aad = aad_buf;
 		aead_gcm_aad_fill(aad, sa->spi, sqc, IS_ESN(sa));
 	}
 }
@@ -242,11 +242,12 @@ esp_outb_tun_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 		gen_iv(iv, sqc);
 
 		/* try to update the packet itself */
-		rc = outb_tun_pkt_prepare(sa, sqc, iv, mb[i], &icv,
-					  sa->sqh_len);
+		rc = outb_tun_pkt_prepare(sa, sqc, iv, mb[i], set_icv_va_pa,
+				(void *)&icv, sa->sqh_len);
 		/* success, setup crypto op */
 		if (rc >= 0) {
-			outb_pkt_xprepare(sa, sqc, &icv);
+			outb_pkt_xprepare(sa, sqc, icv.va,
+					(void *)(icv.va + sa->icv_len));
 			lksd_none_cop_prepare(cop[k], cs, mb[i]);
 			outb_cop_prepare(cop[k], sa, iv, &icv, 0, rc);
 			k++;
@@ -270,7 +271,7 @@ esp_outb_tun_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 static inline int32_t
 outb_trs_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
 	const uint64_t ivp[IPSEC_MAX_IV_QWORD], struct rte_mbuf *mb,
-	uint32_t l2len, uint32_t l3len, union sym_op_data *icv,
+	uint32_t l2len, uint32_t l3len, _set_icv_f set_icv, void *icv_val,
 	uint8_t sqh_len)
 {
 	uint8_t np;
@@ -340,8 +341,7 @@ outb_trs_pkt_prepare(struct rte_ipsec_sa *sa, rte_be64_t sqc,
 	espt->pad_len = pdlen;
 	espt->next_proto = np;
 
-	icv->va = rte_pktmbuf_mtod_offset(ml, void *, pdofs);
-	icv->pa = rte_pktmbuf_iova_offset(ml, pdofs);
+	set_icv(icv_val, ml, pdofs);
 
 	return clen;
 }
@@ -381,11 +381,12 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 		gen_iv(iv, sqc);
 
 		/* try to update the packet itself */
-		rc = outb_trs_pkt_prepare(sa, sqc, iv, mb[i], l2, l3, &icv,
-					  sa->sqh_len);
+		rc = outb_trs_pkt_prepare(sa, sqc, iv, mb[i], l2, l3,
+				set_icv_va_pa, (void *)&icv, sa->sqh_len);
 		/* success, setup crypto op */
 		if (rc >= 0) {
-			outb_pkt_xprepare(sa, sqc, &icv);
+			outb_pkt_xprepare(sa, sqc, icv.va,
+					(void *)(icv.va + sa->icv_len));
 			lksd_none_cop_prepare(cop[k], cs, mb[i]);
 			outb_cop_prepare(cop[k], sa, iv, &icv, l2 + l3, rc);
 			k++;
@@ -403,6 +404,335 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 	return k;
 }
 
+
+static inline int
+outb_cpu_crypto_proc_prepare(struct rte_mbuf *m, const struct rte_ipsec_sa *sa,
+		uint32_t hlen, uint32_t plen,
+		struct rte_security_vec *buf, struct iovec *cur_vec, void *iv)
+{
+	struct rte_mbuf *ms;
+	uint64_t *ivp = iv;
+	struct aead_gcm_iv *gcm;
+	struct aesctr_cnt_blk *ctr;
+	struct iovec *vec = cur_vec;
+	uint32_t left;
+	uint32_t off = 0;
+	uint32_t n_seg = 0;
+	uint32_t algo;
+
+	algo = sa->algo_type;
+
+	switch (algo) {
+	case ALGO_TYPE_AES_GCM:
+		gcm = iv;
+		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
+		off = sa->ctp.cipher.offset + hlen;
+		left = sa->ctp.cipher.length + plen;
+		break;
+	case ALGO_TYPE_AES_CBC:
+	case ALGO_TYPE_3DES_CBC:
+		off = sa->ctp.auth.offset + hlen;
+		left = sa->ctp.auth.length + plen;
+		break;
+	case ALGO_TYPE_AES_CTR:
+		off = sa->ctp.auth.offset + hlen;
+		left = sa->ctp.auth.length + plen;
+		ctr = iv;
+		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
+		break;
+	case ALGO_TYPE_NULL:
+		left = sa->ctp.cipher.length + plen;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ms = mbuf_get_seg_ofs(m, &off);
+	if (!ms)
+		return -1;
+
+	while (n_seg < m->nb_segs && left && ms) {
+		uint32_t len = RTE_MIN(left, ms->data_len - off);
+
+		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
+		vec->iov_len = len;
+
+		left -= len;
+		vec++;
+		n_seg++;
+		ms = ms->next;
+		off = 0;
+	}
+
+	if (left)
+		return -1;
+
+	buf->vec = cur_vec;
+	buf->num = n_seg;
+
+	return n_seg;
+}
+
+static uint16_t
+esp_outb_tun_cpu_crypto_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	uint64_t sqn;
+	rte_be64_t sqc;
+	struct rte_ipsec_sa *sa;
+	struct rte_security_ctx *ctx;
+	struct rte_security_session *rss;
+	void *icv_va;
+	uint32_t dr[num];
+	uint32_t i, n;
+	int32_t rc;
+
+	/* cpu crypto specific variables */
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	uint64_t iv_buf[num][IPSEC_MAX_IV_QWORD];
+	void *iv[num];
+	int status[num];
+	uint8_t aad_buf[num][sizeof(struct aead_gcm_aad)];
+	void *aad[num];
+	void *digest[num];
+	uint32_t k;
+
+	sa = ss->sa;
+	ctx = ss->security.ctx;
+	rss = ss->security.ses;
+
+	k = 0;
+	n = num;
+	sqn = esn_outb_update_sqn(sa, &n);
+	if (n != num)
+		rte_errno = EOVERFLOW;
+
+	for (i = 0; i != n; i++) {
+		sqc = rte_cpu_to_be_64(sqn + i);
+		gen_iv(iv_buf[k], sqc);
+
+		/* try to update the packet itself */
+		rc = outb_tun_pkt_prepare(sa, sqc, iv_buf[k], mb[i], set_icv_va,
+				(void *)&icv_va, sa->sqh_len);
+
+		/* success, setup crypto op */
+		if (rc >= 0) {
+			iv[k] = (void *)iv_buf[k];
+			aad[k] = (void *)aad_buf[k];
+			digest[k] = (void *)icv_va;
+
+			outb_pkt_xprepare(sa, sqc, icv_va, aad[k]);
+
+			rc = outb_cpu_crypto_proc_prepare(mb[i], sa,
+					0, rc, &buf[k], &vec[vec_idx], iv[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				rte_errno = -rc;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		/* failure, put packet into the death-row */
+		} else {
+			dr[i - k] = i;
+			rte_errno = -rc;
+		}
+	}
+
+	/* copy not prepared mbufs beyond good ones */
+	if (k != n && k != 0)
+		move_bad_mbufs(mb, dr, n, n - k);
+
+	if (unlikely(k == 0)) {
+		rte_errno = EBADMSG;
+		return 0;
+	}
+
+	/* process the packets */
+	n = 0;
+	rc = rte_security_process_cpu_crypto_bulk(ctx, rss, buf, iv, aad,
+			digest, status, k);
+	/* move failed process packets to dr */
+	if (rc < 0)
+		for (i = 0; i < k; i++) {
+			if (status[i])
+				dr[n++] = i;
+		}
+
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	return k - n;
+}
+
+static uint16_t
+esp_outb_trs_cpu_crypto_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+
+{
+	uint64_t sqn;
+	rte_be64_t sqc;
+	struct rte_ipsec_sa *sa;
+	struct rte_security_ctx *ctx;
+	struct rte_security_session *rss;
+	void *icv_va;
+	uint32_t dr[num];
+	uint32_t i, n;
+	uint32_t l2, l3;
+	int32_t rc;
+
+	/* cpu crypto specific variables */
+	struct rte_security_vec buf[num];
+	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];
+	uint32_t vec_idx = 0;
+	uint64_t iv_buf[num][IPSEC_MAX_IV_QWORD];
+	void *iv[num];
+	int status[num];
+	uint8_t aad_buf[num][sizeof(struct aead_gcm_aad)];
+	void *aad[num];
+	void *digest[num];
+	uint32_t k;
+
+	sa = ss->sa;
+	ctx = ss->security.ctx;
+	rss = ss->security.ses;
+
+	k = 0;
+	n = num;
+	sqn = esn_outb_update_sqn(sa, &n);
+	if (n != num)
+		rte_errno = EOVERFLOW;
+
+	for (i = 0; i != n; i++) {
+		l2 = mb[i]->l2_len;
+		l3 = mb[i]->l3_len;
+
+		sqc = rte_cpu_to_be_64(sqn + i);
+		gen_iv(iv_buf[k], sqc);
+
+		/* try to update the packet itself */
+		rc = outb_trs_pkt_prepare(sa, sqc, iv_buf[k], mb[i], l2, l3,
+				set_icv_va, (void *)&icv_va, sa->sqh_len);
+
+		/* success, setup crypto op */
+		if (rc >= 0) {
+			iv[k] = (void *)iv_buf[k];
+			aad[k] = (void *)aad_buf[k];
+			digest[k] = (void *)icv_va;
+
+			outb_pkt_xprepare(sa, sqc, icv_va, aad[k]);
+
+			rc = outb_cpu_crypto_proc_prepare(mb[i], sa,
+					l2 + l3, rc, &buf[k], &vec[vec_idx],
+					iv[k]);
+			if (rc < 0) {
+				dr[i - k] = i;
+				rte_errno = -rc;
+				continue;
+			}
+
+			vec_idx += rc;
+			k++;
+		/* failure, put packet into the death-row */
+		} else {
+			dr[i - k] = i;
+			rte_errno = -rc;
+		}
+	}
+
+	/* copy not prepared mbufs beyond good ones */
+	if (k != n && k != 0)
+		move_bad_mbufs(mb, dr, n, n - k);
+
+	/* process the packets */
+	n = 0;
+	rc = rte_security_process_cpu_crypto_bulk(ctx, rss, buf, iv, aad,
+			digest, status, k);
+	/* move failed process packets to dr */
+	if (rc < 0)
+		for (i = 0; i < k; i++) {
+			if (status[i])
+				dr[n++] = i;
+		}
+
+	if (n)
+		move_bad_mbufs(mb, dr, k, n);
+
+	return k - n;
+}
+
+uint16_t
+esp_outb_tun_cpu_crypto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	struct rte_ipsec_sa *sa = ss->sa;
+	uint32_t icv_len;
+	void *icv;
+	uint16_t n;
+	uint16_t i;
+
+	n = esp_outb_tun_cpu_crypto_process(ss, mb, num);
+
+	icv_len = sa->icv_len;
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *ml = rte_pktmbuf_lastseg(mb[i]);
+
+		mb[i]->pkt_len -= sa->sqh_len;
+		ml->data_len -= sa->sqh_len;
+
+		icv = rte_pktmbuf_mtod_offset(ml, void *,
+				ml->data_len - icv_len);
+		remove_sqh(icv, sa->icv_len);
+	}
+
+	return n;
+}
+
+uint16_t
+esp_outb_tun_cpu_crypto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_tun_cpu_crypto_process(ss, mb, num);
+}
+
+uint16_t
+esp_outb_trs_cpu_crypto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	struct rte_ipsec_sa *sa = ss->sa;
+	uint32_t icv_len;
+	void *icv;
+	uint16_t n;
+	uint16_t i;
+
+	n = esp_outb_trs_cpu_crypto_process(ss, mb, num);
+	icv_len = sa->icv_len;
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *ml = rte_pktmbuf_lastseg(mb[i]);
+
+		mb[i]->pkt_len -= sa->sqh_len;
+		ml->data_len -= sa->sqh_len;
+
+		icv = rte_pktmbuf_mtod_offset(ml, void *,
+				ml->data_len - icv_len);
+		remove_sqh(icv, sa->icv_len);
+	}
+
+	return n;
+}
+
+uint16_t
+esp_outb_trs_cpu_crypto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
+{
+	return esp_outb_trs_cpu_crypto_process(ss, mb, num);
+}
+
 /*
  * process outbound packets for SA with ESN support,
  * for algorithms that require SQN.hibits to be implictly included
@@ -410,8 +740,8 @@ esp_outb_trs_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
  * In that case we have to move ICV bytes back to their proper place.
  */
 uint16_t
-esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
-	uint16_t num)
+esp_outb_sqh_process(const struct rte_ipsec_session *ss,
+	struct rte_mbuf *mb[], uint16_t num)
 {
 	uint32_t i, k, icv_len, *icv;
 	struct rte_mbuf *ml;
@@ -498,7 +828,8 @@ inline_outb_tun_pkt_process(const struct rte_ipsec_session *ss,
 		gen_iv(iv, sqc);
 
 		/* try to update the packet itself */
-		rc = outb_tun_pkt_prepare(sa, sqc, iv, mb[i], &icv, 0);
+		rc = outb_tun_pkt_prepare(sa, sqc, iv, mb[i], set_icv_va_pa,
+				(void *)&icv, 0);
 
 		k += (rc >= 0);
 
@@ -552,7 +883,7 @@ inline_outb_trs_pkt_process(const struct rte_ipsec_session *ss,
 
 		/* try to update the packet itself */
 		rc = outb_trs_pkt_prepare(sa, sqc, iv, mb[i],
-				l2, l3, &icv, 0);
+				l2, l3, set_icv_va_pa, (void *)&icv, 0);
 
 		k += (rc >= 0);
 
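outb_cpu_crypto_proc_prepare() takes a start offset and length from the SA's cipher or auth parameters, finds the segment holding that offset with mbuf_get_seg_ofs(), then emits one struct iovec per segment until the length is covered, failing if bytes are left over. The sketch below reproduces that walk on a hypothetical `struct seg` chain in place of rte_mbuf. The explicit cap on iovec entries (max_vec) is an assumption of this sketch; the patch instead bounds the loop by m->nb_segs and sizes the shared vec[] array as RTE_LIBRTE_IP_FRAG_MAX_FRAG * num.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for one segment of a chained rte_mbuf. */
struct seg {
	uint8_t *data;
	uint32_t data_len;
	struct seg *next;
};

/*
 * Describe len bytes, starting at byte offset off of the chain, as an
 * iovec array. Returns the number of entries used, or -1 if the chain
 * is too short or more than max_vec entries would be needed.
 */
static int
seg_chain_to_iovec(struct seg *s, uint32_t off, uint32_t len,
		struct iovec vec[], uint32_t max_vec)
{
	uint32_t n = 0;

	assert(vec != NULL);

	/* find the segment that contains the starting offset */
	while (s != NULL && off >= s->data_len) {
		off -= s->data_len;
		s = s->next;
	}

	/* one iovec per segment until the requested length is covered */
	while (len != 0 && s != NULL) {
		uint32_t chunk = s->data_len - off;

		if (n == max_vec)
			return -1;
		if (chunk > len)
			chunk = len;
		vec[n].iov_base = s->data + off;
		vec[n].iov_len = chunk;
		n++;
		len -= chunk;
		off = 0;	/* later segments are used from their start */
		s = s->next;
	}

	return (len == 0) ? (int)n : -1;
}
```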
diff --git a/lib/librte_ipsec/sa.c b/lib/librte_ipsec/sa.c
index 23d394b46..b8d55a1c7 100644
--- a/lib/librte_ipsec/sa.c
+++ b/lib/librte_ipsec/sa.c
@@ -544,9 +544,9 @@ lksd_proto_prepare(const struct rte_ipsec_session *ss,
  * - inbound/outbound for RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
  * - outbound for RTE_SECURITY_ACTION_TYPE_NONE when ESN is disabled
  */
-static uint16_t
-pkt_flag_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
-	uint16_t num)
+uint16_t
+esp_outb_pkt_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num)
 {
 	uint32_t i, k;
 	uint32_t dr[num];
@@ -599,12 +599,48 @@ lksd_none_pkt_func_select(const struct rte_ipsec_sa *sa,
 	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
 		pf->prepare = esp_outb_tun_prepare;
 		pf->process = (sa->sqh_len != 0) ?
-			esp_outb_sqh_process : pkt_flag_process;
+			esp_outb_sqh_process : esp_outb_pkt_flag_process;
 		break;
 	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TRANS):
 		pf->prepare = esp_outb_trs_prepare;
 		pf->process = (sa->sqh_len != 0) ?
-			esp_outb_sqh_process : pkt_flag_process;
+			esp_outb_sqh_process : esp_outb_pkt_flag_process;
+		break;
+	default:
+		rc = -ENOTSUP;
+	}
+
+	return rc;
+}
+
+static int
+cpu_crypto_pkt_func_select(const struct rte_ipsec_sa *sa,
+		struct rte_ipsec_sa_pkt_func *pf)
+{
+	int32_t rc;
+
+	static const uint64_t msk = RTE_IPSEC_SATP_DIR_MASK |
+			RTE_IPSEC_SATP_MODE_MASK;
+
+	rc = 0;
+	switch (sa->type & msk) {
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		pf->process = esp_inb_tun_cpu_crypto_pkt_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_IB | RTE_IPSEC_SATP_MODE_TRANS):
+		pf->process = esp_inb_trs_cpu_crypto_pkt_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV4):
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TUNLV6):
+		pf->process = (sa->sqh_len != 0) ?
+			esp_outb_tun_cpu_crypto_sqh_process :
+			esp_outb_tun_cpu_crypto_flag_process;
+		break;
+	case (RTE_IPSEC_SATP_DIR_OB | RTE_IPSEC_SATP_MODE_TRANS):
+		pf->process = (sa->sqh_len != 0) ?
+			esp_outb_trs_cpu_crypto_sqh_process :
+			esp_outb_trs_cpu_crypto_flag_process;
 		break;
 	default:
 		rc = -ENOTSUP;
@@ -672,13 +708,16 @@ ipsec_sa_pkt_func_select(const struct rte_ipsec_session *ss,
 	case RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL:
 		if ((sa->type & RTE_IPSEC_SATP_DIR_MASK) ==
 				RTE_IPSEC_SATP_DIR_IB)
-			pf->process = pkt_flag_process;
+			pf->process = esp_outb_pkt_flag_process;
 		else
 			pf->process = inline_proto_outb_pkt_process;
 		break;
 	case RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL:
 		pf->prepare = lksd_proto_prepare;
-		pf->process = pkt_flag_process;
+		pf->process = esp_outb_pkt_flag_process;
+		break;
+	case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
+		rc = cpu_crypto_pkt_func_select(sa, pf);
 		break;
 	default:
 		rc = -ENOTSUP;
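cpu_crypto_pkt_func_select() above chooses one of six process handlers from the SA direction, the SA mode (tunnel IPv4/IPv6 or transport) and, for outbound SAs only, whether SQN.hi must be stripped afterwards (sa->sqh_len != 0). A minimal sketch of that decision table follows; the enums and stub handlers are hypothetical stand-ins for the RTE_IPSEC_SATP_* bits and the real esp_*_cpu_crypto_* functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RTE_IPSEC_SATP_DIR_* and _MODE_* values. */
enum sa_dir { DIR_IB, DIR_OB };
enum sa_mode { MODE_TUNLV4, MODE_TUNLV6, MODE_TRANS };

typedef const char *(*pkt_process_t)(void);

/* Stub handlers: each only reports which processing path was chosen. */
static const char *inb_tun(void) { return "inb_tun"; }
static const char *inb_trs(void) { return "inb_trs"; }
static const char *outb_tun_sqh(void) { return "outb_tun_sqh"; }
static const char *outb_tun_flag(void) { return "outb_tun_flag"; }
static const char *outb_trs_sqh(void) { return "outb_trs_sqh"; }
static const char *outb_trs_flag(void) { return "outb_trs_flag"; }

/* Pick the process handler; returns -ENOTSUP for an unknown mode. */
static int
select_cpu_crypto_process(enum sa_dir dir, enum sa_mode mode,
		uint32_t sqh_len, pkt_process_t *process)
{
	int tun;

	switch (mode) {
	case MODE_TUNLV4:
	case MODE_TUNLV6:
		tun = 1;
		break;
	case MODE_TRANS:
		tun = 0;
		break;
	default:
		return -ENOTSUP;
	}

	if (dir == DIR_IB)	/* inbound does not depend on sqh_len */
		*process = tun ? inb_tun : inb_trs;
	else if (tun)
		*process = (sqh_len != 0) ? outb_tun_sqh : outb_tun_flag;
	else
		*process = (sqh_len != 0) ? outb_trs_sqh : outb_trs_flag;

	return 0;
}
```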
diff --git a/lib/librte_ipsec/sa.h b/lib/librte_ipsec/sa.h
index 51e69ad05..770d36b8b 100644
--- a/lib/librte_ipsec/sa.h
+++ b/lib/librte_ipsec/sa.h
@@ -156,6 +156,14 @@ uint16_t
 inline_inb_trs_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
 
+uint16_t
+esp_inb_tun_cpu_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_inb_trs_cpu_crypto_pkt_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
 /* outbound processing */
 
 uint16_t
@@ -170,6 +178,10 @@ uint16_t
 esp_outb_sqh_process(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
 	uint16_t num);
 
+uint16_t
+esp_outb_pkt_flag_process(const struct rte_ipsec_session *ss,
+	struct rte_mbuf *mb[], uint16_t num);
+
 uint16_t
 inline_outb_tun_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
@@ -182,4 +194,21 @@ uint16_t
 inline_proto_outb_pkt_process(const struct rte_ipsec_session *ss,
 	struct rte_mbuf *mb[], uint16_t num);
 
+uint16_t
+esp_outb_tun_cpu_crypto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_tun_cpu_crypto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_trs_cpu_crypto_sqh_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+uint16_t
+esp_outb_trs_cpu_crypto_flag_process(const struct rte_ipsec_session *ss,
+		struct rte_mbuf *mb[], uint16_t num);
+
+
 #endif /* _SA_H_ */
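The *_cpu_crypto_sqh_process() variants declared above handle ESN SAs whose algorithm authenticates SQN.hi implicitly: outb_pkt_xprepare() writes the 4-byte SQN.hi between the ESP trailer and the ICV, and after the digest is computed those bytes must be removed again by trimming pkt_len/data_len and sliding the ICV back with remove_sqh(). The sketch below shows the same byte movement on one contiguous buffer; strip_sqn_hi() is a hypothetical helper, and it assumes SQN.hi and the ICV sit at the very end of the buffer, as the patch assumes for the last mbuf segment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * buf holds *len bytes ending in [SQN.hi (sqh_len bytes)][ICV (icv_len)].
 * Slide the ICV back over SQN.hi and shrink the length accordingly.
 */
static void
strip_sqn_hi(uint8_t *buf, uint32_t *len, uint32_t sqh_len, uint32_t icv_len)
{
	assert(*len >= sqh_len + icv_len);

	*len -= sqh_len;
	/* new ICV start = old SQN.hi start; regions overlap, so memmove */
	memmove(buf + *len - icv_len, buf + *len - icv_len + sqh_len,
			icv_len);
}
```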
diff --git a/lib/librte_ipsec/ses.c b/lib/librte_ipsec/ses.c
index 82c765a33..eaa8c17b7 100644
--- a/lib/librte_ipsec/ses.c
+++ b/lib/librte_ipsec/ses.c
@@ -19,7 +19,9 @@ session_check(struct rte_ipsec_session *ss)
 			return -EINVAL;
 		if ((ss->type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
 				ss->type ==
-				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL) &&
+				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
+				ss->type ==
+				RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) &&
 				ss->security.ctx == NULL)
 			return -EINVAL;
 	}
-- 
2.14.5



* [dpdk-dev] [PATCH v2 09/10] examples/ipsec-secgw: add security cpu_crypto action support
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
                       ` (7 preceding siblings ...)
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 10/10] doc: update security cpu process description Fan Zhang
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

Since the ipsec library now supports the cpu_crypto security action type,
this patch updates the ipsec-secgw sample application with the new action
type "cpu-crypto". The patch also adds a number of test scripts to verify
the correctness of the implementation.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 examples/ipsec-secgw/ipsec.c                       | 35 ++++++++++++++++++++++
 examples/ipsec-secgw/ipsec_process.c               |  7 +++--
 examples/ipsec-secgw/sa.c                          | 13 ++++++--
 examples/ipsec-secgw/test/run_test.sh              | 10 +++++++
 .../test/trs_3descbc_sha1_common_defs.sh           |  8 ++---
 .../test/trs_3descbc_sha1_cpu_crypto_defs.sh       |  5 ++++
 .../test/trs_aescbc_sha1_common_defs.sh            |  8 ++---
 .../test/trs_aescbc_sha1_cpu_crypto_defs.sh        |  5 ++++
 .../test/trs_aesctr_sha1_common_defs.sh            |  8 ++---
 .../test/trs_aesctr_sha1_cpu_crypto_defs.sh        |  5 ++++
 .../ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh |  5 ++++
 .../test/trs_aesgcm_mb_cpu_crypto_defs.sh          |  7 +++++
 .../test/tun_3descbc_sha1_common_defs.sh           |  8 ++---
 .../test/tun_3descbc_sha1_cpu_crypto_defs.sh       |  5 ++++
 .../test/tun_aescbc_sha1_common_defs.sh            |  8 ++---
 .../test/tun_aescbc_sha1_cpu_crypto_defs.sh        |  5 ++++
 .../test/tun_aesctr_sha1_common_defs.sh            |  8 ++---
 .../test/tun_aesctr_sha1_cpu_crypto_defs.sh        |  5 ++++
 .../ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh |  5 ++++
 .../test/tun_aesgcm_mb_cpu_crypto_defs.sh          |  7 +++++
 20 files changed, 138 insertions(+), 29 deletions(-)
 create mode 100644 examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
 create mode 100644 examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh

diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 1145ca1c0..02b9443a8 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -10,6 +10,7 @@
 #include <rte_crypto.h>
 #include <rte_security.h>
 #include <rte_cryptodev.h>
+#include <rte_ipsec.h>
 #include <rte_ethdev.h>
 #include <rte_mbuf.h>
 #include <rte_hash.h>
@@ -51,6 +52,19 @@ set_ipsec_conf(struct ipsec_sa *sa, struct rte_security_ipsec_xform *ipsec)
 	ipsec->esn_soft_limit = IPSEC_OFFLOAD_ESN_SOFTLIMIT;
 }
 
+static int32_t
+compute_cipher_offset(struct ipsec_sa *sa)
+{
+	int32_t offset;
+
+	if (sa->aead_algo == RTE_CRYPTO_AEAD_AES_GCM)
+		return 0;
+
+	offset = (sa->iv_len + sizeof(struct rte_esp_hdr));
+
+	return offset;
+}
+
 int
 create_lookaside_session(struct ipsec_ctx *ipsec_ctx, struct ipsec_sa *sa)
 {
@@ -117,6 +131,25 @@ create_lookaside_session(struct ipsec_ctx *ipsec_ctx, struct ipsec_sa *sa)
 				"SEC Session init failed: err: %d\n", ret);
 				return -1;
 			}
+		} else if (sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
+			struct rte_security_ctx *ctx =
+				(struct rte_security_ctx *)
+				rte_cryptodev_get_sec_ctx(
+					ipsec_ctx->tbl[cdev_id_qp].id);
+
+			/* Set IPsec parameters in conf */
+			sess_conf.cpucrypto.cipher_offset =
+					compute_cipher_offset(sa);
+
+			set_ipsec_conf(sa, &(sess_conf.ipsec));
+			sa->security_ctx = ctx;
+			sa->sec_session = rte_security_session_create(ctx,
+				&sess_conf, ipsec_ctx->session_priv_pool);
+			if (sa->sec_session == NULL) {
+				RTE_LOG(ERR, IPSEC,
+				"SEC Session init failed: err: %d\n", ret);
+				return -1;
+			}
 		} else {
 			RTE_LOG(ERR, IPSEC, "Inline not supported\n");
 			return -1;
@@ -512,6 +545,8 @@ ipsec_enqueue(ipsec_xform_fn xform_func, struct ipsec_ctx *ipsec_ctx,
 						sa->security_ctx,
 						sa->sec_session, pkts[i], NULL);
 			continue;
+		default:
+			continue;
 		}
 
 		RTE_ASSERT(sa->cdev_id_qp < ipsec_ctx->nb_qps);
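compute_cipher_offset() above returns 0 for AES-GCM and otherwise the number of bytes between the start of the ESP header and the start of the ciphertext: the 8-byte ESP header (SPI plus sequence number) followed by the IV. Assuming the ipsec-secgw IV lengths of 16 bytes for AES-128-CBC and 8 bytes for 3DES-CBC, that gives offsets of 24 and 16. A minimal sketch with a hypothetical mirror of struct rte_esp_hdr and a boolean in place of the sa->aead_algo check:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of struct rte_esp_hdr: SPI + sequence number (8 bytes). */
struct esp_hdr {
	uint32_t spi;
	uint32_t seq;
};

/* is_aes_gcm stands in for sa->aead_algo == RTE_CRYPTO_AEAD_AES_GCM. */
static int32_t
cipher_offset(int is_aes_gcm, uint32_t iv_len)
{
	if (is_aes_gcm)
		return 0;
	return (int32_t)(iv_len + sizeof(struct esp_hdr));
}
```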
diff --git a/examples/ipsec-secgw/ipsec_process.c b/examples/ipsec-secgw/ipsec_process.c
index 868f1a28d..1932b631f 100644
--- a/examples/ipsec-secgw/ipsec_process.c
+++ b/examples/ipsec-secgw/ipsec_process.c
@@ -101,7 +101,8 @@ fill_ipsec_session(struct rte_ipsec_session *ss, struct ipsec_ctx *ctx,
 		}
 		ss->crypto.ses = sa->crypto_session;
 	/* setup session action type */
-	} else if (sa->type == RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL) {
+	} else if (sa->type == RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 		if (sa->sec_session == NULL) {
 			rc = create_lookaside_session(ctx, sa);
 			if (rc != 0)
@@ -227,8 +228,8 @@ ipsec_process(struct ipsec_ctx *ctx, struct ipsec_traffic *trf)
 
 		/* process packets inline */
 		else if (sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO ||
-				sa->type ==
-				RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL) {
+			sa->type == RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL ||
+			sa->type == RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) {
 
 			satp = rte_ipsec_sa_type(ips->sa);
 
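The ipsec_process() hunk above adds RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO to the branch commented "process packets inline", so CPU-crypto SAs are handled by one synchronous process call, like the two inline action types, rather than by preparing crypto ops and enqueuing them to a cryptodev queue pair. A minimal sketch of that classification, using a hypothetical mirror of the action-type enum:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of enum rte_security_session_action_type. */
enum action_type {
	ACT_NONE,
	ACT_INLINE_CRYPTO,
	ACT_INLINE_PROTOCOL,
	ACT_LOOKASIDE_PROTOCOL,
	ACT_CPU_CRYPTO,
};

/*
 * True when the SA is handled by a single synchronous process call,
 * with no crypto op enqueued to a cryptodev queue pair.
 */
static bool
processed_synchronously(enum action_type t)
{
	switch (t) {
	case ACT_INLINE_CRYPTO:
	case ACT_INLINE_PROTOCOL:
	case ACT_CPU_CRYPTO:
		return true;
	default:
		return false;
	}
}
```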
diff --git a/examples/ipsec-secgw/sa.c b/examples/ipsec-secgw/sa.c
index c3cf3bd1f..ba773346f 100644
--- a/examples/ipsec-secgw/sa.c
+++ b/examples/ipsec-secgw/sa.c
@@ -570,6 +570,9 @@ parse_sa_tokens(char **tokens, uint32_t n_tokens,
 				RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL;
 			else if (strcmp(tokens[ti], "no-offload") == 0)
 				rule->type = RTE_SECURITY_ACTION_TYPE_NONE;
+			else if (strcmp(tokens[ti], "cpu-crypto") == 0)
+				rule->type =
+					RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO;
 			else {
 				APP_CHECK(0, status, "Invalid input \"%s\"",
 						tokens[ti]);
@@ -624,10 +627,13 @@ parse_sa_tokens(char **tokens, uint32_t n_tokens,
 	if (status->status < 0)
 		return;
 
-	if ((rule->type != RTE_SECURITY_ACTION_TYPE_NONE) && (portid_p == 0))
+	if ((rule->type != RTE_SECURITY_ACTION_TYPE_NONE && rule->type !=
+			RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO) &&
+			(portid_p == 0))
 		printf("Missing portid option, falling back to non-offload\n");
 
-	if (!type_p || !portid_p) {
+	if (!type_p || (!portid_p && rule->type !=
+			RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO)) {
 		rule->type = RTE_SECURITY_ACTION_TYPE_NONE;
 		rule->portid = -1;
 	}
@@ -709,6 +715,9 @@ print_one_sa_rule(const struct ipsec_sa *sa, int inbound)
 	case RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL:
 		printf("lookaside-protocol-offload ");
 		break;
+	case RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
+		printf("cpu-crypto-accelerated");
+		break;
 	}
 	printf("\n");
 }
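The parse_sa_tokens() changes above accept "cpu-crypto" as an SA type and exempt it from the port-id requirement: a rule with no type, or with an offload type other than cpu-crypto but no port id, falls back to no-offload. The sketch below restates that logic; parse_type_token() and resolve_type() are hypothetical helpers, and only the two type tokens visible in this hunk are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of the action types referenced in this hunk. */
enum action_type {
	ACT_NONE,
	ACT_LOOKASIDE_PROTOCOL,
	ACT_CPU_CRYPTO,
};

/* Map a "type" token to an action type; returns -1 for unknown tokens. */
static int
parse_type_token(const char *tok, enum action_type *t)
{
	if (strcmp(tok, "no-offload") == 0)
		*t = ACT_NONE;
	else if (strcmp(tok, "cpu-crypto") == 0)
		*t = ACT_CPU_CRYPTO;
	else
		return -1;
	return 0;
}

/*
 * Final type for a rule: without an explicit type, or without a port id
 * for any type other than cpu-crypto, fall back to no-offload.
 */
static enum action_type
resolve_type(bool type_given, bool portid_given, enum action_type t)
{
	if (!type_given || (!portid_given && t != ACT_CPU_CRYPTO))
		return ACT_NONE;
	return t;
}
```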
diff --git a/examples/ipsec-secgw/test/run_test.sh b/examples/ipsec-secgw/test/run_test.sh
index 8055a4c04..bcaf91715 100755
--- a/examples/ipsec-secgw/test/run_test.sh
+++ b/examples/ipsec-secgw/test/run_test.sh
@@ -32,15 +32,21 @@ usage()
 }
 
 LINUX_TEST="tun_aescbc_sha1 \
+tun_aescbc_sha1_cpu_crypto \
 tun_aescbc_sha1_esn \
 tun_aescbc_sha1_esn_atom \
 tun_aesgcm \
+tun_aesgcm_cpu_crypto \
+tun_aesgcm_mb_cpu_crypto \
 tun_aesgcm_esn \
 tun_aesgcm_esn_atom \
 trs_aescbc_sha1 \
+trs_aescbc_sha1_cpu_crypto \
 trs_aescbc_sha1_esn \
 trs_aescbc_sha1_esn_atom \
 trs_aesgcm \
+trs_aesgcm_cpu_crypto \
+trs_aesgcm_mb_cpu_crypto \
 trs_aesgcm_esn \
 trs_aesgcm_esn_atom \
 tun_aescbc_sha1_old \
@@ -49,17 +55,21 @@ trs_aescbc_sha1_old \
 trs_aesgcm_old \
 tun_aesctr_sha1 \
 tun_aesctr_sha1_old \
+tun_aesctr_sha1_cpu_crypto \
 tun_aesctr_sha1_esn \
 tun_aesctr_sha1_esn_atom \
 trs_aesctr_sha1 \
+trs_aesctr_sha1_cpu_crypto \
 trs_aesctr_sha1_old \
 trs_aesctr_sha1_esn \
 trs_aesctr_sha1_esn_atom \
 tun_3descbc_sha1 \
+tun_3descbc_sha1_cpu_crypto \
 tun_3descbc_sha1_old \
 tun_3descbc_sha1_esn \
 tun_3descbc_sha1_esn_atom \
 trs_3descbc_sha1 \
+trs_3descbc_sha1_cpu_crypto \
 trs_3descbc_sha1_old \
 trs_3descbc_sha1_esn \
 trs_3descbc_sha1_esn_atom"
diff --git a/examples/ipsec-secgw/test/trs_3descbc_sha1_common_defs.sh b/examples/ipsec-secgw/test/trs_3descbc_sha1_common_defs.sh
index bb4cef6a9..eda2ddf0c 100644
--- a/examples/ipsec-secgw/test/trs_3descbc_sha1_common_defs.sh
+++ b/examples/ipsec-secgw/test/trs_3descbc_sha1_common_defs.sh
@@ -32,14 +32,14 @@ cipher_key \
 de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 sa in 9 cipher_algo 3des-cbc \
 cipher_key \
 de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 7 cipher_algo 3des-cbc \
@@ -47,7 +47,7 @@ cipher_key \
 de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 9 cipher_algo 3des-cbc \
@@ -55,7 +55,7 @@ cipher_key \
 de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #Routing rules
 rt ipv4 dst ${REMOTE_IPV4}/32 port 0
diff --git a/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..a864a8886
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_3descbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_3descbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aescbc_sha1_common_defs.sh b/examples/ipsec-secgw/test/trs_aescbc_sha1_common_defs.sh
index e2621e0df..49b7b0713 100644
--- a/examples/ipsec-secgw/test/trs_aescbc_sha1_common_defs.sh
+++ b/examples/ipsec-secgw/test/trs_aescbc_sha1_common_defs.sh
@@ -31,27 +31,27 @@ sa in 7 cipher_algo aes-128-cbc \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 sa in 9 cipher_algo aes-128-cbc \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 7 cipher_algo aes-128-cbc \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 9 cipher_algo aes-128-cbc \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #Routing rules
 rt ipv4 dst ${REMOTE_IPV4}/32 port 0
diff --git a/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..b515cd9f8
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aescbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aescbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesctr_sha1_common_defs.sh b/examples/ipsec-secgw/test/trs_aesctr_sha1_common_defs.sh
index 9c213e3cc..428322307 100644
--- a/examples/ipsec-secgw/test/trs_aesctr_sha1_common_defs.sh
+++ b/examples/ipsec-secgw/test/trs_aesctr_sha1_common_defs.sh
@@ -31,27 +31,27 @@ sa in 7 cipher_algo aes-128-ctr \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 sa in 9 cipher_algo aes-128-ctr \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 7 cipher_algo aes-128-ctr \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 9 cipher_algo aes-128-ctr \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode transport
+mode transport ${SGW_CFG_XPRM}
 
 #Routing rules
 rt ipv4 dst ${REMOTE_IPV4}/32 port 0
diff --git a/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..745a2a02b
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesctr_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesctr_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
new file mode 100644
index 000000000..8917122da
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesgcm_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesgcm_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
new file mode 100644
index 000000000..26943321f
--- /dev/null
+++ b/examples/ipsec-secgw/test/trs_aesgcm_mb_cpu_crypto_defs.sh
@@ -0,0 +1,7 @@
+#! /bin/bash
+
+. ${DIR}/trs_aesgcm_defs.sh
+
+CRYPTO_DEV=${CRYPTO_DEV:-'--vdev="crypto_aesni_mb0"'}
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_3descbc_sha1_common_defs.sh b/examples/ipsec-secgw/test/tun_3descbc_sha1_common_defs.sh
index dd802d6be..a583ef605 100644
--- a/examples/ipsec-secgw/test/tun_3descbc_sha1_common_defs.sh
+++ b/examples/ipsec-secgw/test/tun_3descbc_sha1_common_defs.sh
@@ -32,14 +32,14 @@ cipher_key \
 de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv4-tunnel src ${REMOTE_IPV4} dst ${LOCAL_IPV4}
+mode ipv4-tunnel src ${REMOTE_IPV4} dst ${LOCAL_IPV4} ${SGW_CFG_XPRM}
 
 sa in 9 cipher_algo 3des-cbc \
 cipher_key \
 de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv6-tunnel src ${REMOTE_IPV6} dst ${LOCAL_IPV6}
+mode ipv6-tunnel src ${REMOTE_IPV6} dst ${LOCAL_IPV6} ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 7 cipher_algo 3des-cbc \
@@ -47,14 +47,14 @@ cipher_key \
 de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv4-tunnel src ${LOCAL_IPV4} dst ${REMOTE_IPV4}
+mode ipv4-tunnel src ${LOCAL_IPV4} dst ${REMOTE_IPV4} ${SGW_CFG_XPRM}
 
 sa out 9 cipher_algo 3des-cbc \
 cipher_key \
 de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv6-tunnel src ${LOCAL_IPV6} dst ${REMOTE_IPV6}
+mode ipv6-tunnel src ${LOCAL_IPV6} dst ${REMOTE_IPV6} ${SGW_CFG_XPRM}
 
 #Routing rules
 rt ipv4 dst ${REMOTE_IPV4}/32 port 0
diff --git a/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..747141f62
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_3descbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_3descbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aescbc_sha1_common_defs.sh b/examples/ipsec-secgw/test/tun_aescbc_sha1_common_defs.sh
index 4025da232..ac0232d2c 100644
--- a/examples/ipsec-secgw/test/tun_aescbc_sha1_common_defs.sh
+++ b/examples/ipsec-secgw/test/tun_aescbc_sha1_common_defs.sh
@@ -31,26 +31,26 @@ sa in 7 cipher_algo aes-128-cbc \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv4-tunnel src ${REMOTE_IPV4} dst ${LOCAL_IPV4}
+mode ipv4-tunnel src ${REMOTE_IPV4} dst ${LOCAL_IPV4} ${SGW_CFG_XPRM}
 
 sa in 9 cipher_algo aes-128-cbc \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv6-tunnel src ${REMOTE_IPV6} dst ${LOCAL_IPV6}
+mode ipv6-tunnel src ${REMOTE_IPV6} dst ${LOCAL_IPV6} ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 7 cipher_algo aes-128-cbc \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv4-tunnel src ${LOCAL_IPV4} dst ${REMOTE_IPV4}
+mode ipv4-tunnel src ${LOCAL_IPV4} dst ${REMOTE_IPV4} ${SGW_CFG_XPRM}
 
 sa out 9 cipher_algo aes-128-cbc \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv6-tunnel src ${LOCAL_IPV6} dst ${REMOTE_IPV6}
+mode ipv6-tunnel src ${LOCAL_IPV6} dst ${REMOTE_IPV6} ${SGW_CFG_XPRM}
 
 #Routing rules
 rt ipv4 dst ${REMOTE_IPV4}/32 port 0
diff --git a/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..56076fa50
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aescbc_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aescbc_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesctr_sha1_common_defs.sh b/examples/ipsec-secgw/test/tun_aesctr_sha1_common_defs.sh
index a3ac3a698..523c396c9 100644
--- a/examples/ipsec-secgw/test/tun_aesctr_sha1_common_defs.sh
+++ b/examples/ipsec-secgw/test/tun_aesctr_sha1_common_defs.sh
@@ -31,26 +31,26 @@ sa in 7 cipher_algo aes-128-ctr \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv4-tunnel src ${REMOTE_IPV4} dst ${LOCAL_IPV4}
+mode ipv4-tunnel src ${REMOTE_IPV4} dst ${LOCAL_IPV4} ${SGW_CFG_XPRM}
 
 sa in 9 cipher_algo aes-128-ctr \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv6-tunnel src ${REMOTE_IPV6} dst ${LOCAL_IPV6}
+mode ipv6-tunnel src ${REMOTE_IPV6} dst ${LOCAL_IPV6} ${SGW_CFG_XPRM}
 
 #SA out rules
 sa out 7 cipher_algo aes-128-ctr \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv4-tunnel src ${LOCAL_IPV4} dst ${REMOTE_IPV4}
+mode ipv4-tunnel src ${LOCAL_IPV4} dst ${REMOTE_IPV4} ${SGW_CFG_XPRM}
 
 sa out 9 cipher_algo aes-128-ctr \
 cipher_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
 auth_algo sha1-hmac \
 auth_key de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef:de:ad:be:ef \
-mode ipv6-tunnel src ${LOCAL_IPV6} dst ${REMOTE_IPV6}
+mode ipv6-tunnel src ${LOCAL_IPV6} dst ${REMOTE_IPV6} ${SGW_CFG_XPRM}
 
 #Routing rules
 rt ipv4 dst ${REMOTE_IPV4}/32 port 0
diff --git a/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
new file mode 100644
index 000000000..3af680533
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesctr_sha1_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesctr_sha1_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
new file mode 100644
index 000000000..5bf1c0ae5
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesgcm_cpu_crypto_defs.sh
@@ -0,0 +1,5 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesgcm_defs.sh
+
+SGW_CFG_XPRM='type cpu-crypto'
diff --git a/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh b/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh
new file mode 100644
index 000000000..039b8095e
--- /dev/null
+++ b/examples/ipsec-secgw/test/tun_aesgcm_mb_cpu_crypto_defs.sh
@@ -0,0 +1,7 @@
+#! /bin/bash
+
+. ${DIR}/tun_aesgcm_defs.sh
+
+CRYPTO_DEV=${CRYPTO_DEV:-'--vdev="crypto_aesni_mb0"'}
+
+SGW_CFG_XPRM='type cpu-crypto'
-- 
2.14.5


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [dpdk-dev] [PATCH v2 10/10] doc: update security cpu process description
  2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
                       ` (8 preceding siblings ...)
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 09/10] examples/ipsec-secgw: add security " Fan Zhang
@ 2019-10-07 16:28     ` Fan Zhang
  9 siblings, 0 replies; 87+ messages in thread
From: Fan Zhang @ 2019-10-07 16:28 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, declan.doherty, akhil.goyal, Fan Zhang

This patch updates the programmer's guide and release notes with a
description of the newly added security CPU crypto process.

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 doc/guides/cryptodevs/aesni_gcm.rst    |   6 ++
 doc/guides/cryptodevs/aesni_mb.rst     |   7 +++
 doc/guides/prog_guide/rte_security.rst | 112 ++++++++++++++++++++++++++++++++-
 doc/guides/rel_notes/release_19_11.rst |   7 +++
 4 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/doc/guides/cryptodevs/aesni_gcm.rst b/doc/guides/cryptodevs/aesni_gcm.rst
index 15002aba7..e1c4f9d24 100644
--- a/doc/guides/cryptodevs/aesni_gcm.rst
+++ b/doc/guides/cryptodevs/aesni_gcm.rst
@@ -9,6 +9,12 @@ The AES-NI GCM PMD (**librte_pmd_aesni_gcm**) provides poll mode crypto driver
 support for utilizing Intel multi buffer library (see AES-NI Multi-buffer PMD documentation
 to learn more about it, including installation).
 
+The AES-NI GCM PMD also supports rte_security: a security session can be
+created and ``rte_security_process_cpu_crypto_bulk`` called to process
+symmetric crypto synchronously with all algorithms specified below. In this
+mode it supports scatter-gather buffers (``num`` in ``rte_security_vec`` can
+be greater than ``1``). See the ``rte_security`` programmer's guide for details.
+
 Features
 --------
 
diff --git a/doc/guides/cryptodevs/aesni_mb.rst b/doc/guides/cryptodevs/aesni_mb.rst
index 1eff2b073..1a3ddd850 100644
--- a/doc/guides/cryptodevs/aesni_mb.rst
+++ b/doc/guides/cryptodevs/aesni_mb.rst
@@ -12,6 +12,13 @@ support for utilizing Intel multi buffer library, see the white paper
 
 The AES-NI MB PMD has current only been tested on Fedora 21 64-bit with gcc.
 
+The AES-NI MB PMD also supports rte_security: a security session can be
+created and ``rte_security_process_cpu_crypto_bulk`` called to process
+symmetric crypto synchronously with all algorithms specified below. However,
+it does not support scatter-gather buffers, so the ``num`` value in
+``rte_security_vec`` can only be ``1``. Please refer to the ``rte_security``
+programmer's guide for more detail.
+
 Features
 --------
 
diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
index 7d0734a37..39bcc2e69 100644
--- a/doc/guides/prog_guide/rte_security.rst
+++ b/doc/guides/prog_guide/rte_security.rst
@@ -296,6 +296,56 @@ Just like IPsec, in case of PDCP also header addition/deletion, cipher/
 de-cipher, integrity protection/verification is done based on the action
 type chosen.
 
+
+Synchronous CPU Crypto
+~~~~~~~~~~~~~~~~~~~~~~
+
+RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO:
+This action type allows a burst of symmetric crypto workload that uses the same
+algorithm, key, and direction to be processed synchronously using CPU cycles.
+
+The packet is sent to the crypto device for symmetric crypto
+processing. The device encrypts or decrypts the buffer based on the key(s)
+and algorithm(s) specified and preprocessed in the security session. Unlike
+the inline or lookaside modes, when the function returns, each buffer has
+either been processed successfully or has an error number assigned to the
+corresponding index of the status array.
+
+E.g. in the case of IPsec, the application uses CPU cycles to process both the
+stack and the crypto workload synchronously.
+
+.. code-block:: console
+
+         Egress Data Path
+                 |
+        +--------|--------+
+        |  egress IPsec   |
+        |        |        |
+        | +------V------+ |
+        | | SADB lookup | |
+        | +------|------+ |
+        | +------V------+ |
+        | |   Desc      | |
+        | +------|------+ |
+        +--------V--------+
+                 |
+        +--------V--------+
+        |    L2 Stack     |
+        +-----------------+
+        |                 |
+        |   Synchronous   |   <------ Using CPU instructions
+        |  Crypto Process |
+        |                 |
+        +--------V--------+
+        |  L2 Stack Post  |   <------ Add tunnel, ESP header, etc.
+        +--------|--------+
+                 |
+        +--------|--------+
+        |       NIC       |
+        +--------|--------+
+                 V
+
+
 Device Features and Capabilities
 ---------------------------------
 
@@ -491,6 +541,7 @@ Security Session configuration structure is defined as ``rte_security_session_co
                 struct rte_security_ipsec_xform ipsec;
                 struct rte_security_macsec_xform macsec;
                 struct rte_security_pdcp_xform pdcp;
+                struct rte_security_cpu_crypto_xform cpucrypto;
         };
         /**< Configuration parameters for security session */
         struct rte_crypto_sym_xform *crypto_xform;
@@ -515,9 +566,12 @@ Offload.
         RTE_SECURITY_ACTION_TYPE_INLINE_PROTOCOL,
         /**< All security protocol processing is performed inline during
          * transmission */
-        RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
+        RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
         /**< All security protocol processing including crypto is performed
          * on a lookaside accelerator */
+        RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
+        /**< Crypto processing for security protocol is processed by CPU
+         * synchronously */
     };
 
 The ``rte_security_session_protocol`` is defined as
@@ -587,6 +641,10 @@ PDCP related configuration parameters are defined in ``rte_security_pdcp_xform``
         uint32_t hfn_threshold;
     };
 
+For the CPU crypto action type, the application should attach an initialized
+``crypto_xform`` to the security session configuration to specify the algorithm,
+key, direction, and other fields required to perform the crypto operation.
+
 
 Security API
 ~~~~~~~~~~~~
@@ -650,3 +708,55 @@ it is only valid to have a single flow to map to that security session.
         +-------+            +--------+    +-----+
         |  Eth  | ->  ... -> |   ESP  | -> | END |
         +-------+            +--------+    +-----+
+
+
+Process bulk crypto workload using CPU instructions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The inline and lookaside modes depend on external HW to complete the
+workload. As an alternative, the user can use rte_security to process
+symmetric crypto synchronously with CPU instructions.
+
+When creating the security session, the user needs to fill the
+``rte_security_session_conf`` parameter with the ``action_type`` field set to
+``RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO``, and point ``crypto_xform`` to a
+properly initialized cryptodev xform. The user then passes the
+``rte_security_session_conf`` instance to ``rte_security_session_create()``
+along with the security context pointer belonging to a SW crypto device.
+The crypto device may not support this action type, or the algorithm or
+key sizes specified in ``crypto_xform``; if everything is supported,
+the function returns the created security session.
+
+The user can then use this session to process the crypto workload synchronously.
+Instead of using mbuf ``next`` pointers, synchronous CPU crypto processing uses
+a special structure ``rte_security_vec`` to describe scatter-gather buffers.
+
+.. code-block:: c
+
+    struct rte_security_vec {
+        struct iovec *vec;
+        uint32_t num;
+    };
+
+The structure ``rte_security_vec`` stores scatter-gather buffer pointers:
+``vec`` points to an array of ``struct iovec`` segments, and ``num`` indicates
+the number of segments in that array.
+
+Please note that not all crypto devices support scatter-gather buffer
+processing; please check the ``cryptodev`` guide for more details.
+
+The API of the synchronous CPU crypto process is:
+
+.. code-block:: c
+
+    int
+    rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
+            struct rte_security_session *sess,
+            struct rte_security_vec buf[], void *iv[], void *aad[],
+            void *digest[], int status[], uint32_t num);
+
+This function processes ``num`` ``rte_security_vec`` buffers using the
+content stored in the ``iv`` and ``aad`` arrays. The API only supports
+in-place operation, so ``buf`` is overwritten with the encrypted or
+decrypted values when successfully processed. Otherwise a negative value is
+returned and the error number is set at the corresponding index of ``status``.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index f971b3f77..3d89ab643 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -72,6 +72,13 @@ New Features
   Added a symmetric crypto PMD for Marvell NITROX V security processor.
   See the :doc:`../cryptodevs/nitrox` guide for more details on this new
 
+* **Added synchronous Crypto burst API with CPU for RTE_SECURITY.**
+
+  A new API ``rte_security_process_cpu_crypto_bulk`` is introduced in the security
+  library to process crypto workloads in bulk using CPU instructions. The AESNI_MB
+  and AESNI_GCM PMDs, as well as the unit tests and the ipsec-secgw sample
+  application, are updated to support this feature.
+
 
 Removed Items
 -------------
-- 
2.14.5



* Re: [dpdk-dev] [PATCH v2 01/10] security: introduce CPU Crypto action type and API
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 01/10] security: introduce CPU Crypto action type and API Fan Zhang
@ 2019-10-08 13:42       ` Ananyev, Konstantin
  0 siblings, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-08 13:42 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal

Hi Fan,

> 
> This patch introduces the new RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action type
> to the security library. The type represents performing crypto operations with
> CPU cycles. The patch also includes a new API to process crypto operations in
> bulk and the function pointers for PMDs.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  lib/librte_security/rte_security.c           | 11 ++++++
>  lib/librte_security/rte_security.h           | 53 +++++++++++++++++++++++++++-
>  lib/librte_security/rte_security_driver.h    | 22 ++++++++++++
>  lib/librte_security/rte_security_version.map |  1 +
>  4 files changed, 86 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
> index bc81ce15d..cdd1ee6af 100644
> --- a/lib/librte_security/rte_security.c
> +++ b/lib/librte_security/rte_security.c
> @@ -141,3 +141,14 @@ rte_security_capability_get(struct rte_security_ctx *instance,
> 
>  	return NULL;
>  }
> +
> +int
> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> +		struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num)
> +{
> +	RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->process_cpu_crypto_bulk, -1);
> +	return instance->ops->process_cpu_crypto_bulk(sess, buf, iv,
> +			aad, digest, status, num);
> +}
> diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
> index aaafdfcd7..0caf5d697 100644
> --- a/lib/librte_security/rte_security.h
> +++ b/lib/librte_security/rte_security.h
> @@ -18,6 +18,7 @@ extern "C" {
>  #endif
> 
>  #include <sys/types.h>
> +#include <sys/uio.h>
> 
>  #include <netinet/in.h>
>  #include <netinet/ip.h>
> @@ -289,6 +290,20 @@ struct rte_security_pdcp_xform {
>  	uint32_t hfn_ovrd;
>  };
> 
> +struct rte_security_cpu_crypto_xform {
> +	/** For cipher/authentication crypto operation the authentication may
> +	 * cover more content then the cipher. E.g., for IPSec ESP encryption
> +	 * with AES-CBC and SHA1-HMAC, the encryption happens after the ESP
> +	 * header but whole packet (apart from MAC header) is authenticated.
> +	 * The cipher_offset field is used to deduct the cipher data pointer
> +	 * from the buffer to be processed.
> +	 *
> +	 * NOTE this parameter shall be ignored by AEAD algorithms, since it
> +	 * uses the same offset for cipher and authentication.
> +	 */
> +	int32_t cipher_offset;
> +};
> +
>  /**
>   * Security session action type.
>   */
> @@ -303,10 +318,14 @@ enum rte_security_session_action_type {
>  	/**< All security protocol processing is performed inline during
>  	 * transmission
>  	 */
> -	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
> +	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
>  	/**< All security protocol processing including crypto is performed
>  	 * on a lookaside accelerator
>  	 */
> +	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO
> +	/**< Crypto processing for security protocol is processed by CPU
> +	 * synchronously
> +	 */
>  };
> 
>  /** Security session protocol definition */
> @@ -332,6 +351,7 @@ struct rte_security_session_conf {
>  		struct rte_security_ipsec_xform ipsec;
>  		struct rte_security_macsec_xform macsec;
>  		struct rte_security_pdcp_xform pdcp;
> +		struct rte_security_cpu_crypto_xform cpucrypto;
>  	};
>  	/**< Configuration parameters for security session */
>  	struct rte_crypto_sym_xform *crypto_xform;
> @@ -665,6 +685,37 @@ const struct rte_security_capability *
>  rte_security_capability_get(struct rte_security_ctx *instance,
>  			    struct rte_security_capability_idx *idx);
> 
> +/**
> + * Security vector structure, contains pointer to vector array and the length
> + * of the array
> + */
> +struct rte_security_vec {
> +	struct iovec *vec;
> +	uint32_t num;
> +};
> +
> +/**
> + * Processing bulk crypto workload with CPU
> + *
> + * @param	instance	security instance.
> + * @param	sess		security session
> + * @param	buf		array of buffer SGL vectors
> + * @param	iv		array of IV pointers
> + * @param	aad		array of AAD pointers
> + * @param	digest		array of digest pointers
> + * @param	status		array of status for the function to return
> + * @param	num		number of elements in each array
> + * @return
> + *  - On success, 0
> + *  - On any failure, -1

I think it is much better to return the number of successfully processed entries
(or the number of failed entries - whatever is your preference).
Then the user can easily determine whether he needs to walk through status
(and if yes, up to what point) or not at all.
Sorry if I wasn't clear in my previous comment.
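A minimal sketch of the return convention suggested above, in plain C and independent of DPDK. `process_bulk()` and `count_failures()` are hypothetical names, not part of the patch or of librte_security; the sketch assumes the bulk call writes 0 or a negative errno into every `status[]` slot and returns the number of successful entries.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a bulk crypto call (not a DPDK API):
 * entry i fails when fail[i] is non-zero. Every status[i] is written
 * and the number of successfully processed entries is returned.
 */
static uint32_t
process_bulk(const int fail[], int status[], uint32_t num)
{
	uint32_t i, ok = 0;

	for (i = 0; i < num; i++) {
		status[i] = fail[i] ? -EBADMSG : 0;
		if (status[i] == 0)
			ok++;
	}
	return ok;
}

/*
 * Caller side: status[] is only scanned when something failed, and the
 * scan stops as soon as all (num - ok) failures have been found.
 */
static uint32_t
count_failures(const int fail[], int status[], uint32_t num)
{
	uint32_t i, nfail = 0;
	uint32_t ok = process_bulk(fail, status, num);

	assert(ok <= num);
	if (ok == num)
		return 0; /* fast path: status[] is never read */

	for (i = 0; i < num && nfail < num - ok; i++) {
		if (status[i] != 0)
			nfail++; /* a real caller would drop or log entry i here */
	}
	return nfail;
}
```

With `fail = {0, 1, 0, 1}`, `count_failures()` returns 2; with an all-zero `fail` array it returns 0 without reading `status[]` at all.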

> + */
> +__rte_experimental
> +int
> +rte_security_process_cpu_crypto_bulk(struct rte_security_ctx *instance,
> +		struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
> index 1b561f852..fe940fffa 100644
> --- a/lib/librte_security/rte_security_driver.h
> +++ b/lib/librte_security/rte_security_driver.h
> @@ -132,6 +132,26 @@ typedef int (*security_get_userdata_t)(void *device,
>  typedef const struct rte_security_capability *(*security_capabilities_get_t)(
>  		void *device);
> 
> +/**
> + * Process security operations in bulk using CPU accelerated method.
> + *
> + * @param	sess		Security session structure.
> + * @param	buf		Buffer to the vectors to be processed.
> + * @param	iv		IV pointers.
> + * @param	aad		AAD pointers.
> + * @param	digest		Digest pointers.
> + * @param	status		Array of status value.
> + * @param	num		Number of elements in each array.
> + * @return
> + *  - On success, 0
> + *  - On any failure, -1
> + */
> +
> +typedef int (*security_process_cpu_crypto_bulk_t)(
> +		struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +
>  /** Security operations function pointer table */
>  struct rte_security_ops {
>  	security_session_create_t session_create;
> @@ -150,6 +170,8 @@ struct rte_security_ops {
>  	/**< Get userdata associated with session which processed the packet. */
>  	security_capabilities_get_t capabilities_get;
>  	/**< Get security capabilities. */
> +	security_process_cpu_crypto_bulk_t process_cpu_crypto_bulk;
> +	/**< Process data in bulk. */
>  };
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_security/rte_security_version.map b/lib/librte_security/rte_security_version.map
> index 53267bf3c..2132e7a00 100644
> --- a/lib/librte_security/rte_security_version.map
> +++ b/lib/librte_security/rte_security_version.map
> @@ -18,4 +18,5 @@ EXPERIMENTAL {
>  	rte_security_get_userdata;
>  	rte_security_session_stats_get;
>  	rte_security_session_update;
> +	rte_security_process_cpu_crypto_bulk;
>  };
> --
> 2.14.5



* Re: [dpdk-dev] [PATCH v2 02/10] crypto/aesni_gcm: add rte_security handler
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
@ 2019-10-08 13:44       ` Ananyev, Konstantin
  0 siblings, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-08 13:44 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal



> 
> This patch adds rte_security support to the AESNI-GCM PMD. The PMD now
> initializes a security context instance, creates/deletes PMD specific security
> sessions, and processes crypto workloads in synchronous mode with
> scatter-gather list buffers supported.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  drivers/crypto/aesni_gcm/aesni_gcm_pmd.c         | 97 +++++++++++++++++++++++-
>  drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c     | 95 +++++++++++++++++++++++
>  drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h | 23 ++++++
>  drivers/crypto/aesni_gcm/meson.build             |  2 +-
>  4 files changed, 215 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> index 1006a5c4d..2e91bf149 100644
> --- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> +++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
> @@ -6,6 +6,7 @@
>  #include <rte_hexdump.h>
>  #include <rte_cryptodev.h>
>  #include <rte_cryptodev_pmd.h>
> +#include <rte_security_driver.h>
>  #include <rte_bus_vdev.h>
>  #include <rte_malloc.h>
>  #include <rte_cpuflags.h>
> @@ -174,6 +175,56 @@ aesni_gcm_get_session(struct aesni_gcm_qp *qp, struct rte_crypto_op *op)
>  	return sess;
>  }
> 
> +static __rte_always_inline int
> +process_gcm_security_sgl_buf(struct aesni_gcm_security_session *sess,
> +		struct rte_security_vec *buf, uint8_t *iv,
> +		uint8_t *aad, uint8_t *digest)
> +{
> +	struct aesni_gcm_session *session = &sess->sess;
> +	uint8_t *tag;
> +	uint32_t i;
> +
> +	sess->init(&session->gdata_key, &sess->gdata_ctx, iv, aad,
> +			(uint64_t)session->aad_length);
> +
> +	for (i = 0; i < buf->num; i++) {
> +		struct iovec *vec = &buf->vec[i];
> +
> +		sess->update(&session->gdata_key, &sess->gdata_ctx,
> +				vec->iov_base, vec->iov_base, vec->iov_len);
> +	}
> +
> +	switch (session->op) {
> +	case AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION:
> +		if (session->req_digest_length != session->gen_digest_length)
> +			tag = sess->temp_digest;
> +		else
> +			tag = digest;
> +
> +		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
> +				session->gen_digest_length);
> +
> +		if (session->req_digest_length != session->gen_digest_length)
> +			memcpy(digest, sess->temp_digest,
> +					session->req_digest_length);
> +		break;
> +
> +	case AESNI_GCM_OP_AUTHENTICATED_DECRYPTION:
> +		tag = sess->temp_digest;
> +
> +		sess->finalize(&session->gdata_key, &sess->gdata_ctx, tag,
> +				session->gen_digest_length);
> +
> +		if (memcmp(tag, digest,	session->req_digest_length) != 0)
> +			return -1;
> +		break;
> +	default:
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
>  /**
>   * Process a crypto operation, calling
>   * the GCM API from the multi buffer library.
> @@ -488,8 +539,10 @@ aesni_gcm_create(const char *name,
>  {
>  	struct rte_cryptodev *dev;
>  	struct aesni_gcm_private *internals;
> +	struct rte_security_ctx *sec_ctx;
>  	enum aesni_gcm_vector_mode vector_mode;
>  	MB_MGR *mb_mgr;
> +	char sec_name[RTE_DEV_NAME_MAX_LEN];
> 
>  	/* Check CPU for support for AES instruction set */
>  	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
> @@ -524,7 +577,8 @@ aesni_gcm_create(const char *name,
>  			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
>  			RTE_CRYPTODEV_FF_CPU_AESNI |
>  			RTE_CRYPTODEV_FF_OOP_SGL_IN_LB_OUT |
> -			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
> +			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
> +			RTE_CRYPTODEV_FF_SECURITY;
> 
>  	mb_mgr = alloc_mb_mgr(0);
>  	if (mb_mgr == NULL)
> @@ -587,6 +641,21 @@ aesni_gcm_create(const char *name,
> 
>  	internals->max_nb_queue_pairs = init_params->max_nb_queue_pairs;
> 
> +	/* setup security operations */
> +	snprintf(sec_name, sizeof(sec_name) - 1, "aes_gcm_sec_%u",
> +			dev->driver_id);
> +	sec_ctx = rte_zmalloc_socket(sec_name,
> +			sizeof(struct rte_security_ctx),
> +			RTE_CACHE_LINE_SIZE, init_params->socket_id);
> +	if (sec_ctx == NULL) {
> +		AESNI_GCM_LOG(ERR, "memory allocation failed\n");
> +		goto error_exit;
> +	}
> +
> +	sec_ctx->device = (void *)dev;
> +	sec_ctx->ops = rte_aesni_gcm_pmd_security_ops;
> +	dev->security_ctx = sec_ctx;
> +
>  #if IMB_VERSION_NUM >= IMB_VERSION(0, 50, 0)
>  	AESNI_GCM_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
>  			imb_get_version_str());
> @@ -641,6 +710,8 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
>  	if (cryptodev == NULL)
>  		return -ENODEV;
> 
> +	rte_free(cryptodev->security_ctx);
> +
>  	internals = cryptodev->data->dev_private;
> 
>  	free_mb_mgr(internals->mb_mgr);
> @@ -648,6 +719,30 @@ aesni_gcm_remove(struct rte_vdev_device *vdev)
>  	return rte_cryptodev_pmd_destroy(cryptodev);
>  }
> 
> +int
> +aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num)
> +{
> +	struct aesni_gcm_security_session *session =
> +			get_sec_session_private_data(sess);
> +	uint32_t i;
> +	int errcnt = 0;
> +
> +	if (unlikely(!session))
> +		return -num;

You return a negative status (error), but don't set each status[] value.
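A minimal sketch of an early-exit path that still fills in `status[]`, assuming hypothetical names (`entry_fn_t`, `fail_odd_entries()`, `process_bulk_all_status()`) that are not part of DPDK or this patch. It also returns the number of successful entries, following the suggestion made on patch 01/10.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-entry worker type, standing in for
 * process_gcm_security_sgl_buf(): returns 0 or a negative errno. */
typedef int (*entry_fn_t)(void *session, uint32_t idx);

/* Example worker for illustration only: odd entries fail. */
static int
fail_odd_entries(void *session, uint32_t idx)
{
	(void)session;
	return (idx & 1) ? -EBADMSG : 0;
}

/*
 * Every status[] slot is written on every path, including the early
 * exit for a missing session, and the number of successfully processed
 * entries is returned instead of a negative error count.
 */
static uint32_t
process_bulk_all_status(void *session, entry_fn_t fn, int status[],
		uint32_t num)
{
	uint32_t i, ok = 0;

	assert(num == 0 || status != NULL);

	if (session == NULL) {
		/* no session: mark every entry as failed */
		for (i = 0; i < num; i++)
			status[i] = -EINVAL;
		return 0;
	}

	for (i = 0; i < num; i++) {
		status[i] = fn(session, i);
		if (status[i] == 0)
			ok++;
	}
	return ok;
}
```

The design choice is that a caller never sees stale values in `status[]`, whichever path the function takes.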


> +
> +	for (i = 0; i < num; i++) {
> +		status[i] = process_gcm_security_sgl_buf(session, &buf[i],
> +				(uint8_t *)iv[i], (uint8_t *)aad[i],
> +				(uint8_t *)digest[i]);
> +		if (unlikely(status[i]))
> +			errcnt -= 1;
> +	}
> +
> +	return errcnt;
> +}
> +
>  static struct rte_vdev_driver aesni_gcm_pmd_drv = {
>  	.probe = aesni_gcm_probe,
>  	.remove = aesni_gcm_remove
> diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
> index 2f66c7c58..cc71dbd60 100644
> --- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
> +++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
> @@ -7,6 +7,7 @@
>  #include <rte_common.h>
>  #include <rte_malloc.h>
>  #include <rte_cryptodev_pmd.h>
> +#include <rte_security_driver.h>
> 
>  #include "aesni_gcm_pmd_private.h"
> 
> @@ -316,6 +317,85 @@ aesni_gcm_pmd_sym_session_clear(struct rte_cryptodev *dev,
>  	}
>  }
> 
> +static int
> +aesni_gcm_security_session_create(void *dev,
> +		struct rte_security_session_conf *conf,
> +		struct rte_security_session *sess,
> +		struct rte_mempool *mempool)
> +{
> +	struct rte_cryptodev *cdev = dev;
> +	struct aesni_gcm_private *internals = cdev->data->dev_private;
> +	struct aesni_gcm_security_session *sess_priv;
> +	int ret;
> +
> +	if (!conf->crypto_xform) {
> +		AESNI_GCM_LOG(ERR, "Invalid security session conf");
> +		return -EINVAL;
> +	}
> +
> +	if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
> +		AESNI_GCM_LOG(ERR, "GMAC is not supported in security session");
> +		return -EINVAL;
> +	}
> +
> +
> +	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
> +		AESNI_GCM_LOG(ERR,
> +				"Couldn't get object from session mempool");
> +		return -ENOMEM;
> +	}
> +
> +	ret = aesni_gcm_set_session_parameters(internals->ops,
> +				&sess_priv->sess, conf->crypto_xform);
> +	if (ret != 0) {
> +		AESNI_GCM_LOG(ERR, "Failed configure session parameters");
> +
> +		/* Return session to mempool */
> +		rte_mempool_put(mempool, (void *)sess_priv);
> +		return ret;
> +	}
> +
> +	sess_priv->pre = internals->ops[sess_priv->sess.key].pre;
> +	sess_priv->init = internals->ops[sess_priv->sess.key].init;
> +	if (sess_priv->sess.op == AESNI_GCM_OP_AUTHENTICATED_ENCRYPTION) {
> +		sess_priv->update =
> +			internals->ops[sess_priv->sess.key].update_enc;
> +		sess_priv->finalize =
> +			internals->ops[sess_priv->sess.key].finalize_enc;
> +	} else {
> +		sess_priv->update =
> +			internals->ops[sess_priv->sess.key].update_dec;
> +		sess_priv->finalize =
> +			internals->ops[sess_priv->sess.key].finalize_dec;
> +	}
> +
> +	sess->sess_private_data = sess_priv;
> +
> +	return 0;
> +}
> +
> +static int
> +aesni_gcm_security_session_destroy(void *dev __rte_unused,
> +		struct rte_security_session *sess)
> +{
> +	void *sess_priv = get_sec_session_private_data(sess);
> +
> +	if (sess_priv) {
> +		struct rte_mempool *sess_mp = rte_mempool_from_obj(sess_priv);
> +
> +		memset(sess, 0, sizeof(struct aesni_gcm_security_session));
> +		set_sec_session_private_data(sess, NULL);
> +		rte_mempool_put(sess_mp, sess_priv);
> +	}
> +	return 0;
> +}
> +
> +static unsigned int
> +aesni_gcm_sec_session_get_size(__rte_unused void *device)
> +{
> +	return sizeof(struct aesni_gcm_security_session);
> +}
> +
>  struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
>  		.dev_configure		= aesni_gcm_pmd_config,
>  		.dev_start		= aesni_gcm_pmd_start,
> @@ -336,4 +416,19 @@ struct rte_cryptodev_ops aesni_gcm_pmd_ops = {
>  		.sym_session_clear	= aesni_gcm_pmd_sym_session_clear
>  };
> 
> +static struct rte_security_ops aesni_gcm_security_ops = {
> +		.session_create = aesni_gcm_security_session_create,
> +		.session_get_size = aesni_gcm_sec_session_get_size,
> +		.session_update = NULL,
> +		.session_stats_get = NULL,
> +		.session_destroy = aesni_gcm_security_session_destroy,
> +		.set_pkt_metadata = NULL,
> +		.capabilities_get = NULL,
> +		.process_cpu_crypto_bulk =
> +				aesni_gcm_sec_crypto_process_bulk,
> +};
> +
>  struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops = &aesni_gcm_pmd_ops;
> +
> +struct rte_security_ops *rte_aesni_gcm_pmd_security_ops =
> +		&aesni_gcm_security_ops;
> diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
> index 56b29e013..ed3f6eb2e 100644
> --- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
> +++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
> @@ -114,5 +114,28 @@ aesni_gcm_set_session_parameters(const struct aesni_gcm_ops *ops,
>   * Device specific operations function pointer structure */
>  extern struct rte_cryptodev_ops *rte_aesni_gcm_pmd_ops;
> 
> +/**
> + * Security session structure.
> + */
> +struct aesni_gcm_security_session {
> +	/** Temp digest for decryption */
> +	uint8_t temp_digest[DIGEST_LENGTH_MAX];
> +	/** GCM operations */
> +	aesni_gcm_pre_t pre;
> +	aesni_gcm_init_t init;
> +	aesni_gcm_update_t update;
> +	aesni_gcm_finalize_t finalize;
> +	/** AESNI-GCM session */
> +	struct aesni_gcm_session sess;
> +	/** AESNI-GCM context */
> +	struct gcm_context_data gdata_ctx;
> +};
> +
> +extern int
> +aesni_gcm_sec_crypto_process_bulk(struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +
> +extern struct rte_security_ops *rte_aesni_gcm_pmd_security_ops;
> 
>  #endif /* _RTE_AESNI_GCM_PMD_PRIVATE_H_ */
> diff --git a/drivers/crypto/aesni_gcm/meson.build b/drivers/crypto/aesni_gcm/meson.build
> index 3a6e332dc..f6e160bb3 100644
> --- a/drivers/crypto/aesni_gcm/meson.build
> +++ b/drivers/crypto/aesni_gcm/meson.build
> @@ -22,4 +22,4 @@ endif
> 
>  allow_experimental_apis = true
>  sources = files('aesni_gcm_pmd.c', 'aesni_gcm_pmd_ops.c')
> -deps += ['bus_vdev']
> +deps += ['bus_vdev', 'security']
> --
> 2.14.5



* Re: [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
@ 2019-10-08 16:23       ` Ananyev, Konstantin
  2019-10-09  8:29       ` Ananyev, Konstantin
  1 sibling, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-08 16:23 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal


Hi Fan,
 
> This patch add rte_security support support to AESNI-MB PMD. The PMD now
> initialize security context instance, create/delete PMD specific security
> sessions, and process crypto workloads in synchronous mode.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  drivers/crypto/aesni_mb/meson.build                |   2 +-
>  drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         | 368 +++++++++++++++++++--
>  drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |  92 +++++-
>  drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |  21 +-
>  4 files changed, 453 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/crypto/aesni_mb/meson.build b/drivers/crypto/aesni_mb/meson.build
> index 3e1687416..e7b585168 100644
> --- a/drivers/crypto/aesni_mb/meson.build
> +++ b/drivers/crypto/aesni_mb/meson.build
> @@ -23,4 +23,4 @@ endif
> 
>  sources = files('rte_aesni_mb_pmd.c', 'rte_aesni_mb_pmd_ops.c')
>  allow_experimental_apis = true
> -deps += ['bus_vdev']
> +deps += ['bus_vdev', 'security']
> diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
> index ce1144b95..a4cd518b7 100644
> --- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
> +++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
> @@ -8,6 +8,8 @@
>  #include <rte_hexdump.h>
>  #include <rte_cryptodev.h>
>  #include <rte_cryptodev_pmd.h>
> +#include <rte_security.h>
> +#include <rte_security_driver.h>
>  #include <rte_bus_vdev.h>
>  #include <rte_malloc.h>
>  #include <rte_cpuflags.h>
> @@ -19,6 +21,9 @@
>  #define HMAC_MAX_BLOCK_SIZE 128
>  static uint8_t cryptodev_driver_id;
> 
> +static enum aesni_mb_vector_mode vector_mode;
> +/**< CPU vector instruction set mode */
> +
>  typedef void (*hash_one_block_t)(const void *data, void *digest);
>  typedef void (*aes_keyexp_t)(const void *key, void *enc_exp_keys, void *dec_exp_keys);
> 
> @@ -808,6 +813,164 @@ auth_start_offset(struct rte_crypto_op *op, struct aesni_mb_session *session,
>  			(UINT64_MAX - u_src + u_dst + 1);
>  }
> 
> +union sec_userdata_field {
> +	int status;
> +	struct {
> +		uint16_t is_gen_digest;
> +		uint16_t digest_len;
> +	};
> +};
> +
> +struct sec_udata_digest_field {
> +	uint32_t is_digest_gen;
> +	uint32_t digest_len;
> +};
> +
> +static inline int
> +set_mb_job_params_sec(JOB_AES_HMAC *job, struct aesni_mb_sec_session *sec_sess,
> +		void *buf, uint32_t buf_len, void *iv, void *aad, void *digest,
> +		int *status, uint8_t *digest_idx)
> +{
> +	struct aesni_mb_session *session = &sec_sess->sess;
> +	uint32_t cipher_offset = sec_sess->cipher_offset;
> +	union sec_userdata_field udata;
> +
> +	if (unlikely(cipher_offset > buf_len))
> +		return -EINVAL;
> +
> +	/* Set crypto operation */
> +	job->chain_order = session->chain_order;
> +
> +	/* Set cipher parameters */
> +	job->cipher_direction = session->cipher.direction;
> +	job->cipher_mode = session->cipher.mode;
> +
> +	job->aes_key_len_in_bytes = session->cipher.key_length_in_bytes;
> +
> +	/* Set authentication parameters */
> +	job->hash_alg = session->auth.algo;
> +	job->iv = iv;
> +
> +	switch (job->hash_alg) {
> +	case AES_XCBC:
> +		job->u.XCBC._k1_expanded = session->auth.xcbc.k1_expanded;
> +		job->u.XCBC._k2 = session->auth.xcbc.k2;
> +		job->u.XCBC._k3 = session->auth.xcbc.k3;
> +
> +		job->aes_enc_key_expanded =
> +				session->cipher.expanded_aes_keys.encode;
> +		job->aes_dec_key_expanded =
> +				session->cipher.expanded_aes_keys.decode;
> +		break;
> +
> +	case AES_CCM:
> +		job->u.CCM.aad = (uint8_t *)aad + 18;
> +		job->u.CCM.aad_len_in_bytes = session->aead.aad_len;
> +		job->aes_enc_key_expanded =
> +				session->cipher.expanded_aes_keys.encode;
> +		job->aes_dec_key_expanded =
> +				session->cipher.expanded_aes_keys.decode;
> +		job->iv++;
> +		break;
> +
> +	case AES_CMAC:
> +		job->u.CMAC._key_expanded = session->auth.cmac.expkey;
> +		job->u.CMAC._skey1 = session->auth.cmac.skey1;
> +		job->u.CMAC._skey2 = session->auth.cmac.skey2;
> +		job->aes_enc_key_expanded =
> +				session->cipher.expanded_aes_keys.encode;
> +		job->aes_dec_key_expanded =
> +				session->cipher.expanded_aes_keys.decode;
> +		break;
> +
> +	case AES_GMAC:
> +		if (session->cipher.mode == GCM) {
> +			job->u.GCM.aad = aad;
> +			job->u.GCM.aad_len_in_bytes = session->aead.aad_len;
> +		} else {
> +			/* For GMAC */
> +			job->u.GCM.aad = aad;
> +			job->u.GCM.aad_len_in_bytes = buf_len;
> +			job->cipher_mode = GCM;
> +		}
> +		job->aes_enc_key_expanded = &session->cipher.gcm_key;
> +		job->aes_dec_key_expanded = &session->cipher.gcm_key;
> +		break;
> +
> +	default:
> +		job->u.HMAC._hashed_auth_key_xor_ipad =
> +				session->auth.pads.inner;
> +		job->u.HMAC._hashed_auth_key_xor_opad =
> +				session->auth.pads.outer;

Same question as for v1:
There seem to be too many branches in the data path.
We'll have only one job type (algorithm) per session.
Can we have a prefilled job struct template with all common fields already set up,
and then in process() just copy it over and update the few fields that have to differ
(like msg_len_to_cipher_in_bytes)?
If the whole job struct is too big to copy (184B), we can at least copy the contents of u (24B)
in one go, can't we?
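A minimal sketch of the template idea, assuming a simplified, hypothetical `struct job` that stands in for JOB_AES_HMAC (it is not the real intel-ipsec-mb layout). The session-create step fills `tmpl` once; the data path does one struct copy and then touches only the per-buffer fields, so there is no per-algorithm switch. Whether overwriting a whole job returned by IMB_GET_NEXT_JOB() is safe with the real library is not verified here; copying only the union `u` is the more conservative variant.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for JOB_AES_HMAC (NOT the real layout). */
struct job {
	/* per-session fields: identical for every buffer of a session */
	int cipher_mode;
	int hash_alg;
	const void *enc_keys;
	const void *dec_keys;
	uint64_t iv_len_in_bytes;
	/* per-buffer fields */
	const uint8_t *src;
	uint8_t *dst;
	const void *iv;
	uint64_t cipher_start_src_offset_in_bytes;
	uint64_t msg_len_to_cipher_in_bytes;
	uint64_t msg_len_to_hash_in_bytes;
};

/* Hypothetical security session: template filled once at create time. */
struct sec_session {
	struct job tmpl;
	uint32_t cipher_offset;
};

/* Data path: one struct copy, then only the fields that differ per buffer. */
static inline int
fill_job_from_template(struct job *job, const struct sec_session *s,
		uint8_t *buf, uint32_t buf_len, const void *iv)
{
	if (s->cipher_offset > buf_len)
		return -1;

	*job = s->tmpl;	/* no switch on hash_alg/cipher_mode here */
	job->src = buf;
	job->dst = buf + s->cipher_offset;
	job->iv = iv;
	job->cipher_start_src_offset_in_bytes = s->cipher_offset;
	job->msg_len_to_cipher_in_bytes = buf_len - s->cipher_offset;
	job->msg_len_to_hash_in_bytes = buf_len;
	return 0;
}
```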


> +
> +		if (job->cipher_mode == DES3) {
> +			job->aes_enc_key_expanded =
> +				session->cipher.exp_3des_keys.ks_ptr;
> +			job->aes_dec_key_expanded =
> +				session->cipher.exp_3des_keys.ks_ptr;
> +		} else {
> +			job->aes_enc_key_expanded =
> +				session->cipher.expanded_aes_keys.encode;
> +			job->aes_dec_key_expanded =
> +				session->cipher.expanded_aes_keys.decode;
> +		}
> +	}
> +
> +	/* Set digest output location */
> +	if (job->hash_alg != NULL_HASH &&
> +			session->auth.operation == RTE_CRYPTO_AUTH_OP_VERIFY) {
> +		job->auth_tag_output = sec_sess->temp_digests[*digest_idx];
> +		*digest_idx = (*digest_idx + 1) % MAX_JOBS;
> +
> +		udata.is_gen_digest = 0;
> +		udata.digest_len = session->auth.req_digest_len;
> +	} else {
> +		udata.is_gen_digest = 1;
> +		udata.digest_len = session->auth.req_digest_len;
> +
> +		if (session->auth.req_digest_len !=
> +				session->auth.gen_digest_len) {
> +			job->auth_tag_output =
> +					sec_sess->temp_digests[*digest_idx];
> +			*digest_idx = (*digest_idx + 1) % MAX_JOBS;
> +		} else
> +			job->auth_tag_output = digest;
> +	}
> +
> +	/* A bit of hack here, since job structure only supports
> +	 * 2 user data fields and we need 4 params to be passed
> +	 * (status, direction, digest for verify, and length of
> +	 * digest), we set the status value as digest length +
> +	 * direction here temporarily to avoid creating longer
> +	 * buffer to store all 4 params.
> +	 */
> +	*status = udata.status;
> +
> +	/*
> +	 * Multi-buffer library current only support returning a truncated
> +	 * digest length as specified in the relevant IPsec RFCs
> +	 */
> +
> +	/* Set digest length */
> +	job->auth_tag_output_len_in_bytes = session->auth.gen_digest_len;
> +
> +	/* Set IV parameters */
> +	job->iv_len_in_bytes = session->iv.length;
> +
> +	/* Data Parameters */
> +	job->src = buf;
> +	job->dst = (uint8_t *)buf + cipher_offset;
> +	job->cipher_start_src_offset_in_bytes = cipher_offset;
> +	job->msg_len_to_cipher_in_bytes = buf_len - cipher_offset;
> +	job->hash_start_src_offset_in_bytes = 0;
> +	job->msg_len_to_hash_in_bytes = buf_len;
> +
> +	job->user_data = (void *)status;
> +	job->user_data2 = digest;
> +
> +	return 0;
> +}
> +
>  /**
>   * Process a crypto operation and complete a JOB_AES_HMAC job structure for
>   * submission to the multi buffer library for processing.
> @@ -1100,6 +1263,35 @@ post_process_mb_job(struct aesni_mb_qp *qp, JOB_AES_HMAC *job)
>  	return op;
>  }
> 
> +static inline void
> +post_process_mb_sec_job(JOB_AES_HMAC *job)
> +{
> +	void *user_digest = job->user_data2;
> +	int *status = job->user_data;
> +
> +	switch (job->status) {
> +	case STS_COMPLETED:
> +		if (user_digest) {
> +			union sec_userdata_field udata;
> +
> +			udata.status = *status;
> +			if (udata.is_gen_digest) {
> +				*status = RTE_CRYPTO_OP_STATUS_SUCCESS;
> +				memcpy(user_digest, job->auth_tag_output,
> +						udata.digest_len);
> +			} else {
> +				*status = (memcmp(job->auth_tag_output,
> +					user_digest, udata.digest_len) != 0) ?
> +						-1 : 0;
> +			}
> +		} else
> +			*status = RTE_CRYPTO_OP_STATUS_SUCCESS;

Same question as for v1:
multiple process() functions instead of branches in the data path?
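One way to read that suggestion, sketched with hypothetical names: the generate-vs-verify decision is made once, when the session is created, by storing a function pointer, so the completion path neither branches on it nor needs to smuggle `is_gen_digest`/`digest_len` through the caller's `status` slot (the `sec_userdata_field` union trick in the patch). Status values 0/-1 are placeholders, not the RTE_CRYPTO_OP_STATUS_* constants.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical completion handler type. */
typedef void (*post_proc_t)(const uint8_t *tag, uint8_t *user_digest,
		uint32_t digest_len, int *status);

/* Digest generation: copy the computed tag out to the user. */
static void
post_proc_gen(const uint8_t *tag, uint8_t *user_digest, uint32_t digest_len,
		int *status)
{
	memcpy(user_digest, tag, digest_len);
	*status = 0;
}

/* Digest verification: compare the computed tag with the user's digest. */
static void
post_proc_verify(const uint8_t *tag, uint8_t *user_digest,
		uint32_t digest_len, int *status)
{
	*status = (memcmp(tag, user_digest, digest_len) == 0) ? 0 : -1;
}

/* Hypothetical session: handler and length are fixed at create time. */
struct sec_session {
	post_proc_t post_proc;
	uint32_t digest_len;
};

static void
sec_session_init(struct sec_session *s, int is_verify, uint32_t digest_len)
{
	s->post_proc = is_verify ? post_proc_verify : post_proc_gen;
	s->digest_len = digest_len;
}
```

The same pattern extends to whole process() variants (one per direction or per algorithm family) selected through the session instead of per-job branches.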

> +		break;
> +	default:
> +		*status = RTE_CRYPTO_OP_STATUS_ERROR;
> +	}
> +}
> +
>  /**
>   * Process a completed JOB_AES_HMAC job and keep processing jobs until
>   * get_completed_job return NULL
> @@ -1136,6 +1328,32 @@ handle_completed_jobs(struct aesni_mb_qp *qp, JOB_AES_HMAC *job,
>  	return processed_jobs;
>  }
> 
> +static inline uint32_t
> +handle_completed_sec_jobs(JOB_AES_HMAC *job, MB_MGR *mb_mgr)
> +{
> +	uint32_t processed = 0;
> +
> +	while (job != NULL) {
> +		post_process_mb_sec_job(job);
> +		job = IMB_GET_COMPLETED_JOB(mb_mgr);
> +		processed++;
> +	}
> +
> +	return processed;
> +}
> +
> +static inline uint32_t
> +flush_mb_sec_mgr(MB_MGR *mb_mgr)
> +{
> +	JOB_AES_HMAC *job = IMB_FLUSH_JOB(mb_mgr);
> +	uint32_t processed = 0;
> +
> +	if (job)
> +		processed = handle_completed_sec_jobs(job, mb_mgr);
> +
> +	return processed;
> +}
> +
>  static inline uint16_t
>  flush_mb_mgr(struct aesni_mb_qp *qp, struct rte_crypto_op **ops,
>  		uint16_t nb_ops)
> @@ -1239,6 +1457,105 @@ aesni_mb_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
>  	return processed_jobs;
>  }
> 
> +static MB_MGR *
> +alloc_init_mb_mgr(void)
> +{
> +	MB_MGR *mb_mgr = alloc_mb_mgr(0);
> +	if (mb_mgr == NULL)
> +		return NULL;
> +
> +	switch (vector_mode) {
> +	case RTE_AESNI_MB_SSE:
> +		init_mb_mgr_sse(mb_mgr);
> +		break;
> +	case RTE_AESNI_MB_AVX:
> +		init_mb_mgr_avx(mb_mgr);
> +		break;
> +	case RTE_AESNI_MB_AVX2:
> +		init_mb_mgr_avx2(mb_mgr);
> +		break;
> +	case RTE_AESNI_MB_AVX512:
> +		init_mb_mgr_avx512(mb_mgr);
> +		break;
> +	default:
> +		AESNI_MB_LOG(ERR, "Unsupported vector mode %u\n", vector_mode);
> +		free_mb_mgr(mb_mgr);
> +		return NULL;
> +	}
> +
> +	return mb_mgr;
> +}
> +
> +static MB_MGR *sec_mb_mgrs[RTE_MAX_LCORE];
> +
> +int
> +aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num)
> +{
> +	struct aesni_mb_sec_session *sec_sess = sess->sess_private_data;
> +	JOB_AES_HMAC *job;
> +	static MB_MGR *mb_mgr;
> +	uint32_t lcore_id = rte_lcore_id();
> +	uint8_t digest_idx = sec_sess->digest_idx;
> +	uint32_t i, processed = 0;
> +	int ret = 0, errcnt = 0;
> +
> +	if (unlikely(sec_mb_mgrs[lcore_id] == NULL)) {

I don't think it is completely safe.
For non-EAL threads rte_lcore_id() returns LCORE_ID_ANY (UINT32_MAX, i.e. -1 as unsigned).
So we at least need to check that lcore_id < RTE_MAX_LCORE.
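A minimal sketch of the bounds check, using hypothetical stand-ins: `MAX_LCORE` for RTE_MAX_LCORE, `LCORE_ANY` for the LCORE_ID_ANY value a non-EAL thread gets from rte_lcore_id(), and `struct mgr` for MB_MGR. An out-of-range id yields NULL, which the caller would turn into a failed burst instead of indexing past the array.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_LCORE 128		/* hypothetical stand-in for RTE_MAX_LCORE */
#define LCORE_ANY UINT32_MAX	/* stand-in for LCORE_ID_ANY */

struct mgr { int dummy; };	/* hypothetical stand-in for MB_MGR */

static struct mgr *mgrs[MAX_LCORE];

/* Return (lazily allocating) the per-lcore manager, or NULL if the
 * calling thread has no valid lcore id. */
static struct mgr *
get_lcore_mgr(uint32_t lcore_id)
{
	if (lcore_id >= MAX_LCORE)
		return NULL;

	if (mgrs[lcore_id] == NULL)
		mgrs[lcore_id] = calloc(1, sizeof(*mgrs[lcore_id]));

	return mgrs[lcore_id];
}
```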


> +		sec_mb_mgrs[lcore_id] = alloc_init_mb_mgr();
> +
> +		if (sec_mb_mgrs[lcore_id] == NULL) {
> +			for (i = 0; i < num; i++)
> +				status[i] = -ENOMEM;
> +
> +			return -num;
> +		}
> +	}
> +
> +	mb_mgr = sec_mb_mgrs[lcore_id];
> +
> +	for (i = 0; i < num; i++) {
> +		void *seg_buf = buf[i].vec[0].iov_base;
> +		uint32_t buf_len = buf[i].vec[0].iov_len;
> +
> +		job = IMB_GET_NEXT_JOB(mb_mgr);
> +		if (unlikely(job == NULL)) {
> +			processed += flush_mb_sec_mgr(mb_mgr);
> +
> +			job = IMB_GET_NEXT_JOB(mb_mgr);
> +			if (!job) {
> +				errcnt -= 1;
> +				status[i] = -ENOMEM;
> +			}
> +		}
> +
> +		ret = set_mb_job_params_sec(job, sec_sess, seg_buf, buf_len,
> +				iv[i], aad[i], digest[i], &status[i],
> +				&digest_idx);

I still don't understand the purpose of passing a pointer to digest_idx here...
Why not just:

ret = set_mb_job_params_sec(job, sec_sess, seg_buf, buf_len,
				iv[i], aad[i], digest[i], &status[i],
				digest_idx);
digest_idx = (digest_idx + 1) % MAX_JOBS;
Second, I am not sure what the purpose of storing digest_idx
inside the session (sess->digest_idx) is at all.
As far as I can see, you never update it, and it seems that just
digest_idx = 0
at the start of that function is enough?
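A minimal sketch of that simplification, with hypothetical sizes: the index is a plain local that starts at 0 on every call and is advanced by the caller, so nothing has to be read from or stored in the session. Reusing slots modulo MAX_JOBS assumes that no more than MAX_JOBS jobs can be in flight in the manager at once.

```c
#include <stdint.h>

#define MAX_JOBS 8	/* hypothetical ring size */
#define DIGEST_MAX 64	/* hypothetical max digest length */

/* Give each of num jobs a temporary digest slot; the index is local. */
static void
assign_temp_digests(uint8_t temp[MAX_JOBS][DIGEST_MAX], uint8_t *out[],
		uint32_t num)
{
	uint32_t i, digest_idx = 0;

	for (i = 0; i < num; i++) {
		out[i] = temp[digest_idx];
		digest_idx = (digest_idx + 1) % MAX_JOBS;
	}
}
```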



> +				/* Submit job to multi-buffer for processing */
> +		if (ret) {
> +			processed++;
> +			status[i] = ret;
> +			errcnt -= 1;
> +			continue;
> +		}
> +
> +#ifdef RTE_LIBRTE_PMD_AESNI_MB_DEBUG
> +		job = IMB_SUBMIT_JOB(mb_mgr);
> +#else
> +		job = IMB_SUBMIT_JOB_NOCHECK(mb_mgr);
> +#endif
> +
> +		if (job)
> +			processed += handle_completed_sec_jobs(job, mb_mgr);
> +	}
> +
> +	while (processed < num)
> +		processed += flush_mb_sec_mgr(mb_mgr);
> +
> +	return errcnt;
> +}
> +
>  static int cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev);
> 
>  static int
> @@ -1248,8 +1565,9 @@ cryptodev_aesni_mb_create(const char *name,
>  {
>  	struct rte_cryptodev *dev;
>  	struct aesni_mb_private *internals;
> -	enum aesni_mb_vector_mode vector_mode;
> +	struct rte_security_ctx *sec_ctx;
>  	MB_MGR *mb_mgr;
> +	char sec_name[RTE_DEV_NAME_MAX_LEN];
> 
>  	/* Check CPU for support for AES instruction set */
>  	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_AES)) {
> @@ -1283,35 +1601,14 @@ cryptodev_aesni_mb_create(const char *name,
>  	dev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO |
>  			RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING |
>  			RTE_CRYPTODEV_FF_CPU_AESNI |
> -			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT;
> +			RTE_CRYPTODEV_FF_OOP_LB_IN_LB_OUT |
> +			RTE_CRYPTODEV_FF_SECURITY;
> 
> 
> -	mb_mgr = alloc_mb_mgr(0);
> +	mb_mgr = alloc_init_mb_mgr();
>  	if (mb_mgr == NULL)
>  		return -ENOMEM;
> 
> -	switch (vector_mode) {
> -	case RTE_AESNI_MB_SSE:
> -		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_SSE;
> -		init_mb_mgr_sse(mb_mgr);
> -		break;
> -	case RTE_AESNI_MB_AVX:
> -		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX;
> -		init_mb_mgr_avx(mb_mgr);
> -		break;
> -	case RTE_AESNI_MB_AVX2:
> -		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX2;
> -		init_mb_mgr_avx2(mb_mgr);
> -		break;
> -	case RTE_AESNI_MB_AVX512:
> -		dev->feature_flags |= RTE_CRYPTODEV_FF_CPU_AVX512;
> -		init_mb_mgr_avx512(mb_mgr);
> -		break;
> -	default:
> -		AESNI_MB_LOG(ERR, "Unsupported vector mode %u\n", vector_mode);
> -		goto error_exit;
> -	}
> -
>  	/* Set vector instructions mode supported */
>  	internals = dev->data->dev_private;
> 
> @@ -1322,11 +1619,28 @@ cryptodev_aesni_mb_create(const char *name,
>  	AESNI_MB_LOG(INFO, "IPSec Multi-buffer library version used: %s\n",
>  			imb_get_version_str());
> 
> +	/* setup security operations */
> +	snprintf(sec_name, sizeof(sec_name) - 1, "aes_mb_sec_%u",
> +			dev->driver_id);
> +	sec_ctx = rte_zmalloc_socket(sec_name,
> +			sizeof(struct rte_security_ctx),
> +			RTE_CACHE_LINE_SIZE, init_params->socket_id);
> +	if (sec_ctx == NULL) {
> +		AESNI_MB_LOG(ERR, "memory allocation failed\n");
> +		goto error_exit;
> +	}
> +
> +	sec_ctx->device = (void *)dev;
> +	sec_ctx->ops = rte_aesni_mb_pmd_security_ops;
> +	dev->security_ctx = sec_ctx;
> +
>  	return 0;
> 
>  error_exit:
>  	if (mb_mgr)
>  		free_mb_mgr(mb_mgr);
> +	if (sec_ctx)
> +		rte_free(sec_ctx);
> 
>  	rte_cryptodev_pmd_destroy(dev);
> 
> @@ -1367,6 +1681,7 @@ cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev)
>  	struct rte_cryptodev *cryptodev;
>  	struct aesni_mb_private *internals;
>  	const char *name;
> +	uint32_t i;
> 
>  	name = rte_vdev_device_name(vdev);
>  	if (name == NULL)
> @@ -1379,6 +1694,9 @@ cryptodev_aesni_mb_remove(struct rte_vdev_device *vdev)
>  	internals = cryptodev->data->dev_private;
> 
>  	free_mb_mgr(internals->mb_mgr);
> +	for (i = 0; i < RTE_MAX_LCORE; i++)
> +		if (sec_mb_mgrs[i])
> +			free_mb_mgr(sec_mb_mgrs[i]);
> 
>  	return rte_cryptodev_pmd_destroy(cryptodev);
>  }
> diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> index 8d15b99d4..f47df2d57 100644
> --- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> +++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> @@ -8,6 +8,7 @@
>  #include <rte_common.h>
>  #include <rte_malloc.h>
>  #include <rte_cryptodev_pmd.h>
> +#include <rte_security_driver.h>
> 
>  #include "rte_aesni_mb_pmd_private.h"
> 
> @@ -732,7 +733,8 @@ aesni_mb_pmd_qp_count(struct rte_cryptodev *dev)
>  static unsigned
>  aesni_mb_pmd_sym_session_get_size(struct rte_cryptodev *dev __rte_unused)
>  {
> -	return sizeof(struct aesni_mb_session);
> +	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_session),
> +			RTE_CACHE_LINE_SIZE);
>  }
> 
>  /** Configure a aesni multi-buffer session from a crypto xform chain */
> @@ -810,4 +812,92 @@ struct rte_cryptodev_ops aesni_mb_pmd_ops = {
>  		.sym_session_clear	= aesni_mb_pmd_sym_session_clear
>  };
> 
> +/** Set session authentication parameters */
> +
> +static int
> +aesni_mb_security_session_create(void *dev,
> +		struct rte_security_session_conf *conf,
> +		struct rte_security_session *sess,
> +		struct rte_mempool *mempool)
> +{
> +	struct rte_cryptodev *cdev = dev;
> +	struct aesni_mb_private *internals = cdev->data->dev_private;
> +	struct aesni_mb_sec_session *sess_priv;
> +	int ret;
> +
> +	if (!conf->crypto_xform) {
> +		AESNI_MB_LOG(ERR, "Invalid security session conf");
> +		return -EINVAL;
> +	}
> +
> +	if (conf->cpucrypto.cipher_offset < 0) {
> +		AESNI_MB_LOG(ERR, "Invalid security session conf");
> +		return -EINVAL;
> +	}
> +
> +	if (rte_mempool_get(mempool, (void **)(&sess_priv))) {
> +		AESNI_MB_LOG(ERR,
> +				"Couldn't get object from session mempool");
> +		return -ENOMEM;
> +	}
> +
> +	sess_priv->cipher_offset = conf->cpucrypto.cipher_offset;
> +
> +	ret = aesni_mb_set_session_parameters(internals->mb_mgr,
> +			&sess_priv->sess, conf->crypto_xform);
> +	if (ret != 0) {
> +		AESNI_MB_LOG(ERR, "failed configure session parameters");
> +
> +		rte_mempool_put(mempool, sess_priv);
> +	}
> +
> +	sess->sess_private_data = (void *)sess_priv;
> +
> +	return ret;
> +}
> +
> +static int
> +aesni_mb_security_session_destroy(void *dev __rte_unused,
> +		struct rte_security_session *sess)
> +{
> +	struct aesni_mb_sec_session *sess_priv =
> +			get_sec_session_private_data(sess);
> +
> +	if (sess_priv) {
> +		struct rte_mempool *sess_mp = rte_mempool_from_obj(
> +				(void *)sess_priv);
> +
> +		memset(sess, 0, sizeof(struct aesni_mb_sec_session));
> +		set_sec_session_private_data(sess, NULL);
> +
> +		if (sess_mp == NULL) {
> +			AESNI_MB_LOG(ERR, "failed fetch session mempool");
> +			return -EINVAL;
> +		}
> +
> +		rte_mempool_put(sess_mp, sess_priv);
> +	}
> +
> +	return 0;
> +}
> +
> +static unsigned int
> +aesni_mb_sec_session_get_size(__rte_unused void *device)
> +{
> +	return RTE_ALIGN_CEIL(sizeof(struct aesni_mb_sec_session),
> +			RTE_CACHE_LINE_SIZE);
> +}
> +
> +static struct rte_security_ops aesni_mb_security_ops = {
> +		.session_create = aesni_mb_security_session_create,
> +		.session_get_size = aesni_mb_sec_session_get_size,
> +		.session_update = NULL,
> +		.session_stats_get = NULL,
> +		.session_destroy = aesni_mb_security_session_destroy,
> +		.set_pkt_metadata = NULL,
> +		.capabilities_get = NULL,
> +		.process_cpu_crypto_bulk = aesni_mb_sec_crypto_process_bulk,
> +};
> +
>  struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops = &aesni_mb_pmd_ops;
> +struct rte_security_ops *rte_aesni_mb_pmd_security_ops = &aesni_mb_security_ops;
> diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
> index b794d4bc1..64b58ca8e 100644
> --- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
> +++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
> @@ -176,7 +176,6 @@ struct aesni_mb_qp {
>  	 */
>  } __rte_cache_aligned;
> 
> -/** AES-NI multi-buffer private session structure */
>  struct aesni_mb_session {
>  	JOB_CHAIN_ORDER chain_order;
>  	struct {
> @@ -265,16 +264,32 @@ struct aesni_mb_session {
>  		/** AAD data length */
>  		uint16_t aad_len;
>  	} aead;
> -} __rte_cache_aligned;
> +};
> +
> +/** AES-NI multi-buffer private security session structure */
> +struct aesni_mb_sec_session {
> +	/**< Unique Queue Pair Name */
> +	struct aesni_mb_session sess;
> +	uint8_t temp_digests[MAX_JOBS][DIGEST_LENGTH_MAX];

Same question as for v1:
Probably better to move these temp_digests[][] to the very end,
to keep all read-only data grouped together?
Another thought: do you need it here at all?
Can't we just allocate
temp_digests[MAX_JOBS][DIGEST_LENGTH_MAX];
on the stack inside the process() function?
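A toy sketch of why stack allocation can work: the temporary digests only need to live until every job that writes into them has completed, and the process() function in the patch already flushes the manager before returning. `struct toy_mgr`, `toy_submit()` and `toy_flush()` are hypothetical stand-ins for MB_MGR, IMB_SUBMIT_JOB() and IMB_FLUSH_JOB(); the "computed" tag is just a fill byte.

```c
#include <stdint.h>
#include <string.h>

#define MAX_JOBS 4	/* hypothetical manager capacity */
#define DIGEST_MAX 16	/* hypothetical max digest length */

/* Toy manager: remembers where each in-flight job writes its tag and
 * "computes" all tags when flushed. */
struct toy_mgr {
	uint8_t *pending[MAX_JOBS];
	uint32_t n;
	uint8_t tag_byte;	/* value every computed tag is filled with */
};

static void
toy_submit(struct toy_mgr *m, uint8_t *tag_out)
{
	m->pending[m->n++] = tag_out;
}

static void
toy_flush(struct toy_mgr *m)
{
	uint32_t i;

	for (i = 0; i < m->n; i++)
		memset(m->pending[i], m->tag_byte, DIGEST_MAX);
	m->n = 0;
}

/* Verify path with temp digests on the stack, not in the session.
 * Safe only because every job using temp[] is flushed before temp[] is
 * read and before the function returns. */
static void
process_verify(struct toy_mgr *m, uint8_t *const expected[], int status[],
		uint32_t num)
{
	uint8_t temp[MAX_JOBS][DIGEST_MAX];
	uint32_t start, j, cnt;

	for (start = 0; start < num; start += cnt) {
		cnt = (num - start < MAX_JOBS) ? num - start : MAX_JOBS;

		for (j = 0; j != cnt; j++)
			toy_submit(m, temp[j]);
		toy_flush(m);	/* all jobs of this chunk are complete */

		for (j = 0; j != cnt; j++)
			status[start + j] = (memcmp(temp[j],
					expected[start + j], DIGEST_MAX) == 0) ? 0 : -1;
	}
}
```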

> +	uint16_t digest_idx;
> +	uint32_t cipher_offset;
> +	MB_MGR *mb_mgr;
> +};
> 
>  extern int
>  aesni_mb_set_session_parameters(const MB_MGR *mb_mgr,
>  		struct aesni_mb_session *sess,
>  		const struct rte_crypto_sym_xform *xform);
> 
> +extern int
> +aesni_mb_sec_crypto_process_bulk(struct rte_security_session *sess,
> +		struct rte_security_vec buf[], void *iv[], void *aad[],
> +		void *digest[], int status[], uint32_t num);
> +
>  /** device specific operations function pointer structure */
>  extern struct rte_cryptodev_ops *rte_aesni_mb_pmd_ops;
> 
> -
> +/** device specific operations function pointer structure for rte_security */
> +extern struct rte_security_ops *rte_aesni_mb_pmd_security_ops;
> 
>  #endif /* _RTE_AESNI_MB_PMD_PRIVATE_H_ */
> --
> 2.14.5



* Re: [dpdk-dev] [PATCH v2 08/10] ipsec: add rte_security cpu_crypto action support
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
@ 2019-10-08 23:28       ` Ananyev, Konstantin
  0 siblings, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-08 23:28 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal

Hi Fan,
Comments for inbound part inline.
As far as I can see, the majority of my v1 comments are still not addressed.
Please check.
Konstantin

> 
> This patch updates the ipsec library to handle the newly introduced
> RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO action.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  lib/librte_ipsec/crypto.h   |  24 +++
>  lib/librte_ipsec/esp_inb.c  | 200 ++++++++++++++++++++++--
>  lib/librte_ipsec/esp_outb.c | 369 +++++++++++++++++++++++++++++++++++++++++---
>  lib/librte_ipsec/sa.c       |  53 ++++++-
>  lib/librte_ipsec/sa.h       |  29 ++++
>  lib/librte_ipsec/ses.c      |   4 +-
>  6 files changed, 643 insertions(+), 36 deletions(-)
> 
> diff --git a/lib/librte_ipsec/crypto.h b/lib/librte_ipsec/crypto.h
> index f8fbf8d4f..901c8c7de 100644
> --- a/lib/librte_ipsec/crypto.h
> +++ b/lib/librte_ipsec/crypto.h
> @@ -179,4 +179,28 @@ lksd_none_cop_prepare(struct rte_crypto_op *cop,
>  	__rte_crypto_sym_op_attach_sym_session(sop, cs);
>  }
> 
> +typedef void* (*_set_icv_f)(void *val, struct rte_mbuf *ml, uint32_t icv_off);
> +
> +static inline void *
> +set_icv_va_pa(void *val, struct rte_mbuf *ml, uint32_t icv_off)
> +{
> +	union sym_op_data *icv = val;
> +
> +	icv->va = rte_pktmbuf_mtod_offset(ml, void *, icv_off);
> +	icv->pa = rte_pktmbuf_iova_offset(ml, icv_off);
> +
> +	return icv->va;
> +}
> +
> +static inline void *
> +set_icv_va(__rte_unused void *val, __rte_unused struct rte_mbuf *ml,
> +		__rte_unused uint32_t icv_off)
> +{
> +	void **icv_va = val;
> +
> +	*icv_va = rte_pktmbuf_mtod_offset(ml, void *, icv_off);
> +
> +	return *icv_va;
> +}
> +
>  #endif /* _CRYPTO_H_ */
> diff --git a/lib/librte_ipsec/esp_inb.c b/lib/librte_ipsec/esp_inb.c
> index 8e3ecbc64..c4476e819 100644
> --- a/lib/librte_ipsec/esp_inb.c
> +++ b/lib/librte_ipsec/esp_inb.c
> @@ -105,6 +105,78 @@ inb_cop_prepare(struct rte_crypto_op *cop,
>  	}
>  }
> 
> +static inline int
> +inb_cpu_crypto_proc_prepare(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
> +	uint32_t pofs, uint32_t plen,
> +	struct rte_security_vec *buf, struct iovec *cur_vec,
> +	void *iv)
> +{
> +	struct rte_mbuf *ms;
> +	struct iovec *vec = cur_vec;
> +	struct aead_gcm_iv *gcm;
> +	struct aesctr_cnt_blk *ctr;
> +	uint64_t *ivp;
> +	uint32_t algo;
> +	uint32_t left;
> +	uint32_t off = 0, n_seg = 0;

Same comment as for v1:
Please separate variable definition and value assignment.
It makes it hard to read, plus we don't do that in the rest of the library,
so better to follow rest of the code style.

> +
> +	ivp = rte_pktmbuf_mtod_offset(mb, uint64_t *,
> +		pofs + sizeof(struct rte_esp_hdr));
> +	algo = sa->algo_type;
> +
> +	switch (algo) {
> +	case ALGO_TYPE_AES_GCM:
> +		gcm = (struct aead_gcm_iv *)iv;
> +		aead_gcm_iv_fill(gcm, ivp[0], sa->salt);
> +		off = sa->ctp.cipher.offset + pofs;
> +		left = plen - sa->ctp.cipher.length;
> +		break;
> +	case ALGO_TYPE_AES_CBC:
> +	case ALGO_TYPE_3DES_CBC:
> +		copy_iv(iv, ivp, sa->iv_len);
> +		off = sa->ctp.auth.offset + pofs;
> +		left = plen - sa->ctp.auth.length;
> +		break;
> +	case ALGO_TYPE_AES_CTR:
> +		copy_iv(iv, ivp, sa->iv_len);
> +		off = sa->ctp.auth.offset + pofs;
> +		left = plen - sa->ctp.auth.length;
> +		ctr = (struct aesctr_cnt_blk *)iv;
> +		aes_ctr_cnt_blk_fill(ctr, ivp[0], sa->salt);
> +		break;
> +	case ALGO_TYPE_NULL:
> +		left = plen - sa->ctp.cipher.length;
> +		break;
> +	default:
> +		return -EINVAL;

How can we end up here?
If we have an unknown algorithm, shouldn't we fail at the init stage?
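A minimal sketch of failing at init instead, with hypothetical names: the algorithm is validated once when the SA is set up, and the choice between the cipher and auth offsets is resolved into two precomputed fields, so the per-packet offset/length calculation has neither a switch nor an error branch. `enum toy_algo` and the `ctp_*` fields only mirror ALGO_TYPE_* and sa->ctp.cipher/auth; IV generation would still be per-algorithm. Grouping NULL with GCM is an assumption: in the quoted patch the NULL case leaves `off` at 0.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ALGO_TYPE_* values. */
enum toy_algo { TOY_NULL, TOY_AES_CBC, TOY_AES_CTR, TOY_3DES_CBC, TOY_AES_GCM };

/* Hypothetical SA fields; ctp_* mirror sa->ctp.cipher / sa->ctp.auth. */
struct toy_sa {
	enum toy_algo algo;
	uint32_t ctp_cipher_ofs, ctp_cipher_len;
	uint32_t ctp_auth_ofs, ctp_auth_len;
	/* resolved once at init, used by the data path */
	uint32_t data_ofs, data_len_adj;
};

/* Init time: reject unknown algorithms and pick which offsets apply. */
static int
toy_sa_init(struct toy_sa *sa)
{
	switch (sa->algo) {
	case TOY_AES_GCM:
	case TOY_NULL:
		sa->data_ofs = sa->ctp_cipher_ofs;
		sa->data_len_adj = sa->ctp_cipher_len;
		return 0;
	case TOY_AES_CBC:
	case TOY_AES_CTR:
	case TOY_3DES_CBC:
		sa->data_ofs = sa->ctp_auth_ofs;
		sa->data_len_adj = sa->ctp_auth_len;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Data path: no switch and no error branch left. */
static inline void
toy_inb_ofs_len(const struct toy_sa *sa, uint32_t pofs, uint32_t plen,
		uint32_t *off, uint32_t *len)
{
	*off = pofs + sa->data_ofs;
	*len = plen - sa->data_len_adj;
}
```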

> +	}
> +
> +	ms = mbuf_get_seg_ofs(mb, &off);
> +	if (!ms)
> +		return -1;

Same comment as for v1:
inb_pkt_prepare() should already check that we have a valid packet.
I don't think there is a need to check for any failure here.
Another thing: our ESP header will be in the first segment for sure,
so do we need mbuf_get_seg_ofs() here at all?


> +
> +	while (n_seg < RTE_LIBRTE_IP_FRAG_MAX_FRAG && left && ms) {
> +		uint32_t len = RTE_MIN(left, ms->data_len - off);


Again, same comments as for v1:

- I don't think this is right; we shouldn't impose additional limitations on
the number of segments in the packet.

- The whole construction seems a bit over-complicated here...
Why not just have a separate function that would fill iovec[] from an mbuf
and return an error if there are not enough iovec[] entries?
Something like:

static inline int
mbuf_to_iovec(const struct rte_mbuf *mb, uint32_t ofs, uint32_t len,
    struct iovec vec[], uint32_t num)
{
    const struct rte_mbuf *ms;
    uint32_t i;

    if (mb->nb_segs > num)
        return -mb->nb_segs;

    vec[0].iov_base = rte_pktmbuf_mtod_offset(mb, void *, ofs);
    vec[0].iov_len = mb->data_len - ofs;

    for (i = 1, ms = mb->next; ms != NULL; ms = ms->next, i++) {
        vec[i].iov_base = rte_pktmbuf_mtod(ms, void *);
        vec[i].iov_len = ms->data_len;
    }

    /* trim trailing bytes (e.g. ICV) from the last entry */
    vec[i - 1].iov_len -= mb->pkt_len - ofs - len;
    return i;
}

Then we can use that function to fill our iovec[] in a loop.

- Looking at this function, it seems to consist of 2 separate parts:
1. calculate the offset and generate the IV
2. set up iovec[].
It is probably worth splitting it into 2 separate functions like that.
That would be much easier to read and understand.

> +
> +		vec->iov_base = rte_pktmbuf_mtod_offset(ms, void *, off);
> +		vec->iov_len = len;
> +
> +		left -= len;
> +		vec++;
> +		n_seg++;
> +		ms = ms->next;
> +		off = 0;
> +	}
> +
> +	if (left)
> +		return -1;
> +
> +	buf->vec = cur_vec;
> +	buf->num = n_seg;
> +
> +	return n_seg;
> +}
> +
>  /*
>   * Helper function for prepare() to deal with situation when
>   * ICV is spread by two segments. Tries to move ICV completely into the
> @@ -139,20 +211,21 @@ move_icv(struct rte_mbuf *ml, uint32_t ofs)
>   */
>  static inline void
>  inb_pkt_xprepare(const struct rte_ipsec_sa *sa, rte_be64_t sqc,
> -	const union sym_op_data *icv)
> +	uint8_t *icv_va, void *aad_buf, uint32_t aad_off)
>  {
>  	struct aead_gcm_aad *aad;
> 
>  	/* insert SQN.hi between ESP trailer and ICV */
>  	if (sa->sqh_len != 0)
> -		insert_sqh(sqn_hi32(sqc), icv->va, sa->icv_len);
> +		insert_sqh(sqn_hi32(sqc), icv_va, sa->icv_len);
> 
>  	/*
>  	 * fill AAD fields, if any (aad fields are placed after icv),
>  	 * right now we support only one AEAD algorithm: AES-GCM.
>  	 */
>  	if (sa->aad_len != 0) {
> -		aad = (struct aead_gcm_aad *)(icv->va + sa->icv_len);
> +		aad = aad_buf ? aad_buf :
> +				(struct aead_gcm_aad *)(icv_va + aad_off);
>  		aead_gcm_aad_fill(aad, sa->spi, sqc, IS_ESN(sa));
>  	}
>  }
> @@ -162,13 +235,15 @@ inb_pkt_xprepare(const struct rte_ipsec_sa *sa, rte_be64_t sqc,
>   */
>  static inline int32_t
>  inb_pkt_prepare(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
> -	struct rte_mbuf *mb, uint32_t hlen, union sym_op_data *icv)
> +	struct rte_mbuf *mb, uint32_t hlen, _set_icv_f set_icv, void *icv_val,
> +	void *aad_buf)

This whole construct with another function pointer, overloaded arguments, etc.,
looks a bit clumsy and overcomplicated.
I think it would be much cleaner and easier to rearrange the code like this:

1. update inb_pkt_xprepare() to take the AAD buffer pointer and AAD length as parameters:

static inline void
inb_pkt_xprepare(const struct rte_ipsec_sa *sa, rte_be64_t sqc,
        const union sym_op_data *icv, void *aad, uint32_t aad_len)
{
        /* insert SQN.hi between ESP trailer and ICV */
        if (sa->sqh_len != 0)
                insert_sqh(sqn_hi32(sqc), icv->va, sa->icv_len);

        /*
         * fill AAD fields, if any (aad fields are placed after icv),
         * right now we support only one AEAD algorithm: AES-GCM.
         */
        if (aad_len != 0)
                aead_gcm_aad_fill(aad, sa->spi, sqc, IS_ESN(sa));
}

2. split inb_pkt_prepare() into 2 reusable helper functions:

/*
 * retrieve and reconstruct SQN, then check it, then
 * convert it back into network byte order.
 */
static inline int
inb_get_sqn(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
        struct rte_mbuf *mb, uint32_t hlen, rte_be64_t *sqc)
{
        int32_t rc;
        uint64_t sqn;
        struct rte_esp_hdr *esph;

        esph = rte_pktmbuf_mtod_offset(mb, struct rte_esp_hdr *, hlen);

        /*
         * retrieve and reconstruct SQN, then check it, then
         * convert it back into network byte order.
         */
        sqn = rte_be_to_cpu_32(esph->seq);
        if (IS_ESN(sa))
                sqn = reconstruct_esn(rsn->sqn, sqn, sa->replay.win_sz);

        rc = esn_inb_check_sqn(rsn, sa, sqn);
        if (rc == 0)
                *sqc = rte_cpu_to_be_64(sqn);

        return rc;
}

static inline int32_t
inb_prepare(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb,
        uint32_t hlen, uint32_t aad_len, union sym_op_data *icv)
{
        uint32_t clen, icv_len, icv_ofs, plen;
        struct rte_mbuf *ml;

        /* start packet manipulation */
        plen = mb->pkt_len;
        plen = plen - hlen;

        /* check that packet has a valid length */
        clen = plen - sa->ctp.cipher.length;
        if ((int32_t)clen < 0 || (clen & (sa->pad_align - 1)) != 0)
                return -EBADMSG;

        /* find ICV location */
        icv_len = sa->icv_len;
        icv_ofs = mb->pkt_len - icv_len;

        ml = mbuf_get_seg_ofs(mb, &icv_ofs);

        /*
         * if ICV is spread by two segments, then try to
         * move ICV completely into the last segment.
         */
        if (ml->data_len < icv_ofs + icv_len) {

                ml = move_icv(ml, icv_ofs);
                if (ml == NULL)
                        return -ENOSPC;

                /* new ICV location */
                icv_ofs = 0;
        }

        icv_ofs += sa->sqh_len;

        /* we have to allocate space for AAD somewhere,
         * right now - just use free trailing space at the last segment.
         * Would probably be more convenient to reserve space for AAD
         * inside rte_crypto_op itself
         * (again for IV space is already reserved inside cop).
         */
        if (aad_len + sa->sqh_len > rte_pktmbuf_tailroom(ml))
                return -ENOSPC;

        icv->va = rte_pktmbuf_mtod_offset(ml, void *, icv_ofs);
        icv->pa = rte_pktmbuf_iova_offset(ml, icv_ofs);

        /*
         * if esn is used then high-order 32 bits are also used in ICV
         * calculation but are not transmitted, update packet length
         * to be consistent with auth data length and offset, this will
         * be subtracted from packet length in post crypto processing
         */
        mb->pkt_len += sa->sqh_len;
        ml->data_len += sa->sqh_len;

        return plen;
}

3. Now inb_pkt_prepare() becomes a simple sequential invocation of these 3 sub-functions
with the right parameters:

static inline int32_t
inb_pkt_prepare(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
        struct rte_mbuf *mb, uint32_t hlen, union sym_op_data *icv)
{
        int32_t rc;
        uint64_t sqn;
        
        rc = inb_get_sqn(sa, rsn, mb, hlen, &sqn);
        if (rc != 0)
                return rc;

        rc = inb_prepare(sa, mb, hlen, sa->aad_len, icv);
        if (rc < 0)
                return rc;

        inb_pkt_xprepare(sa, sqn, icv, icv->va + sa->icv_len, sa->aad_len);
        return rc;
}

4. And this would be the version of inb_pkt_prepare() for cpu_crypto:

static inline int32_t
inb_cpu_prepare(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
        struct rte_mbuf *mb, uint32_t hlen, union sym_op_data *icv, void *aad)
{
        int32_t rc;
        uint64_t sqn;
        
        rc = inb_get_sqn(sa, rsn, mb, hlen, &sqn);
        if (rc != 0)
                return rc;

        rc = inb_prepare(sa, mb, hlen, 0, icv);
        if (rc < 0)
                return rc;

        inb_pkt_xprepare(sa, sqn, icv, aad, sa->aad_len);
        return rc;
}
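To make the data passed between the split helpers concrete, here is a standalone sketch of the two per-packet values involved. It was written independently of the DPDK sources. The first is the 64-bit ESN that inb_get_sqn() obtains via reconstruct_esn(), following the algorithm in RFC 4303, Appendix A2.2 (window size w >= 1 assumed). The second is the ESP AES-GCM AAD that inb_pkt_xprepare() fills, per RFC 4106: the SPI followed by the 32-bit sequence number, or by the full 64-bit ESN, in network byte order. `esn_reconstruct()` and `esp_gcm_aad_fill()` are hypothetical names. Because the AAD is written to whatever `aad` pointer the caller supplies, the same helper serves both the mbuf-tailroom placement and a separate CPU-crypto buffer.

```c
#include <stdint.h>

/*
 * Rebuild the 64-bit ESN from the 32 low-order bits in the packet (sqn),
 * the highest sequence number seen so far (t) and the replay window
 * size (w), as in RFC 4303, Appendix A2.2.
 */
static uint64_t
esn_reconstruct(uint64_t t, uint32_t sqn, uint32_t w)
{
	uint32_t tl = (uint32_t)t;          /* low 32 bits of t */
	uint32_t th = (uint32_t)(t >> 32);  /* high 32 bits of t */
	uint32_t bl = tl - w + 1;           /* window bottom, mod 2^32 */

	if (tl >= w - 1) {
		/* case A: window lies within one 2^32 subspace */
		if (sqn < bl)
			th++;       /* sqn is from the next subspace */
	} else if (th != 0) {
		/* case B: window spans two subspaces */
		if (sqn >= bl)
			th--;       /* sqn is from the previous subspace */
	}
	return ((uint64_t)th << 32) | sqn;
}

/* Store a 32-bit value in network (big-endian) byte order. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Fill ESP AES-GCM AAD into a caller-provided buffer of at least 12 bytes.
 * spi and sqn are host-order values. Returns the AAD length used.
 */
static uint32_t
esp_gcm_aad_fill(uint8_t *aad, uint32_t spi, uint64_t sqn, int esn)
{
	put_be32(aad, spi);
	if (esn) {
		put_be32(aad + 4, (uint32_t)(sqn >> 32));
		put_be32(aad + 8, (uint32_t)sqn);
		return 12;
	}
	put_be32(aad + 4, (uint32_t)sqn);
	return 8;
}
```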


>  {
>  	int32_t rc;
>  	uint64_t sqn;
>  	uint32_t clen, icv_len, icv_ofs, plen;
>  	struct rte_mbuf *ml;
>  	struct rte_esp_hdr *esph;
> +	void *icv_va;
> 
>  	esph = rte_pktmbuf_mtod_offset(mb, struct rte_esp_hdr *, hlen);
> 
> @@ -226,8 +301,8 @@ inb_pkt_prepare(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
>  	if (sa->aad_len + sa->sqh_len > rte_pktmbuf_tailroom(ml))
>  		return -ENOSPC;
> 
> -	icv->va = rte_pktmbuf_mtod_offset(ml, void *, icv_ofs);
> -	icv->pa = rte_pktmbuf_iova_offset(ml, icv_ofs);
> +	icv_va = set_icv(icv_val, ml, icv_ofs);
> +	inb_pkt_xprepare(sa, sqn, icv_va, aad_buf, sa->icv_len);
> 
>  	/*
>  	 * if esn is used then high-order 32 bits are also used in ICV
> @@ -238,7 +313,6 @@ inb_pkt_prepare(const struct rte_ipsec_sa *sa, const struct replay_sqn *rsn,
>  	mb->pkt_len += sa->sqh_len;
>  	ml->data_len += sa->sqh_len;
> 
> -	inb_pkt_xprepare(sa, sqn, icv);
>  	return plen;
>  }
> 
> @@ -265,7 +339,8 @@ esp_inb_pkt_prepare(const struct rte_ipsec_session *ss, struct rte_mbuf *mb[],
>  	for (i = 0; i != num; i++) {
> 
>  		hl = mb[i]->l2_len + mb[i]->l3_len;
> -		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, &icv);
> +		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, set_icv_va_pa,
> +				(void *)&icv, NULL);
>  		if (rc >= 0) {
>  			lksd_none_cop_prepare(cop[k], cs, mb[i]);
>  			inb_cop_prepare(cop[k], sa, mb[i], &icv, hl, rc);
> @@ -512,7 +587,6 @@ tun_process(const struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>  	return k;
>  }
> 
> -
>  /*
>   * *process* function for tunnel packets
>   */
> @@ -625,6 +699,114 @@ esp_inb_pkt_process(struct rte_ipsec_sa *sa, struct rte_mbuf *mb[],
>  	return n;
>  }
> 
> +/*
> + * process packets using sync crypto engine
> + */
> +static uint16_t
> +esp_inb_cpu_crypto_pkt_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num,
> +		esp_inb_process_t process)
> +{
> +	int32_t rc;
> +	uint32_t i, hl, n, p;
> +	struct rte_ipsec_sa *sa;
> +	struct replay_sqn *rsn;
> +	void *icv_va;
> +	uint32_t sqn[num];
> +	uint32_t dr[num];
> +	uint8_t sqh_len;
> +
> +	/* cpu crypto specific variables */
> +	struct rte_security_vec buf[num];
> +	struct iovec vec[RTE_LIBRTE_IP_FRAG_MAX_FRAG * num];

Same comment as for v1:
I don't think this is right; we shouldn't impose additional limitations on
the number of segments in the packet.
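One way to avoid a fixed per-packet cap, sketched under simplifying assumptions: size the iovec array from the actual segment counts in the burst (the `nb_segs` field of each mbuf) rather than from `RTE_LIBRTE_IP_FRAG_MAX_FRAG * num`. The sum is an upper bound, since the crypto range of a packet cannot touch more segments than the packet has. `struct pkt` is a hypothetical stand-in for `struct rte_mbuf` and `burst_seg_count()` is a made-up helper. This still places a variable-length array on the stack, so a real implementation might prefer to process a very large burst in chunks.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the segment count. */
struct pkt {
	uint16_t nb_segs;   /* number of segments in the chain */
};

/*
 * Upper bound on the iovec entries needed for a burst: the crypto range
 * of a packet can never touch more segments than the packet has.
 */
static uint32_t
burst_seg_count(struct pkt *const mb[], uint32_t num)
{
	uint32_t i, n = 0;

	for (i = 0; i != num; i++)
		n += mb[i]->nb_segs;
	return n;
}
```

A caller could then declare `struct iovec vec[burst_seg_count(mb, num)];` for `num > 0` (a zero-length VLA is not valid C).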

> +	uint32_t vec_idx = 0;
> +	uint64_t iv_buf[num][IPSEC_MAX_IV_QWORD];
> +	void *iv[num];
> +	int status[num];
> +	uint8_t *aad_buf[num][sizeof(struct aead_gcm_aad)];
> +	void *aad[num];
> +	void *digest[num];
> +	uint32_t k;
> +
> +	sa = ss->sa;
> +	rsn = rsn_acquire(sa);
> +	sqh_len = sa->sqh_len;
> +
> +	k = 0;
> +	for (i = 0; i != num; i++) {
> +		hl = mb[i]->l2_len + mb[i]->l3_len;
> +		rc = inb_pkt_prepare(sa, rsn, mb[i], hl, set_icv_va,
> +				(void *)&icv_va, (void *)aad_buf[k]);

Better to have a separate function similar to inb_pkt_prepare(); see above.

> +		if (rc >= 0) {
> +			iv[k] = (void *)iv_buf[k];
> +			aad[k] = (void *)aad_buf[k];
> +			digest[k] = (void *)icv_va;
> +
> +			rc = inb_cpu_crypto_proc_prepare(sa, mb[i], hl,
> +					rc, &buf[k], &vec[vec_idx], iv[k]);
> +			if (rc < 0) {
> +				dr[i - k] = i;

I think with your current approach you can't do it like that,
as your vec[] can still contain some entries from the failed mbuf.
So in theory you need to clean up these entries, which is quite complicated;
better to avoid such a case altogether.
I think you need to reorder the code, as described above.
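A minimal sketch of the ordering being suggested, using hypothetical stand-in types (`struct pkt` instead of `struct rte_mbuf`) and made-up helper names. Every check that can fail runs first and has no side effects. A packet's vec[] entries are written only once it is known to be good, so a rejected packet never leaves entries behind. It does not model the other per-packet side effects of the real prepare step (SQN insertion, AAD fill, length adjustment), to which the same rule would apply.

```c
#include <stdint.h>
#include <sys/uio.h>

#define MAX_SEGS 4

/* Hypothetical packet stand-in: a short list of data segments. */
struct pkt {
	uint8_t *data[MAX_SEGS];
	uint32_t len[MAX_SEGS];
	uint32_t nb_segs;
	uint32_t need;          /* bytes the crypto operation must cover */
};

/* Phase 1: no side effects. Returns entries needed, or -1 if too short. */
static int
pkt_check(const struct pkt *p)
{
	uint32_t i, left = p->need;

	for (i = 0; i != p->nb_segs && left != 0; i++)
		left -= p->len[i] < left ? p->len[i] : left;
	return left == 0 ? (int)i : -1;
}

/* Phase 2: cannot fail; writes exactly n entries starting at vec. */
static void
pkt_fill(const struct pkt *p, struct iovec *vec, uint32_t n)
{
	uint32_t i, left = p->need;

	for (i = 0; i != n; i++) {
		uint32_t chunk = p->len[i] < left ? p->len[i] : left;

		vec[i].iov_base = p->data[i];
		vec[i].iov_len = chunk;
		left -= chunk;
	}
}

/*
 * Packets that fail the check, or do not fit into the remaining vec[]
 * space, are recorded in dr[] and never write to vec[]. Returns the
 * number of good packets; *nb_dr and *vec_used report the rest.
 */
static uint32_t
burst_prepare(const struct pkt pkts[], uint32_t num, struct iovec vec[],
	      uint32_t max_vec, uint32_t *vec_used, uint32_t dr[],
	      uint32_t *nb_dr)
{
	uint32_t i, k = 0, idx = 0, nd = 0;

	for (i = 0; i != num; i++) {
		int n = pkt_check(&pkts[i]);

		if (n < 0 || (uint32_t)n > max_vec - idx) {
			dr[nd++] = i;   /* rejected before touching vec[] */
			continue;
		}
		pkt_fill(&pkts[i], &vec[idx], (uint32_t)n);
		idx += (uint32_t)n;
		k++;
	}
	*vec_used = idx;
	*nb_dr = nd;
	return k;
}
```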

> +				continue;
> +			}
> +
> +			vec_idx += rc;
> +			k++;
> +		} else
> +			dr[i - k] = i;
> +	}
> +
> +	/* copy not prepared mbufs beyond good ones */
> +	if (k != num) {
> +		rte_errno = EBADMSG;
> +
> +		if (unlikely(k == 0))
> +			return 0;
> +
> +		move_bad_mbufs(mb, dr, num, num - k);
> +	}
> +
> +	/* process the packets */
> +	n = 0;
> +	rc = rte_security_process_cpu_crypto_bulk(ss->security.ctx,
> +			ss->security.ses, buf, iv, aad, digest, status, k);
> +	/* move failed process packets to dr */

Same comment as for v1:
That just doesn't look right. Instead of updating dr[] and calling move_bad_mbufs(), you need to:
if (rc != 0), walk through status[] and, for the failed ones, set PKT_RX_SEC_OFFLOAD_FAILED in the appropriate mbuf->ol_flags.
tun_process(), etc. expect PKT_RX_SEC_OFFLOAD_FAILED to be set in mb->ol_flags
for failed packets.
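A minimal sketch of that status walk, with a hypothetical `struct pkt` standing in for `struct rte_mbuf` and an arbitrary flag value (not DPDK's actual bit for PKT_RX_SEC_OFFLOAD_FAILED). Following the comment, it would only be called when rte_security_process_cpu_crypto_bulk() returns non-zero, and it leaves mb[] in place so the later per-packet processing can act on the flag.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins: in DPDK the flag is PKT_RX_SEC_OFFLOAD_FAILED
 * in struct rte_mbuf's ol_flags; the bit value used here is arbitrary.
 */
#define SEC_OFFLOAD_FAILED (UINT64_C(1) << 19)

struct pkt {
	uint64_t ol_flags;
};

/*
 * Flag the packets whose crypto operation failed, without reordering
 * mb[]. Returns the number of failed packets.
 */
static uint32_t
mark_failed(struct pkt *mb[], const int status[], uint32_t num)
{
	uint32_t i, n = 0;

	for (i = 0; i != num; i++) {
		if (status[i] != 0) {
			mb[i]->ol_flags |= SEC_OFFLOAD_FAILED;
			n++;
		}
	}
	return n;
}
```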


> +	for (i = 0; i < k; i++) {
> +		if (status[i]) {
> +			dr[n++] = i;
> +			rte_errno = EBADMSG;
> +		}
> +	}
> +
> +	/* move bad packets to the back */
> +	if (n)
> +		move_bad_mbufs(mb, dr, k, n);
> +
> +	/* process packets */
> +	p = process(sa, mb, sqn, dr, k - n, sqh_len);
> +



> +	if (p != k - n && p != 0)
> +		move_bad_mbufs(mb, dr, k - n, k - n - p);
> +
> +	if (p != num)
> +		rte_errno = EBADMSG;
> +
> +	return p;
> +}
> +
> +uint16_t
> +esp_inb_tun_cpu_crypto_pkt_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)
> +{
> +	return esp_inb_cpu_crypto_pkt_process(ss, mb, num, tun_process);
> +}
> +
> +uint16_t
> +esp_inb_trs_cpu_crypto_pkt_process(const struct rte_ipsec_session *ss,
> +		struct rte_mbuf *mb[], uint16_t num)
> +{
> +	return esp_inb_cpu_crypto_pkt_process(ss, mb, num, trs_process);
> +}
> +


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-07 12:53                                   ` Ananyev, Konstantin
@ 2019-10-09  7:20                                     ` Akhil Goyal
  2019-10-09 13:43                                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-10-09  7:20 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan
  Cc: Doherty, Declan, 'Anoob Joseph'

Hi Konstantin,

> 
> 
> Hi Akhil,
> 
> > > > > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the same
> > > > > > > > > > > > > > > > algorithm, key, and direction being processed by CPU cycles synchronously.
> > > > > > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > > > > > simulation" as well as 3 cacheline access of the crypto ops.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and corresponding dequeue burst.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, instead it just call rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It would be a new API something like process_packets and it will have the crypto processed packets while returning from the API?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not mbufs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib only.
> > > > > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or any value add
> > > > > > > > > > > > > > > To the crypto processing. IMO, you just need a synchronous crypto processing API which
> > > > > > > > > > > > > > > Can be defined in cryptodev, you don't need to re-create a crypto session in the name of
> > > > > > > > > > > > > > > Security session in the driver just to do a synchronous processing.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > > > > rte_crypot_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > > > The main reason is that would require disruptive changes in existing cryptodev API
> > > > > > > > > > > > > > (would cause ABI/API breakage).
> > > > > > > > > > > > > > Session for  RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO need some extra information
> > > > > > > > > > > > > > that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in future).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > > > >
> > > > > > > > > > > > fill/read (+ alloc/free) is one of the main things that slowdown current crypto-op approach.
> > > > > > > > > > > > That's why the general idea - have all data that wouldn't change from packet to packet
> > > > > > > > > > > > included into the session and setup it once at session_init().
> > > > > > > > > > >
> > > > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > > > You can have the new API in crypto.
> > > > > > > > > > > As per the current patch, you only need cipher_offset which you can have it as a parameter until
> > > > > > > > > > > You get it approved in the crypto xform. I believe it will be beneficial in case of other crypto cases as well.
> > > > > > > > > > > We can have cipher offset at both places(crypto-op and cipher_xform). It will give flexibility to the user to
> > > > > > > > > > > override it.
> > > > > > > > > >
> > > > > > > > > > After having another thought on your proposal:
> > > > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related stuff here?
> > > > > > > > >
> > > > > > > > > I also thought of adding new xforms, but that wont serve the purpose for may be all the cases.
> > > > > > > > > You would be needing all information currently available in the current xforms.
> > > > > > > > > So if you are adding new fields in the new xform, the size will be more than that of the union of xforms.
> > > > > > > > > ABI breakage would still be there.
> > > > > > > > >
> > > > > > > > > If you think a valid compression of the AEAD xform can be done, then that can be done for each of the
> > > > > > > > > Xforms and we can have a solution to this issue.
> > > > > > > >
> > > > > > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > > > > If in future we would need to add some extra information it might
> > > > > > > > require ABI breakage, though by now I don't envision anything particular to add.
> > > > > > > > Anyway, if there is no objection to go that way, we can try to make
> > > > > > > > these changes for v2.
> > > > > > > >
> > > > > > >
> > > > > > > Actually, after looking at it more deeply it appears not that easy as I thought it would be :)
> > > > > > > Below is a very draft version of proposed API additions.
> > > > > > > I think it avoids ABI breakages right now and provides enough flexibility for future extensions (if any).
> > > > > > > For now, it doesn't address your comments about naming conventions (_CPU_ vs _SYNC_) , etc.
> > > > > > > but I suppose is comprehensive enough to provide a main idea beyond it.
> > > > > > > Akhil and other interested parties, please try to review and provide feedback ASAP,
> > > > > > > as related changes would take some time and we still like to hit 19.11 deadline.
> > > > > > > Konstantin
> > > > > > >
> > > > > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > index bc8da2466..c03069e23 100644
> > > > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > > > >   *
> > > > > > >   * This structure contains data relating to Cipher (Encryption and
> > > Decryption)
> > > > > > >   *  use to create a session.
> > > > > > > + * Actually I was wrong saying that we don't have free space inside
> > > xforms.
> > > > > > > + * Making key struct packed (see below) allow us to regain 6B that
> could
> > > be
> > > > > > > + * used for future extensions.
> > > > > > >   */
> > > > > > >  struct rte_crypto_cipher_xform {
> > > > > > >         enum rte_crypto_cipher_operation op;
> > > > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > > > >         struct {
> > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > -       } key;
> > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > +
> > > > > > > +       /**
> > > > > > > +         * offset for cipher to start within user provided data buffer.
> > > > > > > +        * Fan suggested another (and less space consuming way) -
> > > > > > > +         * reuse iv.offset space below, by changing:
> > > > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > > > +        * to uunamed union:
> > > > > > > +        * union {
> > > > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > > > +        *      struct {uint16_t iv_len, crypto_offset} cpu_crypto_param;
> > > > > > > +        * };
> > > > > > > +        * Both approaches seems ok to me in general.
> > > > > >
> > > > > > No strong opinions here. OK with this one.
> > > > > >
> > > > > > > +        * Comments/suggestions are welcome.
> > > > > > > +         */
> > > > > > > +       uint16_t offset;
> > > > >
> > > > > After another thought - it is probably a bit better to have offset as a separate field.
> > > > > In that case we can use the same xforms to create both type of sessions.
> > > > ok
> > > > >
> > > > > > > +
> > > > > > > +       uint8_t reserved1[4];
> > > > > > > +
> > > > > > >         /**< Cipher key
> > > > > > >          *
> > > > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > > > >         struct {
> > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > -       } key;
> > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > >         /**< Authentication key data.
> > > > > > >          * The authentication key length MUST be less than or equal to the
> > > > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > > > >          * (for example RFC 2104, FIPS 198a).
> > > > > > >          */
> > > > > > >
> > > > > > > +       uint8_t reserved1[6];
> > > > > > > +
> > > > > > >         struct {
> > > > > > >                 uint16_t offset;
> > > > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > > > >         struct {
> > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > -       } key;
> > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > +
> > > > > > > +       /** offset for cipher to start within data buffer */
> > > > > > > +       uint16_t cipher_offset;
> > > > > > > +
> > > > > > > +       uint8_t reserved1[4];
> > > > > > >
> > > > > > >         struct {
> > > > > > >                 uint16_t offset;
> > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > index e175b838c..c0c7bfed7 100644
> > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > @@ -1272,6 +1272,101 @@ void *
> > > > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > > > >
> > > > > > > +/*
> > > > > > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > > > > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > > > > > + * introduce an extentsion to it via new fully opaque
> > > > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > > > >
> > > > > >
> > > > > > What all things do we need to squeeze?
> > > > > > In this proposal I do not see the new struct cpu_sym_session  defined here.
> > > > >
> > > > > The plan is to have it totally opaque to the user, i.e. just:
> > > > > struct rte_crypto_cpu_sym_session;
> > > > > in public header files.
> > > > >
> > > > > > I believe you will have same lib API/struct for cpu_sym_session  and sym_session.
> > > > >
> > > > > I thought about such way, but there are few things that looks clumsy to me:
> > > > > 1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
> > > > > so it is not possible to easy distinguish what session do you have: lksd_sym or
> > > > > cpu_sym.
> > > > > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can add
> > > > > some extra field
> > > > > here, but in that case  we wouldn't be able to use the same xform for both
> > > > > lksd_sym or cpu_sym
> > > > > (which seems really plausible thing for me).
> > > > > 2.  Majority of rte_cryptodev_sym_session fields I think are unnecessary for
> > > > > rte_crypto_cpu_sym_session:
> > > > > sess_data[], opaque_data, user_data, nb_drivers.
> > > > > All that consumes space, that could be used somewhere else instead.
> > > > > 3. I am a bit reluctant to touch existing rte_cryptodev API - to avoid any
> > > > > breakages I can't foresee right now.
> > > > > From other side - if we'll add new functions/structs for cpu_sym_session we can
> > > > > mark it
> > > > > and keep it for some time as experimental, so further changes (if needed) would
> > > > > still be possible.
> > > > >
> > > >
> > > > OK let us assume that you have a separate structure. But I have a few queries:
> > > > 1. how can multiple drivers use a same session
> > >
> > > As a short answer: they can't.
> > > It is pretty much the same approach as with rte_security - each device needs to
> > > create/init its own session.
> > > So upper layer would need to maintain its own array (or so) for such case.
> > > Though the question is why would you like to have same session over multiple
> > > SW backed devices?
> > > As it would be anyway just a synchronous function call that will be executed on
> > > the same cpu.
> >
> > I may have single FAT tunnel which may be distributed over multiple
> > Cores, and each core is affined to a different SW device.
> 
> If it is pure SW, then we don't need multiple devices for such scenario.
> Device in that case is pure abstraction that we can skip.

Yes, agreed, but the application is given the liberty to choose whether it needs multiple
devices with a single queue each or a single device with multiple queues.
I think that independence should not be broken in this new API.

> 
> > So a single session may be accessed by multiple devices.
> >
> > One more example would be depending on packet sizes, I may switch between
> > HW/SW PMDs with the same session.
> 
> Sure, but then we'll have multiple sessions.

No, the session will be the same, and it will hold separate private data for each of the PMDs.
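A toy sketch of the model described here, loosely based on the per-driver sess_data[] array mentioned in the quoted discussion: one session object with a private-data slot per driver id, so the same session can be handed to a HW PMD or a SW PMD. `struct sym_session`, `MAX_DRIVERS` and the helpers are hypothetical and are not the cryptodev API.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DRIVERS 4   /* hypothetical number of registered drivers */

/*
 * One session object shared by several PMDs; each PMD keeps its own
 * private data in the slot indexed by its driver id.
 */
struct sym_session {
	void *priv[MAX_DRIVERS];
};

/* Attach a driver's private data; returns 0 on success, -1 on bad id. */
static int
session_set_priv(struct sym_session *s, uint8_t drv_id, void *priv)
{
	if (drv_id >= MAX_DRIVERS)
		return -1;
	s->priv[drv_id] = priv;
	return 0;
}

/* Fetch a driver's private data, or NULL if it has none. */
static void *
session_get_priv(const struct sym_session *s, uint8_t drv_id)
{
	return drv_id < MAX_DRIVERS ? s->priv[drv_id] : NULL;
}
```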

> BTW, we have same thing now - these private session pointers are just stored
> inside the same rte_crypto_sym_session.
> And if user wants to support this model, he would also need to store <dev_id, queue_id>
> pair for each HW device anyway.

Yes, agreed, but how would that work with your new struct? You cannot support that.

> 
> >
> > >
> > > > 2. Can somebody use the scheduler pmd for scheduling the different type of payloads for the same session?
> > >
> > > In theory yes.
> > > Though for that scheduler pmd should have inside it's
> > > rte_crypto_cpu_sym_session an array of pointers to
> > > the underlying devices sessions.
> > >
> > > >
> > > > With your proposal the APIs would be very specific to your use case only.
> > >
> > > Yes in some way.
> > > I consider that API specific for SW backed crypto PMDs.
> > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit
> > > from it.
> > > Current crypto-op API is very much HW oriented.
> > > Which is ok, that's for it was intended for, but I think we also need one that
> > > would be designed
> > > for SW backed implementation in mind.
> >
> > We may re-use your API for HW PMDs as well which do not have requirement of
> > Crypto-op/mbuf etc.
> > The return type of your new process API may have a status which say 'processed'
> > Or can be say 'enqueued'. So if it is  'enqueued', we may have a new API for raw
> > Bufs dequeue as well.
> >
> > This requirement can be for any hardware PMDs like QAT as well.
> 
> I don't think it is a good idea to extend this API for async (lookaside) devices.
> You'll need to:
>  - provide dev_id and queue_id for each process(enqueue) and dequeuer operation.
>  - provide IOVA for all buffers passing to that function (data buffers, digest, IV, aad).
>  - On dequeue provide some way to associate dequed data and digest buffers with
>    crypto-session that was used  (and probably with mbuf).
>  So most likely we'll end up with another just version of our current crypto-op structure.
> If you'd like to get rid of mbufs dependency within current crypto-op API that understandable,
> but I don't think we should have same API for both sync (CPU) and async (lookaside) cases.
> It doesn't seem feasible at all and voids whole purpose of that patch.

At this moment we are not much concerned about the dequeue API or about
HW PMD support. It is just that the new API should be generic enough to be used in
some future scenarios as well. I am just highlighting the possible use cases that may
arise in the future.

What issue do you face in making a dev-op for this new API? Do you see any
performance impact with that?

> 
> > That is why a dev-ops would be a better option.
> >
> > >
> > > > When you would add more functionality to this sync API/struct, it will end up being the same API/struct.
> > > >
> > > > Let us  see how close/ far we are from the existing APIs when the actual implementation is done.
> > > >
> > > > > > I am not sure if that would be needed.
> > > > > > It would be internal to the driver that if synchronous processing is supported(from feature flag) and
> > > > > > Have relevant fields in xform(the newly added ones which are packed as per your suggestions) set,
> > > > > > It will create that type of session.
> > > > > >
> > > > > >
> > > > > > > + * Main points:
> > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > + *   affecting existing one.
> > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > + * - process() function per set of xforms
> > > > > > > + *   allows to expose different process() functions for different
> > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > + *   or spread it across several ones.
> > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > >
> > > > > > Which process function should be chosen is internal to PMD, how would that info
> > > > > > be visible to the application or the library. These will get stored in the session private
> > > > > > data. It would be upto the PMD writer, to store the per session process function in
> > > > > > the session private data.
> > > > > >
> > > > > > Process function would be a dev ops just like enc/deq operations and it should call
> > > > > > The respective process API stored in the session private data.
> > > > >
> > > > > That model (via devops) is possible, but has several drawbacks from my perspective:
> > > > >
> > > > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > > > Though in fact dev_id is not a relevant information for us here
> > > > > (all we need is pointer to the session and pointer to the fuction to call)
> > > > > and I tried to avoid using it in data-path functions for that API.
> > > >
> > > > You have a single vdev, but someone may have multiple vdevs for each thread, or may
> > > > Have same dev with multiple queues for each core.
> > >
> > > That's fine. As I said above it is a SW backed implementation.
> > > Each session has to be a separate entity that contains all necessary information
> > > (keys, alg/mode info,  etc.)  to process input buffers.
> > > Plus we need the actual function pointer to call.
> > > I just don't see what for we need a dev_id in that situation.
> >
> > To iterate the session private data in the session.
> >
> > > Again, here we don't need care about queues and their pinning to cores.
> > > If let say someone would like to process buffers from the same IPsec SA on 2
> > > different cores in parallel, he can just create 2 sessions for the same xform,
> > > give one to thread #1  and second to thread #2.
> > > After that both threads are free to call process(this_thread_ses, ...) at will.
> >
> > Say you have a 16core device to handle 100G of traffic on a single tunnel.
> > Will we make 16 sessions with same parameters?
> 
> Absolutely same question we can ask for current crypto-op API.
> You have lookaside crypto-dev with 16 HW queues, each queue is serviced by
> different CPU.
> For the same SA, do you need a separate session per queue, or is it ok to reuse
> current one?
> AFAIK, right now this is a grey area not clearly defined.
> For crypto-devs I am aware - user can reuse the same session (as PMD uses it
> read-only).
> But again, right now I think it is not clearly defined and is implementation
> specific.

User can use the same session, that is what I am also insisting, but it may have separate
session private data. Cryptodev session create API provides that functionality and we can
leverage that.

BTW, I can see a v2 to this RFC which is still based on security library. When do you plan
to submit the patches for crypto based APIs? We have RC1 merge deadline for this
patchset on 21st Oct.

As per my understanding you only need a new dev-op for sync support. Session APIs
will remain the same and you will have some extra fields packed in xform structs.

The PMD will need to maintain a pointer to the per session process function while creating
the session, and it will be used by the dev-op API at runtime without any extra check.
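For comparison, the dev-op model proposed above can be sketched in plain C. This is a toy under stated assumptions: `struct device`, `dev_ops.sync_process` and the `sess_data[]` slot per driver id are hypothetical names that only loosely mirror cryptodev, not its real API, and the XOR step merely stands in for a cipher. It shows the call chain: dev_id selects the device, the device's ops reach the PMD, and the PMD jumps through the per-session pointer it stored at session init.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dev-op model; none of these names exist in DPDK. */
#define MAX_DRIVERS 4

struct priv_session;
typedef void (*priv_process_t)(struct priv_session *ps, uint8_t *buf,
                               uint32_t len);

struct priv_session {
    priv_process_t process; /* chosen once by the PMD at session init */
    uint8_t key;            /* toy "key" for the illustration */
};

struct sym_session {
    struct priv_session *sess_data[MAX_DRIVERS]; /* one slot per driver */
};

struct dev_ops {
    void (*sync_process)(uint8_t driver_id, struct sym_session *ses,
                         uint8_t *buf, uint32_t len);
};

struct device {
    uint8_t driver_id;
    const struct dev_ops *ops;
};

/* Toy transform standing in for a real cipher (NOT cryptography). */
static void xor_process(struct priv_session *ps, uint8_t *buf, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        buf[i] ^= ps->key;
}

/* PMD dev-op: one extra dereference to reach the stored function. */
static void pmd_sync_process(uint8_t driver_id, struct sym_session *ses,
                             uint8_t *buf, uint32_t len)
{
    struct priv_session *ps = ses->sess_data[driver_id];
    ps->process(ps, buf, len);
}

static const struct dev_ops pmd_ops = { .sync_process = pmd_sync_process };

/* Library entry point: dev_id -> device -> ops -> session private data. */
static void lib_sync_process(struct device *devs, uint8_t dev_id,
                             struct sym_session *ses, uint8_t *buf,
                             uint32_t len)
{
    struct device *dev = &devs[dev_id];
    assert(dev->ops != NULL && dev->ops->sync_process != NULL);
    dev->ops->sync_process(dev->driver_id, ses, buf, len);
}
```

The sketch keeps the session shareable across devices (one private slot per driver), which is the property argued for here, at the cost of the extra lookups on every call.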

> 
> >
> > >
> > > >
> > > > > 2. As you pointed in that case it will be just one process() function per
> > > > > device.
> > > > > So if PMD would like to have several process() functions for different type
> > > > > of sessions (let say one per alg) first thing it has to do inside its
> > > > > process() - read session data and based on that, do a jump/call to
> > > > > particular internal sub-routine.
> > > > > Something like:
> > > > > driver_id = get_pmd_driver_id();
> > > > > priv_ses = ses->sess_data[driver_id];
> > > > > Then either:
> > > > > switch(priv_ses->alg) {case XXX: process_XXX(priv_ses, ...);break;...}
> > > > > OR
> > > > > priv_ses->process(priv_ses, ...);
> > > > >
> > > > > to select and call the proper function.
> > > > > Looks like totally unnecessary overhead to me.
> > > > > Though if we'll have ability to query/extract some sort session_ops based
> > > > > on the xform - we can avoid this extra de-reference+jump/call thing.
> > > >
> > > > What is the issue in the priv_ses->process(); approach?
> > >
> > > Nothing at all.
> > > What I am saying that schema with dev_ops
> > > dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
> > >    |
> > >    |-> priv_ses->process(...)
> > >
> > > Has bigger overhead than just:
> > > process(ses,...);
> > >
> > > So what for to introduce extra-level of indirection here?
> >
> > Explained above.
> >
> > >
> > > > I don't understand what are you saving by not doing this.
> > > > In any case you would need to identify which session corresponds to which
> > > > process().
> > >
> > > Yes, sure, but I think we can make user to store information that relationship,
> > > in a way he likes: store process() pointer for each session, or group sessions
> > > that share the same process() somehow, or...
> >
> > So whatever relationship that user will make and store will make its life
> > complicated.
> > If we can hide that information in the driver, then what is the issue in that?
> > The user will not need to worry. He would just call the process() and the
> > driver will choose which process needs to be called.
> 
> Driver can do that at config/init time.
> Then at run-time we can avoid that choice at all and call already chosen function.
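The alternative argued for here, choosing the function once on the control path and calling it directly on the data path, can be sketched as follows. The `toy_*` names are hypothetical and only loosely follow the RFC's `rte_crypto_cpu_sym_session_func()` idea; they are not an existing DPDK API. `toy_sa_group` illustrates the point that a user may keep one process() pointer per group of sessions created from the same xform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical init-time selection model; NOT an existing DPDK API. */
enum toy_alg { TOY_ALG_XOR, TOY_ALG_ADD };

struct toy_xform { enum toy_alg alg; uint8_t key; };
struct toy_cpu_session { uint8_t key; };

typedef void (*toy_process_t)(const struct toy_cpu_session *ses,
                              uint8_t *buf, uint32_t len);

static void toy_process_xor(const struct toy_cpu_session *ses, uint8_t *buf,
                            uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        buf[i] ^= ses->key;
}

static void toy_process_add(const struct toy_cpu_session *ses, uint8_t *buf,
                            uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(buf[i] + ses->key);
}

/* Control path: the same xform always maps to the same function. */
static toy_process_t toy_session_func(const struct toy_xform *xf)
{
    assert(xf != NULL);
    switch (xf->alg) {
    case TOY_ALG_XOR: return toy_process_xor;
    case TOY_ALG_ADD: return toy_process_add;
    }
    return NULL;
}

static void toy_session_init(struct toy_cpu_session *ses,
                             const struct toy_xform *xf)
{
    ses->key = xf->key;
}

/* Sessions sharing one xform type can share one process() pointer. */
struct toy_sa_group {
    toy_process_t process;
    struct toy_cpu_session ses[2]; /* e.g. one session per worker thread */
};
```

On the data path this is a single indirect call, `g.process(&g.ses[i], buf, len)`; whether that saves measurable cycles over the dev-op chain is what the POC suggested in this thread would have to show.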
> 
> >
> > I think we should have a POC around this and see the difference in the cycle
> > count.
> > IMO it would be negligible and we would end up making a generic API set which
> > can be used by others as well.
> >
> > >
> > > > For that you would be doing it somewhere in your data path.
> > >
> > > Why at data-path?
> > > Only once at session creation/initialization time.
> > > Or might be even once per group of sessions.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > I am not sure if you would need a new session init API for this as nothing
> > > > > > would be visible to the app or lib.
> > > > > >
> > > > > > > + * - Not storing process() pointer inside the session -
> > > > > > > + *   Allows user to choose does he want to store a process() pointer
> > > > > > > + *   per session, or per group of sessions for that device that share
> > > > > > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > > > > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > > > > > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> > > > > >
> > > > > > If multiple sessions need to be processed via the same process function,
> > > > > > PMD would save the same process in all the sessions, I don't think there
> > > > > > would be any perf overhead with that.
> > > > >
> > > > > I think it would, see above.
> > > > >
> > > > > >
> > > > > > > + * Sketched usage model:
> > > > > > > + * ....
> > > > > > > + * /* control path, alloc/init session */
> > > > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > > > + * ...
> > > > > > > + * /* data-path*/
> > > > > > > + * process(ses, ....);
> > > > > > > + * ....
> > > > > > > + * /* control path, terminate/free session */
> > > > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > > > + */
> > > > > > > +
> > > > > > > +/**
> > > > > > > + * vector structure, contains pointer to vector array and the length
> > > > > > > + * of the array
> > > > > > > + */
> > > > > > > +struct rte_crypto_vec {
> > > > > > > +       struct iovec *vec;
> > > > > > > +       uint32_t num;
> > > > > > > +};
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Data-path bulk process crypto function.
> > > > > > > + */
> > > > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > > > +               void *digest[], int status[], uint32_t num);
> > > > > > > +/*
> > > > > > > + * for given device return process function specific to input xforms
> > > > > > > + * on error - return NULL and set rte_errno value.
> > > > > > > + * Note that for same input xforms for the same device should return
> > > > > > > + * the same process function.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +rte_crypto_cpu_sym_process_t
> > > > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Return required session size in bytes for given set of xforms.
> > > > > > > + * if xforms == NULL, then return the max possible session size,
> > > > > > > + * that would fit session for any supported by the device algorithm.
> > > > > > > + * if CPU mode is not supported at all, or requested in xform
> > > > > > > + * algorithm is not supported, then return -ENOTSUP.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +int
> > > > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Initialize session.
> > > > > > > + * It is caller responsibility to allocate enough space for it.
> > > > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +__rte_experimental
> > > > > > > +void
> > > > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > +
> > > > > > > +
> > > > > > >  #ifdef __cplusplus
> > > > > > >  }
> > > > > > >  #endif
> > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > index defe05ea0..ed7e63fab 100644
> > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > > > >
> > > > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > +
> > > > > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > > > > +                       struct rte_cryptodev *dev,
> > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > +
> > > > > > >  /** Crypto device operations function pointer table */
> > > > > > >  struct rte_cryptodev_ops {
> > > > > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > > +
> > > > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > > > >  };
> > > > > > >
> > > > > > >
> > > > > > >
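To make the sketched usage model above concrete, here is a self-contained mock of the same control/data-path split: size query, user allocation, fetch process(), init, bulk process, fini. Everything prefixed `mock_` is hypothetical and stands in for the draft `rte_crypto_cpu_sym_session_*()` calls; the XOR transform is not cryptography. `cipher_offset` is kept in the session, assuming the "set it up once at session_init()" approach discussed in this thread, and buffers are passed as iovec arrays, as in the draft `struct rte_crypto_vec`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical mock of the RFC usage model; NOT the DPDK API. */
struct mock_vec {               /* same shape as the draft rte_crypto_vec */
    struct iovec *vec;
    uint32_t num;
};

struct mock_xform {
    uint8_t key;                /* toy key, not real key material */
    uint16_t cipher_offset;     /* fixed per session, not per packet */
};

struct mock_cpu_session {       /* would be opaque to the user in the RFC */
    uint8_t key;
    uint16_t cipher_offset;
};

typedef void (*mock_process_t)(struct mock_cpu_session *sess,
                               struct mock_vec buf[], void *iv[],
                               void *aad[], void *digest[], int status[],
                               uint32_t num);

/* Size query; this mock has one fixed size, so xf may be NULL. */
static int mock_session_size(const struct mock_xform *xf)
{
    (void)xf;
    return (int)sizeof(struct mock_cpu_session);
}

/* Bulk "process": XOR every byte after cipher_offset (toy, not crypto). */
static void mock_process(struct mock_cpu_session *sess, struct mock_vec buf[],
                         void *iv[], void *aad[], void *digest[],
                         int status[], uint32_t num)
{
    (void)iv; (void)aad; (void)digest;  /* unused by the toy transform */
    for (uint32_t i = 0; i != num; i++) {
        uint32_t skip = sess->cipher_offset;  /* leading bytes left as is */
        for (uint32_t s = 0; s != buf[i].num; s++) {
            uint8_t *p = buf[i].vec[s].iov_base;
            for (size_t b = 0; b != buf[i].vec[s].iov_len; b++) {
                if (skip != 0) {
                    skip--;
                    continue;
                }
                p[b] ^= sess->key;
            }
        }
        status[i] = 0;                        /* the toy never fails */
    }
}

/* Return the process() function for an xform; NULL + errno on error. */
static mock_process_t mock_session_func(const struct mock_xform *xf)
{
    if (xf == NULL) {
        errno = EINVAL;
        return NULL;
    }
    return mock_process;
}

static int mock_session_init(struct mock_cpu_session *sess,
                             const struct mock_xform *xf)
{
    if (sess == NULL || xf == NULL)
        return -EINVAL;
    sess->key = xf->key;
    sess->cipher_offset = xf->cipher_offset;
    return 0;
}

/* Wipe session state; freeing the memory stays the user's job. */
static void mock_session_fini(struct mock_cpu_session *sess)
{
    sess->key = 0;
    sess->cipher_offset = 0;
}
```

A caller would follow the same order as the RFC comment: query the size, allocate (here with `malloc`), fetch the function, init, call `process(ses, &v, NULL, NULL, NULL, status, 1)` per burst, then fini and free.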

* Re: [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler
  2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
  2019-10-08 16:23       ` Ananyev, Konstantin
@ 2019-10-09  8:29       ` Ananyev, Konstantin
  1 sibling, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-09  8:29 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: Doherty, Declan, akhil.goyal



> 
> +	/* setup security operations */
> +	snprintf(sec_name, sizeof(sec_name) - 1, "aes_mb_sec_%u",
> +			dev->driver_id);

Just a nit here and in aesni_gcm code:
this is useless actually, rte_malloc ignores name argument.
You can safely pass NULL here.


> +	sec_ctx = rte_zmalloc_socket(sec_name,
> +			sizeof(struct rte_security_ctx),
> +			RTE_CACHE_LINE_SIZE, init_params->socket_id);
> +	if (sec_ctx == NULL) {
> +		AESNI_MB_LOG(ERR, "memory allocation failed\n");
> +		goto error_exit;
> +	}
> +
> +	sec_ctx->device = (void *)dev;
> +	sec_ctx->ops = rte_aesni_mb_pmd_security_ops;
> +	dev->security_ctx = sec_ctx;
> +
>  	return 0;
> 
>  error_exit:
>  	if (mb_mgr)
>  		free_mb_mgr(mb_mgr);
> +	if (sec_ctx)
> +		rte_free(sec_ctx);
> 
>  	rte_cryptodev_pmd_destroy(dev);
> 

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-09  7:20                                     ` Akhil Goyal
@ 2019-10-09 13:43                                       ` Ananyev, Konstantin
  2019-10-11 13:23                                         ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-09 13:43 UTC (permalink / raw)
  To: Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan
  Cc: Doherty, Declan, 'Anoob Joseph'


Hi Akhil,

> > > > > > > > > > > > > > > > > This action type allows the burst of symmetric crypto workload using the
> > > > > > > > > > > > > > > > > same algorithm, key, and direction being processed by CPU cycles
> > > > > > > > > > > > > > > > > synchronously.
> > > > > > > > > > > > > > > > > This flexible action type does not require external hardware involvement,
> > > > > > > > > > > > > > > > > having the crypto workload processed synchronously, and is more performant
> > > > > > > > > > > > > > > > > than Cryptodev SW PMD due to the saved cycles on removed "async mode
> > > > > > > > > > > > > > > > > simulation" as well as 3 cacheline accesses of the crypto ops.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Does that mean application will not call the cryptodev_enqueue_burst and
> > > > > > > > > > > > > > > > corresponding dequeue burst.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, instead it just calls rte_security_process_cpu_crypto_bulk(...)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It would be a new API something like process_packets and it will have the
> > > > > > > > > > > > > > > > crypto processed packets while returning from the API?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, though the plan is that API will operate on raw data buffers, not
> > > > > > > > > > > > > > > mbufs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I still do not understand why we cannot do with the conventional crypto lib
> > > > > > > > > > > > > > > > only.
> > > > > > > > > > > > > > > > As far as I can understand, you are not doing any protocol processing or
> > > > > > > > > > > > > > > > any value add to the crypto processing. IMO, you just need a synchronous
> > > > > > > > > > > > > > > > crypto processing API which can be defined in cryptodev, you don't need to
> > > > > > > > > > > > > > > > re-create a crypto session in the name of security session in the driver
> > > > > > > > > > > > > > > > just to do a synchronous processing.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I suppose your question is why not to have
> > > > > > > > > > > > > > > rte_crypto_process_cpu_crypto_bulk(...) instead?
> > > > > > > > > > > > > > > The main reason is that would require disruptive changes in existing
> > > > > > > > > > > > > > > cryptodev API (would cause ABI/API breakage).
> > > > > > > > > > > > > > > Session for RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO needs some extra
> > > > > > > > > > > > > > > information that normal crypto_sym_xform doesn't contain
> > > > > > > > > > > > > > > (cipher offset from the start of the buffer, might be something extra in
> > > > > > > > > > > > > > > future).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cipher offset will be part of rte_crypto_op.
> > > > > > > > > > > > >
> > > > > > > > > > > > > fill/read (+ alloc/free) is one of the main things that slow down current
> > > > > > > > > > > > > crypto-op approach.
> > > > > > > > > > > > > That's why the general idea - have all data that wouldn't change from
> > > > > > > > > > > > > packet to packet included into the session and set it up once at
> > > > > > > > > > > > > session_init().
> > > > > > > > > > > >
> > > > > > > > > > > > I agree that you cannot use crypto-op.
> > > > > > > > > > > > You can have the new API in crypto.
> > > > > > > > > > > > As per the current patch, you only need cipher_offset which you can have
> > > > > > > > > > > > as a parameter until you get it approved in the crypto xform. I believe it
> > > > > > > > > > > > will be beneficial in case of other crypto cases as well.
> > > > > > > > > > > > We can have cipher offset at both places (crypto-op and cipher_xform). It
> > > > > > > > > > > > will give flexibility to the user to override it.
> > > > > > > > > > >
> > > > > > > > > > > After having another thought on your proposal:
> > > > > > > > > > > Probably we can introduce new rte_crypto_sym_xform_types for CPU related
> > > > > > > > > > > stuff here?
> > > > > > > > > >
> > > > > > > > > > I also thought of adding new xforms, but that won't serve the purpose for
> > > > > > > > > > maybe all the cases.
> > > > > > > > > > You would be needing all information currently available in the current
> > > > > > > > > > xforms.
> > > > > > > > > > So if you are adding new fields in the new xform, the size will be more
> > > > > > > > > > than that of the union of xforms.
> > > > > > > > > > ABI breakage would still be there.
> > > > > > > > > >
> > > > > > > > > > If you think a valid compression of the AEAD xform can be done, then that
> > > > > > > > > > can be done for each of the xforms and we can have a solution to this
> > > > > > > > > > issue.
> > > > > > > > >
> > > > > > > > > I think that we can re-use iv.offset for our purposes (for crypto offset).
> > > > > > > > > So for now we can make that path work without any ABI breakage.
> > > > > > > > > Fan, please feel free to correct me here, if I missed something.
> > > > > > > > > If in future we would need to add some extra information it might
> > > > > > > > > require ABI breakage, though by now I don't envision anything particular
> > > > > > > > > to add.
> > > > > > > > > Anyway, if there is no objection to go that way, we can try to make
> > > > > > > > > these changes for v2.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Actually, after looking at it more deeply it appears not that easy as I
> > > > > > > > thought it would be :)
> > > > > > > > Below is a very draft version of proposed API additions.
> > > > > > > > I think it avoids ABI breakages right now and provides enough flexibility
> > > > > > > > for future extensions (if any).
> > > > > > > > For now, it doesn't address your comments about naming conventions
> > > > > > > > (_CPU_ vs _SYNC_), etc.
> > > > > > > > but I suppose is comprehensive enough to provide a main idea behind it.
> > > > > > > > Akhil and other interested parties, please try to review and provide
> > > > > > > > feedback ASAP,
> > > > > > > > as related changes would take some time and we still like to hit the 19.11
> > > > > > > > deadline.
> > > > > > > > Konstantin
> > > > > > > >
> > > > > > > >  diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > index bc8da2466..c03069e23 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > > > > > > > @@ -103,6 +103,9 @@ rte_crypto_cipher_operation_strings[];
> > > > > > > >   *
> > > > > > > >   * This structure contains data relating to Cipher (Encryption and Decryption)
> > > > > > > >   *  use to create a session.
> > > > > > > > + * Actually I was wrong saying that we don't have free space inside xforms.
> > > > > > > > + * Making key struct packed (see below) allow us to regain 6B that could be
> > > > > > > > + * used for future extensions.
> > > > > > > >   */
> > > > > > > >  struct rte_crypto_cipher_xform {
> > > > > > > >         enum rte_crypto_cipher_operation op;
> > > > > > > > @@ -116,7 +119,25 @@ struct rte_crypto_cipher_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > > +
> > > > > > > > +       /**
> > > > > > > > +         * offset for cipher to start within user provided data buffer.
> > > > > > > > +        * Fan suggested another (and less space consuming way) -
> > > > > > > > +         * reuse iv.offset space below, by changing:
> > > > > > > > +        * struct {uint16_t offset, length;} iv;
> > > > > > > > +        * to unnamed union:
> > > > > > > > +        * union {
> > > > > > > > +        *      struct {uint16_t offset, length;} iv;
> > > > > > > > +        *      struct {uint16_t iv_len, crypto_offset} cpu_crypto_param;
> > > > > > > > +        * };
> > > > > > > > +        * Both approaches seems ok to me in general.
> > > > > > >
> > > > > > > No strong opinions here. OK with this one.
> > > > > > >
> > > > > > > > +        * Comments/suggestions are welcome.
> > > > > > > > +         */
> > > > > > > > +       uint16_t offset;
> > > > > >
> > > > > > After another thought - it is probably a bit better to have offset as a
> > > > > > separate field.
> > > > > > In that case we can use the same xforms to create both types of sessions.
> > > > > ok
> > > > > >
> > > > > > > > +
> > > > > > > > +       uint8_t reserved1[4];
> > > > > > > > +
> > > > > > > >         /**< Cipher key
> > > > > > > >          *
> > > > > > > >          * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
> > > > > > > > @@ -284,7 +305,7 @@ struct rte_crypto_auth_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > >         /**< Authentication key data.
> > > > > > > >          * The authentication key length MUST be less than or equal to the
> > > > > > > >          * block size of the algorithm. It is the callers responsibility to
> > > > > > > > @@ -292,6 +313,8 @@ struct rte_crypto_auth_xform {
> > > > > > > >          * (for example RFC 2104, FIPS 198a).
> > > > > > > >          */
> > > > > > > >
> > > > > > > > +       uint8_t reserved1[6];
> > > > > > > > +
> > > > > > > >         struct {
> > > > > > > >                 uint16_t offset;
> > > > > > > >                 /**< Starting point for Initialisation Vector or Counter,
> > > > > > > > @@ -376,7 +399,12 @@ struct rte_crypto_aead_xform {
> > > > > > > >         struct {
> > > > > > > >                 const uint8_t *data;    /**< pointer to key data */
> > > > > > > >                 uint16_t length;        /**< key length in bytes */
> > > > > > > > -       } key;
> > > > > > > > +       } __attribute__((__packed__)) key;
> > > > > > > > +
> > > > > > > > +       /** offset for cipher to start within data buffer */
> > > > > > > > +       uint16_t cipher_offset;
> > > > > > > > +
> > > > > > > > +       uint8_t reserved1[4];
> > > > > > > >
> > > > > > > >         struct {
> > > > > > > >                 uint16_t offset;
> > > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > index e175b838c..c0c7bfed7 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > > > > > > > @@ -1272,6 +1272,101 @@ void *
> > > > > > > >  rte_cryptodev_sym_session_get_user_data(
> > > > > > > >                                         struct rte_cryptodev_sym_session *sess);
> > > > > > > >
> > > > > > > > +/*
> > > > > > > > + * After several thoughts decided not to try to squeeze CPU_CRYPTO
> > > > > > > > + * into existing rte_crypto_sym_session structure/API, but instead
> > > > > > > > + * introduce an extentsion to it via new fully opaque
> > > > > > > > + * struct rte_crypto_cpu_sym_session and additional related API.
> > > > > > >
> > > > > > >
> > > > > > > What all things do we need to squeeze?
> > > > > > > In this proposal I do not see the new struct cpu_sym_session defined here.
> > > > > >
> > > > > > The plan is to have it totally opaque to the user, i.e. just:
> > > > > > struct rte_crypto_cpu_sym_session;
> > > > > > in public header files.
> > > > > >
> > > > > > > I believe you will have same lib API/struct for cpu_sym_session and
> > > > > > > sym_session.
> > > > > >
> > > > > > I thought about such way, but there are few things that look clumsy to me:
> > > > > > 1. Right now there is no 'type' (or so) field inside rte_cryptodev_sym_session,
> > > > > > so it is not possible to easily distinguish what session you have: lksd_sym
> > > > > > or cpu_sym.
> > > > > > In theory, there is a hole of 4B inside rte_cryptodev_sym_session, so we can
> > > > > > add some extra field here, but in that case we wouldn't be able to use the
> > > > > > same xform for both lksd_sym or cpu_sym (which seems really plausible thing
> > > > > > for me).
> > > > > > 2. Majority of rte_cryptodev_sym_session fields I think are unnecessary for
> > > > > > rte_crypto_cpu_sym_session:
> > > > > > sess_data[], opaque_data, user_data, nb_drivers.
> > > > > > All that consumes space, that could be used somewhere else instead.
> > > > > > 3. I am a bit reluctant to touch existing rte_cryptodev API - to avoid any
> > > > > > breakages I can't foresee right now.
> > > > > > From other side - if we'll add new functions/structs for cpu_sym_session we
> > > > > > can mark it and keep it for some time as experimental, so further changes
> > > > > > (if needed) would still be possible.
> > > > > >
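The layout argument in the draft diff above (packing the key struct to regain 6 bytes, or overlaying the iv fields with a union) can be checked with `offsetof`/`sizeof`. The structs below are hypothetical mocks mirroring the cipher xform fields shown in the diff (op, algo, key{data,length}, iv{offset,length}), not the real DPDK headers, and the numbers assume an LP64 target such as x86_64 with GCC or Clang. In both variants `iv` keeps its original offset (24). Note that in this mock the packed variant's total size also drops from 32 to 28 bytes, because no member needs 8-byte alignment any more; that would need checking against any enclosing structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mocks of the draft layouts, NOT the real DPDK definitions (LP64). */
enum toy_op { TOY_OP_ENC };
enum toy_algo { TOY_ALGO_AES_CBC };

/* Current layout: the 10-byte key struct is padded to 16 bytes. */
struct cipher_xform_orig {
    enum toy_op op;
    enum toy_algo algo;
    struct { const uint8_t *data; uint16_t length; } key;
    struct { uint16_t offset; uint16_t length; } iv;
};

/* Approach 1: a packed key frees 6 bytes for a new field plus reserve. */
struct __attribute__((__packed__)) key_packed {
    const uint8_t *data;
    uint16_t length;
};

struct cipher_xform_packed {
    enum toy_op op;
    enum toy_algo algo;
    struct key_packed key;
    uint16_t offset;          /* new: cipher start within the buffer */
    uint8_t reserved1[4];
    struct { uint16_t offset; uint16_t length; } iv;
};

/* Approach 2: reuse the iv space through an anonymous union (C11). */
struct cipher_xform_union {
    enum toy_op op;
    enum toy_algo algo;
    struct { const uint8_t *data; uint16_t length; } key;
    union {
        struct { uint16_t offset; uint16_t length; } iv;
        struct { uint16_t iv_len; uint16_t crypto_offset; } cpu_crypto_param;
    };
};
```

On a 32-bit target the same packed layout would shift `iv` (the unpacked key is only 8 bytes there), so the reserved padding in the draft would have to be checked per architecture.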
> > > > >
> > > > > OK let us assume that you have a separate structure. But I have a few
> > > > > queries:
> > > > > 1. how can multiple drivers use a same session
> > > >
> > > > As a short answer: they can't.
> > > > It is pretty much the same approach as with rte_security - each device needs
> > > > to create/init its own session.
> > > > So upper layer would need to maintain its own array (or so) for such case.
> > > > Though the question is why would you like to have same session over
> > > > multiple SW backed devices?
> > > > As it would be anyway just a synchronous function call that will be
> > > > executed on the same cpu.
> > >
> > > I may have single FAT tunnel which may be distributed over multiple
> > > Cores, and each core is affined to a different SW device.
> >
> > If it is pure SW, then we don't need multiple devices for such scenario.
> > Device in that case is pure abstraction that we can skip.
> 
> Yes agreed, but that liberty is given to the application whether it needs multiple
> devices with single queue or a single device with multiple queues.
> I think that independence should not be broken in this new API.
> >
> > > So a single session may be accessed by multiple devices.
> > >
> > > One more example would be depending on packet sizes, I may switch between
> > > HW/SW PMDs with the same session.
> >
> > Sure, but then we'll have multiple sessions.
> 
> No, the session will be same and it will have multiple private data for each of the PMD.
> 
> > BTW, we have same thing now - these private session pointers are just stored
> > inside the same rte_crypto_sym_session.
> > And if user wants to support this model, he would also need to store
> > <dev_id, queue_id> pair for each HW device anyway.
> 
> Yes agreed, but how is that thing happening in your new struct, you cannot support that.

User can store all these info in his own struct.
That's exactly what we have right now.
Let's say ipsec-secgw has to store for each IPsec SA:
pointer to crypto-session and/or pointer to security session
plus (for lookaside-devices) cdev_id_qp that allows it to extract
dev_id + queue_id information.
As I understand that works for now, as each ipsec_sa uses only one
dev+queue. Though if someone would like to use multiple devices/queues
for the same SA - he would need to have an array of these <dev+queue> pairs.
So even right now rte_cryptodev_sym_session is not self-consistent and
requires extra information to be maintained by user. 
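A rough sketch of the per-SA bookkeeping described above, assuming an application that spreads one SA over several lookaside <dev_id, queue_id> pairs and picks one per burst. The struct and helper names are hypothetical; they are not ipsec-secgw's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-SA record; NOT the real ipsec-secgw structures. */
#define SA_MAX_QP 4

struct dev_qp {
    uint8_t dev_id;
    uint16_t qp_id;
};

struct sa_ctx {
    void *crypto_ses;             /* e.g. a crypto session pointer */
    void *sec_ses;                /* e.g. a security session pointer */
    struct dev_qp qp[SA_MAX_QP];  /* devices/queues this SA may use */
    uint32_t nb_qp;
    uint32_t next;                /* round-robin cursor */
};

static int sa_add_qp(struct sa_ctx *sa, uint8_t dev_id, uint16_t qp_id)
{
    if (sa->nb_qp == SA_MAX_QP)
        return -1;
    sa->qp[sa->nb_qp].dev_id = dev_id;
    sa->qp[sa->nb_qp].qp_id = qp_id;
    sa->nb_qp++;
    return 0;
}

/* Pick the next <dev, queue> pair for this SA (simple round-robin). */
static struct dev_qp sa_next_qp(struct sa_ctx *sa)
{
    assert(sa->nb_qp != 0);
    struct dev_qp r = sa->qp[sa->next];
    sa->next = (sa->next + 1) % sa->nb_qp;
    return r;
}
```

Nothing in this record lives inside the crypto session itself, which is the point made above: the session alone is not enough to reach a lookaside device.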

> 
> >
> > >
> > > >
> > > > > 2. Can somebody use the scheduler pmd for scheduling the different type of
> > > > > payloads for the same session?
> > > >
> > > > In theory yes.
> > > > Though for that scheduler pmd should have inside its
> > > > rte_crypto_cpu_sym_session an array of pointers to
> > > > the underlying devices sessions.
> > > >
> > > > >
> > > > > With your proposal the APIs would be very specific to your use case only.
> > > >
> > > > Yes in some way.
> > > > I consider that API specific for SW backed crypto PMDs.
> > > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit
> > > > from it.
> > > > Current crypto-op API is very much HW oriented.
> > > > Which is ok, that's what it was intended for, but I think we also need one
> > > > that would be designed with SW backed implementation in mind.
> > >
> > > We may re-use your API for HW PMDs as well which do not have requirement
> > > of crypto-op/mbuf etc.
> > > The return type of your new process API may have a status which says
> > > 'processed' or can say 'enqueued'. So if it is 'enqueued', we may have a new
> > > API for raw bufs dequeue as well.
> > >
> > > This requirement can be for any hardware PMDs like QAT as well.
> >
> > I don't think it is a good idea to extend this API for async (lookaside) devices.
> > You'll need to:
> >  - provide dev_id and queue_id for each process (enqueue) and dequeue
> >    operation.
> >  - provide IOVA for all buffers passing to that function (data buffers, digest,
> >    IV, aad).
> >  - on dequeue, provide some way to associate dequeued data and digest buffers
> >    with the crypto-session that was used (and probably with the mbuf).
> > So most likely we'll end up with just another version of our current crypto-op
> > structure.
> > If you'd like to get rid of mbufs dependency within current crypto-op API, that's
> > understandable, but I don't think we should have the same API for both sync (CPU)
> > and async (lookaside) cases.
> > It doesn't seem feasible at all and voids the whole purpose of that patch.
> 
> At this moment we are not much concerned about the dequeue API and about the
> HW PMD support. It is just that the new API should be generic enough to be used in
> some future scenarios as well. I am just highlighting the possible usecases which can
> be there in future.

Sorry, but I strongly disagree with such an approach.
We should stop adding/modifying APIs 'just in case' and because 'it might be useful for some future HW'.
Inside DPDK we already have too many dev-level APIs without any implementation.
That's quite bad practice and very disorienting for end users.
I think that to justify API additions/changes we need at least one proper implementation,
or at least some strong evidence that people are really committed to supporting it in the near future.
BTW, that is what the TB agreed on nearly a year ago.

This new API (if we go ahead with it, of course) would stay experimental for some time anyway,
to make sure we don't miss anything needed (I think for about a one-year time frame).
So if you guys *really* want to extend it to support _async_ devices too,
I am open to modifications/additions here.
Though personally I think such an addition would over-complicate things, and we'd end up with
another reincarnation of the current crypto-op.
We actually discussed it internally and decided to drop that idea for that reason.
Again, my opinion: for lookaside devices it might be better to try to optimize the
current crypto-op path (remove the mbuf requirement, probably add the ability to
group by session on enqueue/dequeue, etc.).
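A minimal sketch of what "group by session" could mean for a burst of ops: consecutive ops that share a session are collapsed into one run, so the session (and whatever is derived from it) is looked up once per run rather than once per op. The `struct op`, `struct op_group` and `group_by_session()` names are hypothetical and only stand in for `rte_crypto_op` handling; this is not an existing DPDK API.

```c
#include <stdint.h>

/* Hypothetical, minimal stand-in for a crypto op: only the session matters here. */
struct op {
	const void *sess;
	uint32_t id;
};

/* One run of consecutive ops that share a session. */
struct op_group {
	const void *sess;
	uint32_t first; /* index of the first op in the run */
	uint32_t cnt;   /* number of ops in the run */
};

/*
 * Split ops[0..num) into runs of consecutive ops with the same session.
 * Returns the number of groups written to grp[] (grp must hold num entries).
 */
static uint32_t
group_by_session(const struct op ops[], uint32_t num, struct op_group grp[])
{
	uint32_t i, n = 0;

	for (i = 0; i != num; i++) {
		/* same session as the previous op: extend the current run */
		if (n != 0 && grp[n - 1].sess == ops[i].sess) {
			grp[n - 1].cnt++;
			continue;
		}
		/* otherwise start a new run */
		grp[n].sess = ops[i].sess;
		grp[n].first = i;
		grp[n].cnt = 1;
		n++;
	}
	return n;
}
```

It only merges adjacent ops and keeps their order; reordering a burst to get larger groups would be a separate design decision.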

> 
> What is the issue that you face in making a dev-op for this new API. Do you see any
> performance impact with that?

There are two main things:
1. The user would need to maintain and provide dev_id+queue_id for each process() call.
That means extra (and totally unnecessary for SW) overhead.
2. Yes, I would expect some perf overhead too: it would be an extra call or branch.
Again, as there would be a data dependency, the CPU most likely wouldn't be able to pipeline
it efficiently:

int rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id, struct rte_cryptodev_sym_session *sess, ...)
{
     struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
     return (*dev->process)(sess->sess_data[dev->driver_id].data, ...);
}

int driver_specific_process(struct driver_specific_sym_session *sess, ...)
{
   return sess->process(sess, ...);
}

I didn't make any exact measurements, but I'm sure it would be slower than just:
session_udata->process(session->udata->sess, ...);
Again, it would be much more noticeable on low-end CPUs.
For example, here: http://mails.dpdk.org/archives/dev/2019-September/144350.html
Jerin reports a 1.5-3% drop from introducing an extra call to hide eth_dev contents;
I suppose we would see something similar here.
I do realize that in the majority of cases crypto is more expensive than RX/TX, but still.

If it were a really unavoidable trade-off (e.g. to support an already existing API),
I wouldn't mind, but I don't see any real need for it right now.
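To make the two call chains above concrete, here is a self-contained toy model. It is not DPDK code, and every type and function name in it is hypothetical. It contrasts the dev-op path (look up the device by dev_id, then the driver's private data by driver_id, then call the per-session function) with a direct call through a function pointer the caller obtained at session-init time. Each step of the dev-op path depends on the previous load, which is the data dependency referred to above.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DRIVERS 4 /* hypothetical number of driver ids */
#define MAX_DEVS    4 /* hypothetical number of devices */

struct drv_ses;
typedef uint32_t (*process_t)(struct drv_ses *ses, uint32_t num);

/* Driver-private session: the driver picked process() at init time. */
struct drv_ses {
	process_t process;
	uint32_t key; /* stand-in for keys/algorithm state */
};

/* Generic session with one private-data slot per driver id. */
struct gen_ses {
	struct drv_ses *data[MAX_DRIVERS];
};

/* Device entry with a "process" dev-op, looked up by dev_id. */
struct dev {
	uint8_t driver_id;
	process_t process;
};

static struct dev devs[MAX_DEVS];

/* Toy per-burst work standing in for the actual crypto. */
static uint32_t
toy_process(struct drv_ses *ses, uint32_t num)
{
	return ses->key + num;
}

/* Driver dev-op: must re-dispatch through the per-session pointer. */
static uint32_t
drv_devop_process(struct drv_ses *ses, uint32_t num)
{
	return ses->process(ses, num);
}

/* dev-op model: devs[dev_id] -> dev-op -> data[driver_id] -> ses->process */
static uint32_t
sym_process_devop(uint8_t dev_id, struct gen_ses *ses, uint32_t num)
{
	const struct dev *d = &devs[dev_id];

	return d->process(ses->data[d->driver_id], num);
}

/* Direct model: the caller already holds process() and the session. */
static uint32_t
sym_process_direct(process_t process, struct drv_ses *ses, uint32_t num)
{
	return process(ses, num);
}
```

Both paths compute the same result; the difference is only the number of dependent loads and indirect calls before the real work starts.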

> 
> >
> > > That is why a dev-ops would be a better option.
> > >
> > > >
> > > > > When you would add more functionality to this sync API/struct, it will end up being the same API/struct.
> > > > >
> > > > > Let us  see how close/ far we are from the existing APIs when the actual implementation is done.
> > > > >
> > > > > > > I am not sure if that would be needed.
> > > > > > > It would be internal to the driver that if synchronous processing is supported(from feature flag) and
> > > > > > > Have relevant fields in xform(the newly added ones which are packed as per your suggestions) set,
> > > > > > > It will create that type of session.
> > > > > > >
> > > > > > >
> > > > > > > > + * Main points:
> > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > + *   affecting existing one.
> > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > + * - process() function per set of xforms
> > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > + *   or spread it across several ones.
> > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > >
> > > > > > > Which process function should be chosen is internal to PMD, how would that info
> > > > > > > be visible to the application or the library. These will get stored in the session private
> > > > > > > data. It would be upto the PMD writer, to store the per session process function in
> > > > > > > the session private data.
> > > > > > >
> > > > > > > Process function would be a dev ops just like enc/deq operations and it should call
> > > > > > > The respective process API stored in the session private data.
> > > > > >
> > > > > > That model (via devops) is possible, but has several drawbacks from my perspective:
> > > > > >
> > > > > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > > > > Though in fact dev_id is not a relevant information for us here
> > > > > > (all we need is pointer to the session and pointer to the fuction to call)
> > > > > > and I tried to avoid using it in data-path functions for that API.
> > > > >
> > > > > You have a single vdev, but someone may have multiple vdevs for each thread, or may
> > > > > Have same dev with multiple queues for each core.
> > > >
> > > > That's fine. As I said above it is a SW backed implementation.
> > > > Each session has to be a separate entity that contains all necessary information
> > > > (keys, alg/mode info,  etc.)  to process input buffers.
> > > > Plus we need the actual function pointer to call.
> > > > I just don't see what for we need a dev_id in that situation.
> > >
> > > To iterate the session private data in the session.
> > >
> > > > Again, here we don't need care about queues and their pinning to cores.
> > > > If let say someone would like to process buffers from the same IPsec SA on 2
> > > > different cores in parallel, he can just create 2 sessions for the same xform,
> > > > give one to thread #1  and second to thread #2.
> > > > After that both threads are free to call process(this_thread_ses, ...) at will.
> > >
> > > Say you have a 16core device to handle 100G of traffic on a single tunnel.
> > > Will we make 16 sessions with same parameters?
> >
> > Absolutely same question we can ask for current crypto-op API.
> > You have lookaside crypto-dev with 16 HW queues, each queue is serviced by different CPU.
> > For the same SA, do you need a separate session per queue, or is it ok to reuse current one?
> > AFAIK, right now this is a grey area not clearly defined.
> > For crypto-devs I am aware - user can reuse the same session (as PMD uses it read-only).
> > But again, right now I think it is not clearly defined and is implementation specific.
> 
> User can use the same session, that is what I am also insisting, but it may have separate
> Session private data. Cryptodev session create API provide that functionality and we can
> Leverage that.

rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means we can't use
the same rte_cryptodev_sym_session to hold sessions for both sync and async mode
for the same device. Of course, we can add a hard requirement that any driver that wants to
support process() has to create sessions that can handle both process() and enqueue/dequeue,
but then again, why create such overhead?

BTW, to be honest, I don't consider the current rte_cryptodev_sym_session construct for multiple device_ids:
__extension__ struct {
        void *data;
        uint16_t refcnt;
} sess_data[0];
/**< Driver specific session material, variable size */

as an advantage.
It looks too error-prone to me:
1. Simultaneous session initialization/de-initialization for devices with the same driver_id is not possible.
2. It assumes that all device drivers will be loaded before we start to create session pools.

Right now it seems OK, as no one requires such functionality, but I don't know how it will be in the future.
To me, the rte_security session model, where the user has to create a new session for each security context,
looks much more robust.
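A simplified model of the limitation described above, assuming a generic session with one private-data slot per driver_id and a slot count fixed when the session pool is sized. The names are hypothetical and `refcnt` is omitted. A second attach for the same driver_id (e.g. sync data after async data) is rejected, and so is a driver_id beyond the pool's size; `struct ctx_ses` mimics the rte_security-style alternative, where each context gets its own session object.

```c
#include <stddef.h>
#include <stdint.h>

#define NB_DRIVERS 4 /* hypothetical: fixed when the session pool is sized */

/* Simplified generic session: private data indexed by driver_id. */
struct sym_ses {
	void *sess_data[NB_DRIVERS];
};

/*
 * Attach driver-private data for driver_id.
 * Returns 0 on success, -1 if driver_id is out of range (driver loaded
 * after the pool was sized) or the slot is already taken.
 */
static int
ses_attach(struct sym_ses *s, uint8_t driver_id, void *priv)
{
	if (driver_id >= NB_DRIVERS)
		return -1;
	if (s->sess_data[driver_id] != NULL)
		return -1;
	s->sess_data[driver_id] = priv;
	return 0;
}

/* rte_security-like alternative: one session object per context. */
struct ctx_ses {
	const void *ctx; /* context (device) that owns this session */
	void *priv;      /* that context's private data */
};

static void
ctx_ses_init(struct ctx_ses *s, const void *ctx, void *priv)
{
	s->ctx = ctx;
	s->priv = priv;
}
```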
 
> 
> BTW, I can see a v2 to this RFC which is still based on security library.

Yes, v2 concentrated on fixing the issues found and on some code restructuring,
i.e. changes that would be needed anyway, whichever API approach we choose.

> When do you plan
> To submit the patches for crypto based APIs. We have RC1 merge deadline for this
> patchset on 21st Oct.

We'd like to start working on it ASAP, but it seems we still have a major disagreement
about what this crypto-dev API should look like.
Which makes me think: should we return to our original proposal via rte_security?
It still looks to me like a clean and straightforward way to enable this new API,
and probably wouldn't cause that much controversy.
What do you think?

> 
> As per my understanding you only need a new dev-op for sync support. Session APIs
> Will remain the same and you will have some extra fields packed in xform structs.
> 
> The PMD will need to maintain a pointer to the per session process function while creating
> Session and will be used by the dev-op API at runtime without any extra check at runtime.
> 
> >
> > >
> > > >
> > > > >
> > > > > > 2. As you pointed in that case it will be just one process() function per device.
> > > > > > So if PMD would like to have several process() functions for different type of sessions
> > > > > > (let say one per alg) first thing it has to do inside it's process() - read session data and
> > > > > > based on that, do a jump/call to particular internal sub-routine.
> > > > > > Something like:
> > > > > > driver_id = get_pmd_driver_id();
> > > > > > priv_ses = ses->sess_data[driver_id];
> > > > > > Then either:
> > > > > > switch(priv_sess->alg) {case XXX: process_XXX(priv_sess, ...);break;...}
> > > > > > OR
> > > > > > priv_ses->process(priv_sess, ...);
> > > > > >
> > > > > > to select and call the proper function.
> > > > > > Looks like totally unnecessary overhead to me.
> > > > > > Though if we'll have ability to query/extract some sort session_ops based on the xform -
> > > > > > we can avoid  this extra de-refererence+jump/call thing.
> > > > >
> > > > > What is the issue in the priv_ses->process(); approach?
> > > >
> > > > Nothing at all.
> > > > What I am saying that schema with dev_ops
> > > > dev[dev_id]->dev_ops.process(ses->priv_ses[driver_id], ...)
> > > >    |
> > > >    |-> priv_ses->process(...)
> > > >
> > > > Has bigger overhead then just:
> > > > process(ses,...);
> > > >
> > > > So what for to introduce extra-level of indirection here?
> > >
> > > Explained above.
> > >
> > > >
> > > > > I don't understand what are you saving by not doing this.
> > > > > In any case you would need to identify which session correspond to which process().
> > > >
> > > > Yes, sure, but I think we can make user to store information that relationship,
> > > > in a way he likes: store process() pointer for each session, or group sessions
> > > > that share the same process() somehow, or...
> > >
> > > So whatever relationship that user will make and store will make its life complicated.
> > > If we can hide that information in the driver, then what is the issue in that and user
> > > Will not need to worry. He would just call the process() and driver will choose which
> > > Process need to be called.
> >
> > Driver can do that at config/init time.
> > Then at run-time we can avoid that choice at all and call already chosen function.
> >
> > >
> > > I think we should have a POC around this and see the difference in the cycle count.
> > > IMO it would be negligible and we would end up making a generic API set which
> > > can be used by others as well.
> > >
> > > >
> > > > > For that you would be doing it somewhere in your data path.
> > > >
> > > > Why at data-path?
> > > > Only once at session creation/initialization time.
> > > > Or might be even once per group of sessions.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > I am not sure if you would need a new session init API for this as nothing would be visible to
> > > > > > > the app or lib.
> > > > > > >
> > > > > > > > + * - Not storing process() pointer inside the session -
> > > > > > > > + *   Allows user to choose does he want to store a process() pointer
> > > > > > > > + *   per session, or per group of sessions for that device that share
> > > > > > > > + *   the same input xforms. I.E. extra flexibility for the user,
> > > > > > > > + *   plus allows us to keep cpu_sym_session totally opaque, see above.
> > > > > > >
> > > > > > > If multiple sessions need to be processed via the same process function,
> > > > > > > PMD would save the same process in all the sessions, I don't think there would
> > > > > > > be any perf overhead with that.
> > > > > >
> > > > > > I think it would, see above.
> > > > > >
> > > > > > >
> > > > > > > > + * Sketched usage model:
> > > > > > > > + * ....
> > > > > > > > + * /* control path, alloc/init session */
> > > > > > > > + * int32_t sz = rte_crypto_cpu_sym_session_size(dev_id, &xform);
> > > > > > > > + * struct rte_crypto_cpu_sym_session *ses = user_alloc(..., sz);
> > > > > > > > + * rte_crypto_cpu_sym_process_t process =
> > > > > > > > + *     rte_crypto_cpu_sym_session_func(dev_id, &xform);
> > > > > > > > + * rte_crypto_cpu_sym_session_init(dev_id, ses, &xform);
> > > > > > > > + * ...
> > > > > > > > + * /* data-path*/
> > > > > > > > + * process(ses, ....);
> > > > > > > > + * ....
> > > > > > > > + * /* control path, termiante/free session */
> > > > > > > > + * rte_crypto_cpu_sym_session_fini(dev_id, ses);
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * vector structure, contains pointer to vector array and the length
> > > > > > > > + * of the array
> > > > > > > > + */
> > > > > > > > +struct rte_crypto_vec {
> > > > > > > > +       struct iovec *vec;
> > > > > > > > +       uint32_t num;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Data-path bulk process crypto function.
> > > > > > > > + */
> > > > > > > > +typedef void (*rte_crypto_cpu_sym_process_t)(
> > > > > > > > +               struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +               struct rte_crypto_vec buf[], void *iv[], void *aad[],
> > > > > > > > +               void *digest[], int status[], uint32_t num);
> > > > > > > > +/*
> > > > > > > > + * for given device return process function specific to input xforms
> > > > > > > > + * on error - return NULL and set rte_errno value.
> > > > > > > > + * Note that for same input xfroms for the same device should return
> > > > > > > > + * the same process function.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +rte_crypto_cpu_sym_process_t
> > > > > > > > +rte_crypto_cpu_sym_session_func(uint8_t dev_id,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Return required session size in bytes for given set of xforms.
> > > > > > > > + * if xforms == NULL, then return the max possible session size,
> > > > > > > > + * that would fit session for any supported by the device algorithm.
> > > > > > > > + * if CPU mode is not supported at all, or requeted in xform
> > > > > > > > + * algorithm is not supported, then return -ENOTSUP.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +int
> > > > > > > > +rte_crypto_cpu_sym_session_size(uint8_t dev_id,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Initialize session.
> > > > > > > > + * It is caller responsibility to allocate enough space for it.
> > > > > > > > + * See rte_crypto_cpu_sym_session_size above.
> > > > > > > > + */
> > > > > > > > +__rte_experimental
> > > > > > > > +int rte_crypto_cpu_sym_session_init(uint8_t dev_id,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +__rte_experimental
> > > > > > > > +void
> > > > > > > > +rte_crypto_cpu_sym_session_fini(uint8_t dev_id,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > > +
> > > > > > > > +
> > > > > > > >  #ifdef __cplusplus
> > > > > > > >  }
> > > > > > > >  #endif
> > > > > > > > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > index defe05ea0..ed7e63fab 100644
> > > > > > > > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > > > > > > > @@ -310,6 +310,20 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
> > > > > > > >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> > > > > > > >                 struct rte_cryptodev_asym_session *sess);
> > > > > > > >
> > > > > > > > +typedef int (*cryptodev_cpu_sym_session_size_t) (struct rte_cryptodev *dev,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +typedef int (*cryptodev_cpu_sym_session_init_t) (struct rte_cryptodev *dev,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > > +typedef void (*cryptodev_cpu_sym_session_fini_t) (struct rte_cryptodev *dev,
> > > > > > > > +                       struct rte_crypto_cpu_sym_session *sess);
> > > > > > > > +
> > > > > > > > +typedef rte_crypto_cpu_sym_process_t (*cryptodev_cpu_sym_session_func_t) (
> > > > > > > > +                       struct rte_cryptodev *dev,
> > > > > > > > +                       const struct rte_crypto_sym_xform *xforms);
> > > > > > > > +
> > > > > > > >  /** Crypto device operations function pointer table */
> > > > > > > >  struct rte_cryptodev_ops {
> > > > > > > >         cryptodev_configure_t dev_configure;    /**< Configure device. */
> > > > > > > > @@ -343,6 +357,11 @@ struct rte_cryptodev_ops {
> > > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > > >         cryptodev_asym_free_session_t asym_session_clear;
> > > > > > > >         /**< Clear a Crypto sessions private data. */
> > > > > > > > +
> > > > > > > > +       cryptodev_cpu_sym_session_size_t sym_cpu_session_get_size;
> > > > > > > > +       cryptodev_cpu_sym_session_func_t sym_cpu_session_get_func;
> > > > > > > > +       cryptodev_cpu_sym_session_init_t sym_cpu_session_init;
> > > > > > > > +       cryptodev_cpu_sym_session_fini_t sym_cpu_session_fini;
> > > > > > > >  };
> > > > > > > >
> > > > > > > >
> > > > > > > >
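The sketched usage model in the quoted RFC comment can be exercised with a toy, self-contained "PMD". Everything below is hypothetical: `struct cpu_crypto_vec` mirrors the proposed `struct rte_crypto_vec`, the `toy_session_*()` functions play the roles of `rte_crypto_cpu_sym_session_size/func/init/fini()` for a single device, `errno` stands in for `rte_errno`, and the "algorithm" is a one-byte XOR that is not cryptography. It only shows the control-path/data-path split: the process() pointer is chosen once from the xform and then called directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Local mirror of the proposed struct rte_crypto_vec. */
struct cpu_crypto_vec {
	struct iovec *vec;
	uint32_t num;
};

/* Hypothetical xform: an algorithm selector plus a 1-byte key. */
struct toy_xform {
	int alg;     /* 0 = toy XOR; anything else is "unsupported" */
	uint8_t key;
};

/* Session contents, opaque to the user in the proposed model. */
struct toy_ses {
	uint8_t key;
};

/* Same shape as the proposed rte_crypto_cpu_sym_process_t. */
typedef void (*toy_process_t)(struct toy_ses *sess,
		struct cpu_crypto_vec buf[], void *iv[], void *aad[],
		void *digest[], int status[], uint32_t num);

/* size(): -ENOTSUP for an unsupported algorithm, as the RFC comment says. */
static int
toy_session_size(const struct toy_xform *xf)
{
	if (xf != NULL && xf->alg != 0)
		return -ENOTSUP;
	return (int)sizeof(struct toy_ses);
}

/* XOR every byte of every segment with the key. NOT real crypto. */
static void
toy_process_xor(struct toy_ses *sess, struct cpu_crypto_vec buf[],
		void *iv[], void *aad[], void *digest[], int status[],
		uint32_t num)
{
	uint32_t i, j;
	size_t k;

	(void)iv; (void)aad; (void)digest; /* unused by the toy algorithm */
	for (i = 0; i != num; i++) {
		for (j = 0; j != buf[i].num; j++) {
			uint8_t *p = buf[i].vec[j].iov_base;

			for (k = 0; k != buf[i].vec[j].iov_len; k++)
				p[k] ^= sess->key;
		}
		status[i] = 0;
	}
}

/* func(): same xform -> same function; NULL with errno set on error. */
static toy_process_t
toy_session_func(const struct toy_xform *xf)
{
	if (xf->alg != 0) {
		errno = ENOTSUP;
		return NULL;
	}
	return toy_process_xor;
}

static int
toy_session_init(struct toy_ses *sess, const struct toy_xform *xf)
{
	if (xf->alg != 0)
		return -ENOTSUP;
	sess->key = xf->key;
	return 0;
}

/* fini(): wipe key material; the caller frees the memory it allocated. */
static void
toy_session_fini(struct toy_ses *sess)
{
	memset(sess, 0, sizeof(*sess));
}
```

Usage follows the sketched model: size, user allocation, func and init on the control path; process() on the data path; fini and then the user's own free at teardown.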



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-09 13:43                                       ` Ananyev, Konstantin
@ 2019-10-11 13:23                                         ` Akhil Goyal
  2019-10-13 23:07                                           ` Zhang, Roy Fan
  2019-10-16 22:07                                           ` Ananyev, Konstantin
  0 siblings, 2 replies; 87+ messages in thread
From: Akhil Goyal @ 2019-10-11 13:23 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph'

Hi Konstantin,

> 
> Hi Akhil,
> 
..[snip]

> > > > > > OK let us assume that you have a separate structure. But I have a few queries:
> > > > > > 1. how can multiple drivers use a same session
> > > > > >
> > > > > As a short answer: they can't.
> > > > > It is pretty much the same approach as with rte_security - each device needs to
> > > > > create/init its own session.
> > > > > So upper layer would need to maintain its own array (or so) for such case.
> > > > > Though the question is why would you like to have same session over multiple
> > > > > SW backed devices?
> > > > > As it would be anyway just a synchronous function call that will be executed on
> > > > > the same cpu.
> > > >
> > > > I may have single FAT tunnel which may be distributed over multiple
> > > > Cores, and each core is affined to a different SW device.
> > >
> > > If it is pure SW, then we don't need multiple devices for such scenario.
> > > Device in that case is pure abstraction that we can skip.
> >
> > Yes agreed, but that liberty is given to the application whether it need multiple
> > devices with single queue or a single device with multiple queues.
> > I think that independence should not be broken in this new API.
> > >
> > > > So a single session may be accessed by multiple devices.
> > > >
> > > > One more example would be depending on packet sizes, I may switch between
> > > > HW/SW PMDs with the same session.
> > >
> > > Sure, but then we'll have multiple sessions.
> >
> > No, the session will be same and it will have multiple private data for each of the PMD.
> >
> > > BTW, we have same thing now - these private session pointers are just stored
> > > inside the same rte_crypto_sym_session.
> > > And if user wants to support this model, he would also need to store <dev_id, queue_id>
> > > pair for each HW device anyway.
> >
> > Yes agreed, but how is that thing happening in your new struct, you cannot support that.
> 
> User can store all these info in his own struct.
> That's exactly what we have right now.
> Let say ipsec-secgw has to store for each IPsec SA:
> pointer to crypto-session and/or pointer to security session
> plus (for lookaside-devices) cdev_id_qp that allows it to extract
> dev_id + queue_id information.
> As I understand that works for now, as each ipsec_sa uses only one
> dev+queue. Though if someone would like to use multiple devices/queues
> for the same SA - he would need to have an array of these <dev+queue> pairs.
> So even right now rte_cryptodev_sym_session is not self-consistent and
> requires extra information to be maintained by user.

Why are you increasing the complexity for the user application?
The new APIs and structs should be such that the stack needs minimal changes,
so that the stack is portable across multiple vendors.
You should try to hide as much complexity as possible in the driver or lib to give the user simple APIs.

Having the same session for multiple devices was added by Intel only for some use cases.
And we had split that session create API into two. Now, if those are not useful, shall we move back
to the single API? I think @Doherty, Declan and @De Lara Guarch, Pablo can comment on this.

> 
> >
> > >
> > > >
> > > > >
> > > > > > 2. Can somebody use the scheduler pmd for scheduling the different type of payloads for the same session?
> > > > >
> > > > > In theory yes.
> > > > > Though for that scheduler pmd should have inside it's
> > > > > rte_crypto_cpu_sym_session an array of pointers to
> > > > > the underlying devices sessions.
> > > > >
> > > > > >
> > > > > > With your proposal the APIs would be very specific to your use case only.
> > > > >
> > > > > Yes in some way.
> > > > > I consider that API specific for SW backed crypto PMDs.
> > > > > I can hardly see how any 'real HW' PMDs (lksd-none, lksd-proto) will benefit from it.
> > > > > Current crypto-op API is very much HW oriented.
> > > > > Which is ok, that's for it was intended for, but I think we also need one that would be designed
> > > > > for SW backed implementation in mind.
> > > >
> > > > We may re-use your API for HW PMDs as well which do not have requirement of
> > > > Crypto-op/mbuf etc.
> > > > The return type of your new process API may have a status which say 'processed'
> > > > Or can be say 'enqueued'. So if it is  'enqueued', we may have a new API for raw
> > > > Bufs dequeue as well.
> > > >
> > > > This requirement can be for any hardware PMDs like QAT as well.
> > >
> > > I don't think it is a good idea to extend this API for async (lookaside) devices.
> > > You'll need to:
> > >  - provide dev_id and queue_id for each process(enqueue) and dequeuer operation.
> > >  - provide IOVA for all buffers passing to that function (data buffers, digest, IV, aad).
> > >  - On dequeue provide some way to associate dequed data and digest buffers with
> > >    crypto-session that was used  (and probably with mbuf).
> > >  So most likely we'll end up with another just version of our current crypto-op structure.
> > > If you'd like to get rid of mbufs dependency within current crypto-op API that understandable,
> > > but I don't think we should have same API for both sync (CPU) and async (lookaside) cases.
> > > It doesn't seem feasible at all and voids whole purpose of that patch.
> >
> > At this moment we are not much concerned about the dequeue API and about the
> > HW PMD support. It is just that the new API should be generic enough to be used in
> > some future scenarios as well. I am just highlighting the possible usecases which can
> > be there in future.
> 
> Sorry, but I strongly disagree with such approach.
> We should stop adding/modifying API 'just in case' and because 'it might be useful for some future HW'.
> Inside DPDK we already do have too many dev level APIs without any implementations.
> That's quite bad practice and very dis-orienting for end-users.
> I think to justify API additions/changes we need at least one proper implementation for it,
> or at least some strong evidence that people are really committed to support it in nearest future.
> BTW, that what TB agreed on, nearly a year ago.
>
> This new API (if we'll go ahead with it of course) would stay experimental for some time anyway
> to make sure we don't miss anything needed (I think for about a year time-frame).
> So if you guys *really* want to extend it support _async_ devices too -
> I am open for modifications/additions here.
> Though personally I think such addition would over-complicate things and we'll end up with
> another reincarnation of current crypto-op.
> We actually discussed it internally, and decided to drop that idea because of that.
> Again, my opinion - for lookaside devices it might be better to try to optimize
> current crypto-op path (remove mbuf requirement, probably add  ability to
> group by session on enqueue/dequeue, etc.).

I agree that the new API is experimental and can be modified later, so no issues there,
but we can keep some things in mind while defining the APIs. These were some comments from
my side; if they impact the current scenario, you can drop them. We will take care of those
later.

> 
> >
> > What is the issue that you face in making a dev-op for this new API. Do you see any
> > performance impact with that?
>
> There are two main things:
> 1. user would need to maintain and provide for each process() call dev_id+queue_id.
> That's means extra (and totally unnecessary for SW) overhead.

You are using a crypto device to perform the processing,
so you must use dev_id to identify which SW device it is. This is how the DPDK
framework works.

> 2. yes I would expect some perf overhead too - it would be extra call or branch.
> Again as it would be data-dependency - most likely cpu wouldn't be able to pipeline
> it efficiently:
>
> rte_crypto_sym_process(uint8_t dev_id, uint16 qp_id, rte_crypto_sym_session *ses, ...)
> {
>      struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
>      return (*dev->process)(sess->data[dev->driver_id, ...);
> }
> 
> driver_specific_process(driver_specific_sym_session *sess)
> {
>    return sess->process(sess, ...) ;
> }
> 
> I didn't make any exact measurements but sure it would be slower than just:
> session_udata->process(session->udata->sess, ...);
> Again it would be much more noticeable on low end cpus.
> Let say here: http://mails.dpdk.org/archives/dev/2019-September/144350.html
> Jerin claims 1.5-3% drop for introducing extra call via hiding eth_dev contents -
> I suppose we would have something similar here.
> I do realize that in majority of cases crypto is more expensive then RX/TX, but still.
> 
> If it would be a really unavoidable tradeoff (support already existing API, or so)
> I wouldn't mind, but I don't see any real need for it right now.

If the application calls session_udata->process(session->udata->sess, ...) and has to
maintain the process() API for each PMD in its own memory, the application will not be
portable to other vendors.

What we are doing here is defining another way to create sessions for the same thing
that is already done. This makes applications non-portable and confusing for the application
writer.

I would say you should do some profiling first. As you also mentioned, the crypto workload is more
cycle-consuming, so the extra call will not impact this case.
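One way to start on that profiling is a micro-benchmark of the two dispatch styles. The sketch below is a hypothetical harness (the names `pses`, `pdev`, `devop_process()` and `bench()` are made up) that times N direct calls through a session's function pointer against N calls routed through an extra dev-op layer, using POSIX `clock_gettime(CLOCK_MONOTONIC)`. The function-pointer fields are `volatile`, so the compiler has to reload them on every call instead of devirtualizing. Even so, numbers from such a toy are only indicative; a real comparison would use the actual PMDs with realistic buffer sizes.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

struct pses;
typedef uint64_t (*pfn_t)(struct pses *s, uint64_t x);

/* Session with the process() pointer chosen at init time. */
struct pses {
	pfn_t volatile process; /* volatile: force a real indirect call */
	uint64_t k;
};

/* Device-level dev-op that re-dispatches through the session. */
struct pdev {
	pfn_t volatile devop;
};

/* Tiny stand-in for per-call crypto work. */
static uint64_t
work(struct pses *s, uint64_t x)
{
	return x * s->k + 1;
}

/* Dev-op layer: one more indirect call before the real work. */
static uint64_t
devop_process(struct pses *s, uint64_t x)
{
	return s->process(s, x);
}

static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * Run n calls, either directly (via_devop == 0) or through the dev-op,
 * store the elapsed nanoseconds in *ns and return the accumulated result.
 */
static uint64_t
bench(int via_devop, const struct pdev *dev, struct pses *s, uint64_t n,
	uint64_t *ns)
{
	uint64_t i, acc = 0, t0 = now_ns();

	if (via_devop) {
		for (i = 0; i != n; i++)
			acc += dev->devop(s, i);
	} else {
		for (i = 0; i != n; i++)
			acc += s->process(s, i);
	}
	*ns = now_ns() - t0;
	return acc;
}
```

For example, `bench(0, &d, &s, n, &t_direct)` and `bench(1, &d, &s, n, &t_devop)` return the same accumulated value, so only the two elapsed times differ.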


> 
> >
> > >
> > > > That is why a dev-ops would be a better option.
> > > >
> > > > >
> > > > > > When you would add more functionality to this sync API/struct, it will end up being the same API/struct.
> > > > > >
> > > > > > Let us  see how close/ far we are from the existing APIs when the actual implementation is done.
> > > > > >
> > > > > > > > I am not sure if that would be needed.
> > > > > > > > It would be internal to the driver that if synchronous processing is supported(from feature flag) and
> > > > > > > > Have relevant fields in xform(the newly added ones which are packed as per your suggestions) set,
> > > > > > > > It will create that type of session.
> > > > > > > >
> > > > > > > >
> > > > > > > > > + * Main points:
> > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > > + *   affecting existing one.
> > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > > + *   or spread it across several ones.
> > > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > > >
> > > > > > > > Which process function should be chosen is internal to PMD, how would that
> > > > > > > > info be visible to the application or the library. These will get stored in
> > > > > > > > the session private data. It would be upto the PMD writer, to store the per
> > > > > > > > session process function in the session private data.
> > > > > > > >
> > > > > > > > Process function would be a dev ops just like enc/deq operations and it
> > > > > > > > should call The respective process API stored in the session private data.
> > > > > > >
> > > > > > > That model (via devops) is possible, but has several drawbacks from my
> > > > > > > perspective:
> > > > > > >
> > > > > > > 1. It means we'll need to pass dev_id as a parameter to process() function.
> > > > > > > Though in fact dev_id is not a relevant information for us here
> > > > > > > (all we need is pointer to the session and pointer to the function to call)
> > > > > > > and I tried to avoid using it in data-path functions for that API.
> > > > > >
> > > > > > You have a single vdev, but someone may have multiple vdevs for each thread,
> > > > > > or may
> > > > > > Have same dev with multiple queues for each core.
> > > > >
> > > > > That's fine. As I said above it is a SW backed implementation.
> > > > > Each session has to be a separate entity that contains all necessary information
> > > > > (keys, alg/mode info,  etc.)  to process input buffers.
> > > > > Plus we need the actual function pointer to call.
> > > > > I just don't see what for we need a dev_id in that situation.
> > > >
> > > > To iterate the session private data in the session.
> > > >
> > > > > Again, here we don't need care about queues and their pinning to cores.
> > > > > If let say someone would like to process buffers from the same IPsec SA on 2
> > > > > different cores in parallel, he can just create 2 sessions for the same xform,
> > > > > give one to thread #1  and second to thread #2.
> > > > > After that both threads are free to call process(this_thread_ses, ...) at will.
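The per-thread session model described above can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_xform`, `cpu_session`, `cpu_session_init` and `cpu_session_process` are hypothetical names (not DPDK API), and the XOR transform is a placeholder, not real crypto. Two sessions are created from the same xform; each carries all the state it needs, so each thread can call process() on its own session without any dev_id or queue_id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an xform (only a key here). */
struct toy_xform {
	uint8_t key[16];
};

/* Hypothetical self-contained SW session. */
struct cpu_session {
	uint8_t key[16];        /* copied from the xform at creation time */
	uint64_t nb_processed;  /* per-session mutable state, never shared */
};

/* Everything needed to process buffers is stored in the session itself,
 * so no dev_id/queue_id is needed on the data path. */
static void
cpu_session_init(struct cpu_session *s, const struct toy_xform *xf)
{
	memcpy(s->key, xf->key, sizeof(s->key));
	s->nb_processed = 0;
}

/* Toy process(): XOR with the key. NOT real crypto. */
static void
cpu_session_process(struct cpu_session *s, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= s->key[i % sizeof(s->key)];
	s->nb_processed++;
}
```

In a real application, thread #1 would own one session and thread #2 the other, so concurrent calls never write to the same memory.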
> > > >
> > > > Say you have a 16core device to handle 100G of traffic on a single tunnel.
> > > > Will we make 16 sessions with same parameters?
> > >
> > > Absolutely same question we can ask for current crypto-op API.
> > > You have lookaside crypto-dev with 16 HW queues, each queue is serviced by
> > > different CPU.
> > > For the same SA, do you need a separate session per queue, or is it ok to reuse
> > > current one?
> > > AFAIK, right now this is a grey area not clearly defined.
> > > For crypto-devs I am aware - user can reuse the same session (as PMD uses it
> > > read-only).
> > > But again, right now I think it is not clearly defined and is implementation
> > > specific.
> >
> > User can use the same session, that is what I am also insisting, but it may have separate
> > Session private data. Cryptodev session create API provide that functionality and we can
> > Leverage that.
> 
> rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which means
> we can't use
> the same rte_cryptodev_sym_session to hold sessions for both sync and async
> mode
> for the same device. Off course we can add a hard requirement that any driver
> that wants to
> support process() has to create sessions that can handle both  process and
> enqueue/dequeue,
> but then again  what for to create such overhead?
> 
> BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> construct for multiple device_ids:
> __extension__ struct {
>                 void *data;
>                 uint16_t refcnt;
>         } sess_data[0];
>         /**< Driver specific session material, variable size */
> 
Yes, I feel the same. I was not in favor of this when it was introduced either.
Please go ahead and remove it. I have no issues with that.

> as an advantage.
> It looks too error prone for me:
> 1. Simultaneous session initialization/de-initialization for devices with the same
> driver_id is not possible.
> 2. It assumes that all device driver will be loaded before we start to create
> session pools.
> 
> Right now it seems ok, as no-one requires such functionality, but I don't know
> how it will be in future.
> For me rte_security session model, where for each security context user have to
> create new session
> looks much more robust.
Agreed
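The one-slot-per-driver constraint discussed above can be modelled in a few lines. This is a simplified sketch, not the actual librte_cryptodev code: `multi_dev_session`, `session_attach` and the rejection rule are hypothetical, chosen to show that two devices of the same driver share one `sess_data[]` slot, so a second, differently laid out private session (for example a sync-mode one) for that driver has nowhere to go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_DRIVERS 4

/* Simplified model of one driver-private slot. */
struct sess_slot {
	void *data;
	uint16_t refcnt;
};

/* Hypothetical session whose private data is indexed by driver_id. */
struct multi_dev_session {
	struct sess_slot sess_data[NB_DRIVERS];
};

/* Attach driver-private data for a device of the given driver.
 * Returns 0 on success, -1 if the driver's slot already holds
 * different private data. */
static int
session_attach(struct multi_dev_session *s, uint8_t driver_id, void *priv)
{
	assert(driver_id < NB_DRIVERS);
	struct sess_slot *slot = &s->sess_data[driver_id];

	if (slot->data != NULL && slot->data != priv)
		return -1; /* one slot per driver: a second layout cannot coexist */
	slot->data = priv;
	slot->refcnt++;
	return 0;
}
```

With this layout, every device of a given driver shares the same slot, which is why sync and async private data for one driver would have to be merged into a single structure.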

> 
> >
> > BTW, I can see a v2 to this RFC which is still based on security library.
> 
> Yes, v2 was concentrated on fixing found issues, some code restructuring,
> i.e. - changes that would be needed anyway whatever API aproach we'll choose.
> 
> > When do you plan
> > To submit the patches for crypto based APIs. We have RC1 merge deadline for this
> > patchset on 21st Oct.
> 
> We'd like to start working on it ASAP, but it seems we still have a major
> disagreement
> about how this crypto-dev API should look like.
> Which makes me think - should we return to our original proposal via
> rte_security?
> It still looks to me like clean and straightforward way to enable this new API,
> and probably wouldn't cause that much controversy.
> What do you think?

I cannot spend more time discussing this until the RC1 date. I have some other stuff pending.
You can send the patches early next week with the approach that I mentioned, or else we
can discuss this post-RC1 (which would mean deferring to 20.02).

But moving back to security is not acceptable to me. The code should be put where it is
intended and not where it is easy to put. You are not doing any rte_security stuff.


Regards,
Akhil

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-11 13:23                                         ` Akhil Goyal
@ 2019-10-13 23:07                                           ` Zhang, Roy Fan
  2019-10-14 11:10                                             ` Ananyev, Konstantin
  2019-10-15 15:00                                             ` Akhil Goyal
  2019-10-16 22:07                                           ` Ananyev, Konstantin
  1 sibling, 2 replies; 87+ messages in thread
From: Zhang, Roy Fan @ 2019-10-13 23:07 UTC (permalink / raw)
  To: Akhil Goyal, Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Doherty, Declan
  Cc: 'Anoob Joseph'

Hi Akhil,

Thanks for the review and comments! 
I know you are extremely busy, so here are my points in brief.
I think placing CPU synchronous crypto in rte_security makes sense, because:

1. rte_security already contains the inline crypto and lookaside crypto action types, so adding a cpu_crypto action type is reasonable.
2. rte_security contains security features that may not be supported by all devices, such as crypto, IPsec, and PDCP. cpu_crypto falls into this category; again, it is crypto.
3. Placing the CPU synchronous crypto API in rte_security is natural, as inline mode works synchronously too. Cryptodev, however, does not.
4. Placing the CPU synchronous crypto API in rte_security helps boost SW crypto performance. I have already provided a simple perf test inside the unit tests in the patchset for users to try out: just compare its output against the DPDK crypto perf app output.
5. Placing the CPU synchronous crypto API in cryptodev will never serve HW lookaside crypto PMDs, as making them work synchronously carries a huge performance penalty. The cryptodev framework's existing design, however, provides APIs that work with all crypto PMDs (rte_cryptodev_enqueue_burst / dequeue_burst, for example), so a sync-only API does not fit cryptodev's principle.
6. Placing the CPU synchronous crypto API in cryptodev confuses the user, as:
	- a session created for async mode may not work in sync mode
	- enqueue/dequeue and cpu_crypto_process both do the same crypto processing, but one PMD may support only one API (set), another PMD may support the other, and a third PMD may support both. We would have to provide another API to let the user query which PMD supports which.
	- there are two completely different code paths for async and sync mode.
7. You said at the end of the email that placing the CPU synchronous crypto API into rte_security is not acceptable, as it does not do any rte_security stuff. Isn't crypto rte_security stuff? You may call this a quibble, but in my view, in the patchset both PMDs' implementations do offload the work to the CPU's special circuitry dedicated to accelerating crypto processing.

To me, cryptodev is the one place the CPU synchronous crypto API should not go into; rte_security is where it belongs.
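Point 6 implies that, if both data paths lived in cryptodev, an application would have to query per device which path is available and branch on it. A minimal sketch of that branching follows; the flag names `FF_ASYNC_ENQ_DEQ` and `FF_SYNC_PROCESS` and the "prefer sync" policy are hypothetical assumptions, not real cryptodev feature flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; NOT actual rte_cryptodev flags. */
#define FF_ASYNC_ENQ_DEQ (UINT64_C(1) << 0)
#define FF_SYNC_PROCESS  (UINT64_C(1) << 1)

enum data_path {
	DATA_PATH_NONE,
	DATA_PATH_ASYNC, /* enqueue_burst()/dequeue_burst() */
	DATA_PATH_SYNC,  /* process() */
};

/* Application-side choice made once per device (assumed policy:
 * prefer the synchronous path when the PMD offers it). */
static enum data_path
choose_data_path(uint64_t feature_flags)
{
	if (feature_flags & FF_SYNC_PROCESS)
		return DATA_PATH_SYNC;
	if (feature_flags & FF_ASYNC_ENQ_DEQ)
		return DATA_PATH_ASYNC;
	return DATA_PATH_NONE;
}
```

The session would then also have to be created for the chosen path, which is the second source of confusion listed in point 6.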

Regards,
Fan

> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> Sent: Friday, October 11, 2019 2:24 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; 'dev@dpdk.org'
> <dev@dpdk.org>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
> 'Thomas Monjalon' <thomas@monjalon.net>; Zhang, Roy Fan
> <roy.fan.zhang@intel.com>; Doherty, Declan <declan.doherty@intel.com>
> Cc: 'Anoob Joseph' <anoobj@marvell.com>
> Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and
> API
> 
> Hi Konstantin,
> 
> >
> > Hi Akhil,
> >
> ..[snip]
> 
> > > > > > > OK let us assume that you have a separate structure. But I
> > > > > > > have a few queries:
> > > > > > > 1. how can multiple drivers use a same session
> > > > > >
> > > > > > As a short answer: they can't.
> > > > > > It is pretty much the same approach as with rte_security -
> > > > > > each device needs to create/init its own session.
> > > > > > So upper layer would need to maintain its own array (or so) for such case.
> > > > > > Though the question is why would you like to have same session
> > > > > > over multiple SW backed devices?
> > > > > > As it would be anyway just a synchronous function call that
> > > > > > will be executed on the same cpu.
> > > > >
> > > > > I may have single FAT tunnel which may be distributed over
> > > > > multiple Cores, and each core is affined to a different SW device.
> > > >
> > > > If it is pure SW, then we don't need multiple devices for such scenario.
> > > > Device in that case is pure abstraction that we can skip.
> > >
> > > Yes agreed, but that liberty is given to the application whether it
> > > need multiple devices with single queue or a single device with multiple queues.
> > > I think that independence should not be broken in this new API.
> > > >
> > > > > So a single session may be accessed by multiple devices.
> > > > >
> > > > > One more example would be depending on packet sizes, I may
> > > > > switch between HW/SW PMDs with the same session.
> > > >
> > > > Sure, but then we'll have multiple sessions.
> > >
> > > No, the session will be same and it will have multiple private data
> > > for each of the PMD.
> > >
> > > > BTW, we have same thing now - these private session pointers are
> > > > just stored inside the same rte_crypto_sym_session.
> > > > And if user wants to support this model, he would also need to
> > > > store <dev_id, queue_id> pair for each HW device anyway.
> > >
> > > Yes agreed, but how is that thing happening in your new struct, you
> > > cannot support that.
> >
> > User can store all these info in his own struct.
> > That's exactly what we have right now.
> > Let say ipsec-secgw has to store for each IPsec SA:
> > pointer to crypto-session and/or pointer to security session plus (for
> > lookaside-devices) cdev_id_qp that allows it to extract dev_id +
> > queue_id information.
> > As I understand that works for now, as each ipsec_sa uses only one
> > dev+queue. Though if someone would like to use multiple devices/queues
> > for the same SA - he would need to have an array of these <dev+queue> pairs.
> > So even right now rte_cryptodev_sym_session is not self-consistent and
> > requires extra information to be maintained by user.
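The extra per-SA state described here can be sketched as a plain C structure. This is an illustrative assumption, not the actual ipsec-secgw data layout: `app_sa`, `dev_qp`, `app_sa_add_qp` and the modulo lcore-to-queue mapping are hypothetical. It shows that the session pointer alone is not enough; the application also has to carry the <dev_id, queue_id> pairs for every device/queue the SA is spread over.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SA_QUEUES 8

/* Hypothetical <dev_id, queue_id> pair. */
struct dev_qp {
	uint8_t dev_id;
	uint16_t qp_id;
};

/* Hypothetical per-SA bookkeeping kept by the application. */
struct app_sa {
	void *crypto_session;   /* shared crypto session */
	void *security_session; /* optional security session */
	uint32_t nb_qps;
	struct dev_qp qps[MAX_SA_QUEUES];
};

/* Register one more <dev_id, qp_id> pair for this SA. */
static int
app_sa_add_qp(struct app_sa *sa, uint8_t dev_id, uint16_t qp_id)
{
	if (sa->nb_qps == MAX_SA_QUEUES)
		return -1;
	sa->qps[sa->nb_qps].dev_id = dev_id;
	sa->qps[sa->nb_qps].qp_id = qp_id;
	sa->nb_qps++;
	return 0;
}

/* Pick the pair used by a given worker lcore (simple modulo mapping). */
static struct dev_qp
app_sa_pick_qp(const struct app_sa *sa, unsigned int lcore_id)
{
	assert(sa->nb_qps > 0);
	return sa->qps[lcore_id % sa->nb_qps];
}
```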
> 
> Why are you increasing the complexity for the user application.
> The new APIs and struct should be such that it need to do minimum changes
> in the stack so that stack is portable on multiple vendors.
> You should try to hide as much complexity in the driver or lib to give the user
> simple APIs.
> 
> Having a same session for multiple devices was added by Intel only for some
> use cases.
> And we had split that session create API into 2. Now if those are not useful
> shall we move back to the single API. I think @Doherty, Declan and @De Lara
> Guarch, Pablo can comment on this.
> 
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > 2. Can somebody use the scheduler pmd for scheduling the
> > > > > > > different type of payloads for the same session?
> > > > > >
> > > > > > In theory yes.
> > > > > > Though for that scheduler pmd should have inside it's
> > > > > > rte_crypto_cpu_sym_session an array of pointers to the
> > > > > > underlying devices sessions.
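One way to read the answer above is a scheduler session that wraps the underlying devices' sessions. The sketch below is hypothetical (`sched_session`, `child_session`, `child_a`/`child_b`, and the packet-size threshold are all assumptions); it only illustrates the "array of pointers to the underlying devices sessions" idea, with dispatch by packet length as one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical child session: its own process() callback and context. */
struct child_session {
	int (*process)(void *ctx, uint8_t *buf, size_t len);
	void *ctx;
};

#define SCHED_MAX_CHILDREN 4

/* Hypothetical scheduler session wrapping the underlying sessions. */
struct sched_session {
	struct child_session children[SCHED_MAX_CHILDREN];
	unsigned int nb_children;
	size_t small_pkt_threshold; /* child 0 below it, child 1 at/above it */
};

static int
sched_process(struct sched_session *s, uint8_t *buf, size_t len)
{
	unsigned int idx =
		(len < s->small_pkt_threshold || s->nb_children < 2) ? 0 : 1;
	struct child_session *c = &s->children[idx];

	return c->process(c->ctx, buf, len);
}

/* Two toy children that only report which one ran. */
static int
child_a(void *ctx, uint8_t *buf, size_t len)
{
	(void)ctx; (void)buf; (void)len;
	return 0;
}

static int
child_b(void *ctx, uint8_t *buf, size_t len)
{
	(void)ctx; (void)buf; (void)len;
	return 1;
}
```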
> > > > > >
> > > > > > >
> > > > > > > With your proposal the APIs would be very specific to your
> > > > > > > use case only.
> > > > > >
> > > > > > Yes in some way.
> > > > > > I consider that API specific for SW backed crypto PMDs.
> > > > > > I can hardly see how any 'real HW' PMDs (lksd-none,
> > > > > > lksd-proto) will benefit from it.
> > > > > > Current crypto-op API is very much HW oriented.
> > > > > > Which is ok, that's for it was intended for, but I think we
> > > > > > also need one that would be designed
> > > > > > for SW backed implementation in mind.
> > > > >
> > > > > We may re-use your API for HW PMDs as well which do not have
> > > > > requirement of Crypto-op/mbuf etc.
> > > > > The return type of your new process API may have a status which
> > > > > say 'processed'
> > > > > Or can be say 'enqueued'. So if it is  'enqueued', we may have a
> > > > > new API for raw Bufs dequeue as well.
> > > > >
> > > > > This requirement can be for any hardware PMDs like QAT as well.
> > > >
> > > > I don't think it is a good idea to extend this API for async (lookaside) devices.
> > > > You'll need to:
> > > >  - provide dev_id and queue_id for each process(enqueue) and
> > > > dequeuer operation.
> > > >  - provide IOVA for all buffers passing to that function (data
> > > > buffers, digest, IV, aad).
> > > >  - On dequeue provide some way to associate dequed data and digest
> > > > buffers with
> > > >    crypto-session that was used  (and probably with mbuf).
> > > >  So most likely we'll end up with another just version of our
> > > > current crypto-op structure.
> > > > If you'd like to get rid of mbufs dependency within current
> > > > crypto-op API that understandable, but I don't think we should
> > > > have same API for both sync (CPU) and async
> > > > (lookaside) cases.
> > > > It doesn't seem feasible at all and voids whole purpose of that patch.
> > >
> > > At this moment we are not much concerned about the dequeue API and
> > > about the HW PMD support. It is just that the new API should be generic enough
> > > to be used in some future scenarios as well. I am just highlighting the possible
> > > usecases which can be there in future.
> >
> > Sorry, but I strongly disagree with such approach.
> > We should stop adding/modifying API 'just in case' and because 'it
> > might be useful for some future HW'.
> > Inside DPDK we already do have too many dev level APIs without any
> > implementations.
> > That's quite bad practice and very dis-orienting for end-users.
> > I think to justify API additions/changes we need at least one proper
> > implementation for it, or at least some strong evidence that people
> > are really committed to support it in nearest future.
> > BTW, that what TB agreed on, nearly a year ago.
> >
> > This new API (if we'll go ahead with it of course) would stay
> > experimental for some time anyway to make sure we don't miss anything
> > needed (I think for about a year time- frame).
> > So if you guys *really* want to extend it support _async_ devices too
> > - I am open for modifications/additions here.
> > Though personally I think such addition would over-complicate things
> > and we'll end up with another reincarnation of current crypto-op.
> > We actually discussed it internally, and decided to drop that idea because of that.
> > Again, my opinion - for lookaside devices it might be better to try to
> > optimize current crypto-op path (remove mbuf requirement, probably add
> > ability to group by session on enqueue/dequeue, etc.).
> 
> I agree that the new API is experimental and can be modified later. So no
> issues in that, but we can keep some things in mind while defining APIs.
> These were some comments from my side, if those are impacting the current
> scenario, you can drop those. We will take care of those later.
> 
> >
> > >
> > > What is the issue that you face in making a dev-op for this new API.
> > > Do you see any performance impact with that?
> >
> > There are two main things:
> > 1. user would need to maintain and provide for each process() call
> > dev_id+queue_id.
> > That's means extra (and totally unnecessary for SW) overhead.
> 
> You are using a crypto device for performing the processing, you must use
> dev_id to identify which SW device it is. This is how the DPDK Framework
> works.
> 
> > 2. yes I would expect some perf overhead too - it would be extra call or branch.
> > Again as it would be data-dependency - most likely cpu wouldn't be
> > able to pipeline it efficiently:
> >
> > rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id,
> >         struct rte_cryptodev_sym_session *sess, ...)
> > {
> >      struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
> >      return (*dev->process)(sess->sess_data[dev->driver_id].data, ...);
> > }
> >
> > driver_specific_process(struct driver_specific_sym_session *sess, ...)
> > {
> >    return sess->process(sess, ...);
> > }
> >
> > I didn't make any exact measurements but sure it would be slower than just:
> > session_udata->process(session->udata->sess, ...);
> > Again it would be much more noticeable on low end cpus.
> > Let say here:
> > http://mails.dpdk.org/archives/dev/2019-September/144350.html
> > Jerin claims 1.5-3% drop for introducing extra call via hiding eth_dev
> > contents - I suppose we would have something similar here.
> > I do realize that in majority of cases crypto is more expensive then
> > RX/TX, but still.
> >
> > If it would be a really unavoidable tradeoff (support already existing
> > API, or so) I wouldn't mind, but I don't see any real need for it right now.
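To make the two dispatch models being compared concrete, here is a minimal, compilable C sketch. All type and function names (`sym_session`, `generic_session`, `fake_devs`, `devop_process`, `toy_xor_process`) are hypothetical stand-ins, not DPDK API, and the XOR transform is a placeholder rather than real crypto. Model 1 calls the session's callback directly; model 2 first resolves the device by `dev_id`, then the driver-specific session by `driver_id`, and only then reaches the same callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym_session;
typedef uint32_t (*process_fn)(struct sym_session *sess, uint8_t *buf,
			       size_t len);

/* Hypothetical driver-private session: state plus the process()
 * callback the PMD selected for this xform combination. */
struct sym_session {
	process_fn process;
	uint8_t key; /* toy key, stands in for real key material */
};

/* Model 1: direct call through the session, no dev_id needed. */
static inline uint32_t
session_process(struct sym_session *sess, uint8_t *buf, size_t len)
{
	return sess->process(sess, buf, len);
}

/* Model 2: dev-op style. Device lookup, then driver slot, then callback. */
#define MAX_DEVS 4
#define MAX_DRIVERS 4

struct generic_session {
	void *sess_data[MAX_DRIVERS]; /* indexed by driver_id */
};

struct fake_dev {
	uint8_t driver_id;
	uint32_t (*process)(void *drv_sess, uint8_t *buf, size_t len);
};

static struct fake_dev fake_devs[MAX_DEVS];

static uint32_t
devop_process(uint8_t dev_id, uint16_t qp_id, struct generic_session *ses,
	      uint8_t *buf, size_t len)
{
	struct fake_dev *dev = &fake_devs[dev_id];

	(void)qp_id; /* unused by a pure SW implementation */
	return dev->process(ses->sess_data[dev->driver_id], buf, len);
}

/* Driver-level dev-op: forwards to the per-session callback. */
static uint32_t
driver_process(void *drv_sess, uint8_t *buf, size_t len)
{
	return session_process(drv_sess, buf, len);
}

/* Toy transform (XOR with key). NOT real crypto. */
static uint32_t
toy_xor_process(struct sym_session *sess, uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= sess->key;
	return (uint32_t)len;
}
```

Both paths produce the same output; they differ only in the number of dependent loads and indirect calls before the actual work starts. Whether that difference is measurable next to real crypto cost is what profiling would have to show.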
> 
> Calling session_udata->process(session->udata->sess, ...); from the
> application and Application need to maintain for each PMD the process() API
> in its memory will make the application not portable to other vendors.
> 
> What we are doing here is defining another way to create sessions for the
> same stuff that is already done. This make applications non-portable and
> confusing for the application writer.
> 
> I would say you should do some profiling first. As you also mentioned crypto
> workload is more Cycle consuming, it will not impact this case.
> 
> 
> >
> > >
> > > >
> > > > > That is why a dev-ops would be a better option.
> > > > >
> > > > > >
> > > > > > > When you would add more functionality to this sync
> > > > > > > API/struct, it will end up being the same API/struct.
> > > > > > >
> > > > > > > Let us  see how close/ far we are from the existing APIs
> > > > > > > when the actual implementation is done.
> > > > > > >
> > > > > > > > > I am not sure if that would be needed.
> > > > > > > > > It would be internal to the driver that if synchronous
> > > > > > > > > processing is supported(from feature flag) and
> > > > > > > > > Have relevant fields in xform(the newly added ones which
> > > > > > > > > are packed as per your suggestions) set,
> > > > > > > > > It will create that type of session.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > + * Main points:
> > > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > > > + *   affecting existing one.
> > > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > > > + *   or spread it across several ones.
> > > > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > > > >
> > > > > > > > > Which process function should be chosen is internal to
> > > > > > > > > PMD, how would that info
> > > > > > > > > be visible to the application or the library. These will
> > > > > > > > > get stored in the session private
> > > > > > > > > data. It would be upto the PMD writer, to store the per
> > > > > > > > > session process function in
> > > > > > > > > the session private data.
> > > > > > > > >
> > > > > > > > > Process function would be a dev ops just like enc/deq
> > > > > > > > > operations and it should call
> > > > > > > > > The respective process API stored in the session private data.
> > > > > > > >
> > > > > > > > That model (via devops) is possible, but has several
> > > > > > > > drawbacks from my perspective:
> > > > > > > >
> > > > > > > > 1. It means we'll need to pass dev_id as a parameter to
> > > > > > > > process() function.
> > > > > > > > Though in fact dev_id is not a relevant information for us
> > > > > > > > here (all we need is pointer to the session and pointer to
> > > > > > > > the function to call) and I tried to avoid using it in data-path
> > > > > > > > functions for that API.
> > > > > > >
> > > > > > > You have a single vdev, but someone may have multiple vdevs
> > > > > > > for each thread, or may
> > > > > > > Have same dev with multiple queues for each core.
> > > > > >
> > > > > > That's fine. As I said above it is a SW backed implementation.
> > > > > > Each session has to be a separate entity that contains all
> > > > > > necessary information
> > > > > > (keys, alg/mode info,  etc.)  to process input buffers.
> > > > > > Plus we need the actual function pointer to call.
> > > > > > I just don't see what for we need a dev_id in that situation.
> > > > >
> > > > > To iterate the session private data in the session.
> > > > >
> > > > > > Again, here we don't need care about queues and their pinning to cores.
> > > > > > If let say someone would like to process buffers from the same
> > > > > > IPsec SA on 2 different cores in parallel, he can just create 2 sessions
> > > > > > for the same xform, give one to thread #1  and second to thread #2.
> > > > > > After that both threads are free to call process(this_thread_ses, ...) at will.
> > > > >
> > > > > Say you have a 16core device to handle 100G of traffic on a single tunnel.
> > > > > Will we make 16 sessions with same parameters?
> > > >
> > > > Absolutely same question we can ask for current crypto-op API.
> > > > You have lookaside crypto-dev with 16 HW queues, each queue is
> > > > serviced by different CPU.
> > > > For the same SA, do you need a separate session per queue, or is
> > > > it ok to reuse current one?
> > > > AFAIK, right now this is a grey area not clearly defined.
> > > > For crypto-devs I am aware - user can reuse the same session (as
> > > > PMD uses it read-only).
> > > > But again, right now I think it is not clearly defined and is
> > > > implementation specific.
> > >
> > > User can use the same session, that is what I am also insisting, but
> > > it may have separate
> > > Session private data. Cryptodev session create API provide that
> > > functionality and we can
> > > Leverage that.
> >
> > rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which
> > means we can't use the same rte_cryptodev_sym_session to hold sessions
> > for both sync and async mode for the same device. Off course we can
> > add a hard requirement that any driver that wants to support process()
> > has to create sessions that can handle both  process and
> > enqueue/dequeue, but then again  what for to create such overhead?
> >
> > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > construct for multiple device_ids:
> > __extension__ struct {
> >                 void *data;
> >                 uint16_t refcnt;
> >         } sess_data[0];
> >         /**< Driver specific session material, variable size */
> >
> Yes I also feel the same. I was also not in favor of this when it was introduced.
> Please go ahead and remove this. I have no issues with that.
> 
> > as an advantage.
> > It looks too error prone for me:
> > 1. Simultaneous session initialization/de-initialization for devices
> > with the same driver_id is not possible.
> > 2. It assumes that all device driver will be loaded before we start to
> > create session pools.
> >
> > Right now it seems ok, as no-one requires such functionality, but I
> > don't know how it will be in future.
> > For me rte_security session model, where for each security context
> > user have to create new session looks much more robust.
> Agreed
> 
> >
> > >
> > > BTW, I can see a v2 to this RFC which is still based on security library.
> >
> > Yes, v2 was concentrated on fixing found issues, some code
> > restructuring, i.e. - changes that would be needed anyway whatever API
> > aproach we'll choose.
> >
> > > When do you plan
> > > To submit the patches for crypto based APIs. We have RC1 merge
> > > deadline for this
> > > patchset on 21st Oct.
> >
> > We'd like to start working on it ASAP, but it seems we still have a
> > major disagreement about how this crypto-dev API should look like.
> > Which makes me think - should we return to our original proposal via
> > rte_security?
> > It still looks to me like clean and straightforward way to enable this
> > new API, and probably wouldn't cause that much controversy.
> > What do you think?
> 
> I cannot spend more time discussing on this until RC1 date. I have some other
> stuff pending.
> You can send the patches early next week with the approach that I
> mentioned or else we can discuss this post RC1(which would mean deferring
> to 20.02).
> 
> But moving back to security is not acceptable to me. The code should be put
> where it is intended and not where it is easy to put. You are not doing any
> rte_security stuff.
> 
> 
> Regards,
> Akhil

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-13 23:07                                           ` Zhang, Roy Fan
@ 2019-10-14 11:10                                             ` Ananyev, Konstantin
  2019-10-15 15:02                                               ` Akhil Goyal
  2019-10-15 15:00                                             ` Akhil Goyal
  1 sibling, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-14 11:10 UTC (permalink / raw)
  To: Zhang, Roy Fan, Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Doherty, Declan
  Cc: 'Anoob Joseph', Jerin Jacob, Hemant Agrawal


> Hi Akhil,
> 
> Thanks for the review and comments!
> Knowing you are extremely busy. Here is my point in brief:
> I think placing the CPU synchronous crypto in the rte_security make sense, as
> 
> 1. rte_security contains inline crypto and lookaside crypto action type already, adding cpu_crypto action type is reasonable.
> 2. rte_security contains the security features may not supported by all devices, such as crypto, ipsec, and PDCP. cpu_crypto follow this
> category, again crypto.
> 3. placing CPU synchronous crypto API in rte_security is natural - as inline mode works synchronously, too. However cryptodev doesn't.
> 4. placing CPU synchronous crypto API in rte_security helps boosting SW crypto performance, I have already provided a simple perf test
> inside the unit test in the patchset for the user to try out - just comparing its output against DPDK crypto perf app output.
> 5. placing CPU synchronous crypto API in cryptodev will never serve HW lookaside crypto PMDs, as making them to work synchronously
> have huge performance penalty. However Cryptodev framework's existing design is providing APIs that will work in all crypto PMDs
> (rte_cryptodev_enqueue_burst / dequeue_burst for example), this does not fit in cryptodev's principle.
> 6. placing CPU synchronous crypto API in cryptodev confuses the user, as:
> 	- the session created for async mode may not work in sync mode
> 	- both enqueue/dequeue and cpu_crypto_process does the same crypto processing, but one PMD may support only one API (set),
> the other may support another, and the third PMD supports both. We have to provide another API to let the user query which one to
> support which.
> 	- two completely different code paths for async/sync mode.
> 7. You said in the end of the email - placing CPU synchronous crypto API into rte_security is not acceptable as it does not do any
> rte_security stuff - crypto isn't? You may call this a quibble, but in my idea, in the patchset both PMDs' implementations did offload the work
> to the CPU's special circuit designed dedicated to accelerate the crypto processing.
> 
> To me cryptodev is the one CPU synchronous crypto API should not go into, rte_security is.

I also don't understand why rte_security is not an option here.
We do have inline-crypto right now, so why can't we have cpu-crypto with a new process() API here?
I would actually like to hear more opinions from the community:
what do other interested parties think is the best way to introduce a cpu-crypto specific API?

Konstantin

> 
> Regards,
> Fan
> 
> > -----Original Message-----
> > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > Sent: Friday, October 11, 2019 2:24 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; 'dev@dpdk.org'
> > <dev@dpdk.org>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>;
> > 'Thomas Monjalon' <thomas@monjalon.net>; Zhang, Roy Fan
> > <roy.fan.zhang@intel.com>; Doherty, Declan <declan.doherty@intel.com>
> > Cc: 'Anoob Joseph' <anoobj@marvell.com>
> > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and
> > API
> >
> > Hi Konstantin,
> >
> > >
> > > Hi Akhil,
> > >
> > ..[snip]
> >
> > > > > > > > OK let us assume that you have a separate structure. But I
> > > > > > > > have a few queries:
> > > > > > > > 1. how can multiple drivers use a same session
> > > > > > >
> > > > > > > As a short answer: they can't.
> > > > > > > It is pretty much the same approach as with rte_security -
> > > > > > > each device needs to create/init its own session.
> > > > > > > So upper layer would need to maintain its own array (or so) for such case.
> > > > > > > Though the question is why would you like to have same session
> > > > > > > over multiple SW backed devices?
> > > > > > > As it would be anyway just a synchronous function call that
> > > > > > > will be executed on the same cpu.
> > > > > >
> > > > > > I may have single FAT tunnel which may be distributed over
> > > > > > multiple Cores, and each core is affined to a different SW device.
> > > > >
> > > > > If it is pure SW, then we don't need multiple devices for such scenario.
> > > > > Device in that case is pure abstraction that we can skip.
> > > >
> > > > Yes agreed, but that liberty is given to the application whether it
> > > > need multiple devices with single queue or a single device with multiple
> > > > queues.
> > > > I think that independence should not be broken in this new API.
> > > > >
> > > > > > So a single session may be accessed by multiple devices.
> > > > > >
> > > > > > One more example would be depending on packet sizes, I may
> > > > > > switch between HW/SW PMDs with the same session.
> > > > >
> > > > > Sure, but then we'll have multiple sessions.
> > > >
> > > > No, the session will be same and it will have multiple private data
> > > > for each of the PMD.
> > > >
> > > > > BTW, we have same thing now - these private session pointers are
> > > > > just stored inside the same rte_crypto_sym_session.
> > > > > And if user wants to support this model, he would also need to
> > > > > store <dev_id, queue_id> pair for each HW device anyway.
> > > >
> > > > Yes agreed, but how is that thing happening in your new struct, you
> > > > cannot support that.
> > >
> > > User can store all these info in his own struct.
> > > That's exactly what we have right now.
> > > Let say ipsec-secgw has to store for each IPsec SA:
> > > pointer to crypto-session and/or pointer to security session plus (for
> > > lookaside-devices) cdev_id_qp that allows it to extract dev_id +
> > > queue_id information.
> > > As I understand that works for now, as each ipsec_sa uses only one
> > > dev+queue. Though if someone would like to use multiple devices/queues
> > > for the same SA - he would need to have an array of these <dev+queue>
> > > pairs.
> > > So even right now rte_cryptodev_sym_session is not self-consistent and
> > > requires extra information to be maintained by user.
> >
> > Why are you increasing the complexity for the user application.
> > The new APIs and struct should be such that it need to do minimum changes
> > in the stack so that stack is portable on multiple vendors.
> > You should try to hide as much complexity in the driver or lib to give the user
> > simple APIs.
> >
> > Having a same session for multiple devices was added by Intel only for some
> > use cases.
> > And we had split that session create API into 2. Now if those are not useful
> > shall we move back to the single API. I think @Doherty, Declan and @De Lara
> > Guarch, Pablo can comment on this.
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > 2. Can somebody use the scheduler pmd for scheduling the
> > > > > > > > different type of payloads for the same session?
> > > > > > >
> > > > > > > In theory yes.
> > > > > > > Though for that scheduler pmd should have inside its
> > > > > > > rte_crypto_cpu_sym_session an array of pointers to the
> > > > > > > underlying devices sessions.
> > > > > > >
> > > > > > > >
> > > > > > > > With your proposal the APIs would be very specific to your
> > > > > > > > use case only.
> > > > > > >
> > > > > > > Yes in some way.
> > > > > > > I consider that API specific for SW backed crypto PMDs.
> > > > > > > I can hardly see how any 'real HW' PMDs (lksd-none,
> > > > > > > lksd-proto) will benefit from it.
> > > > > > > Current crypto-op API is very much HW oriented.
> > > > > > > Which is ok, that's what it was intended for, but I think we
> > > > > > > also need one that would be designed
> > > > > > > with SW backed implementation in mind.
> > > > > >
> > > > > > We may re-use your API for HW PMDs as well which do not have
> > > > > > requirement of Crypto-op/mbuf etc.
> > > > > > The return type of your new process API may have a status which
> > > > > > say 'processed' Or can be say 'enqueued'. So if it is 'enqueued', we
> > > > > > may have a new API for raw Bufs dequeue as well.
> > > > > >
> > > > > > This requirement can be for any hardware PMDs like QAT as well.
> > > > >
> > > > > I don't think it is a good idea to extend this API for async (lookaside)
> > > > > devices.
> > > > > You'll need to:
> > > > >  - provide dev_id and queue_id for each process(enqueue) and
> > > > >    dequeue operation.
> > > > >  - provide IOVA for all buffers passing to that function (data
> > > > >    buffers, digest, IV, aad).
> > > > >  - On dequeue provide some way to associate dequeued data and digest
> > > > >    buffers with crypto-session that was used (and probably with mbuf).
> > > > > So most likely we'll end up with just another version of our
> > > > > current crypto-op structure.
> > > > > If you'd like to get rid of mbufs dependency within current
> > > > > crypto-op API that's understandable, but I don't think we should
> > > > > have same API for both sync (CPU) and async (lookaside) cases.
> > > > > It doesn't seem feasible at all and voids whole purpose of that patch.
> > > >
> > > > At this moment we are not much concerned about the dequeue API and
> > > > about the HW PMD support. It is just that the new API should be generic
> > > > enough to be used in some future scenarios as well. I am just
> > > > highlighting the possible usecases which can be there in future.
> > >
> > > Sorry, but I strongly disagree with such approach.
> > > We should stop adding/modifying API 'just in case' and because 'it
> > > might be useful for some future HW'.
> > > Inside DPDK we already do have too many dev level APIs without any
> > > implementations.
> > > That's quite bad practice and very dis-orienting for end-users.
> > > I think to justify API additions/changes we need at least one proper
> > > implementation for it, or at least some strong evidence that people
> > > are really committed to support it in nearest future.
> > > BTW, that what TB agreed on, nearly a year ago.
> > >
> > > This new API (if we'll go ahead with it of course) would stay
> > > experimental for some time anyway to make sure we don't miss anything
> > > needed (I think for about a year time- frame).
> > > So if you guys *really* want to extend it support _async_ devices too
> > > - I am open for modifications/additions here.
> > > Though personally I think such addition would over-complicate things
> > > and we'll end up with another reincarnation of current crypto-op.
> > > We actually discussed it internally, and decided to drop that idea
> > > because of that.
> > > Again, my opinion - for lookaside devices it might be better to try to
> > > optimize current crypto-op path (remove mbuf requirement, probably add
> > > ability to group by session on enqueue/dequeue, etc.).
> >
> > I agree that the new API is experimental and can be modified later. So no
> > issues in that, but we can keep some things in mind while defining APIs.
> > These were some comments from my side, if those are impacting the current
> > scenario, you can drop those. We will take care of those later.
> >
> > >
> > > >
> > > > What is the issue that you face in making a dev-op for this new API?
> > > > Do you see any performance impact with that?
> > >
> > > There are two main things:
> > > 1. user would need to maintain and provide for each process() call
> > > dev_id+queue_id.
> > > That's means extra (and totally unnecessary for SW) overhead.
> >
> > You are using a crypto device for performing the processing, you must use
> > dev_id to identify which SW device it is. This is how the DPDK Framework
> > works.
> >
> > > 2. yes I would expect some perf overhead too - it would be extra call or
> > > branch.
> > > Again as it would be data-dependency - most likely cpu wouldn't be
> > > able to pipeline it efficiently:
> > >
> > > int
> > > rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id,
> > > 		struct rte_cryptodev_sym_session *sess, ...)
> > > {
> > >      struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
> > >      return (*dev->process)(sess->sess_data[dev->driver_id].data, ...);
> > > }
> > >
> > > int
> > > driver_specific_process(struct driver_specific_sym_session *sess, ...)
> > > {
> > >      return sess->process(sess, ...);
> > > }
> > >
> > > I didn't make any exact measurements, but I'm sure it would be slower
> > > than just:
> > >
> > > session_udata->process(session->udata->sess, ...);
> > >
> > > Again, it would be much more noticeable on low-end CPUs.
> > > Let say here:
> > > http://mails.dpdk.org/archives/dev/2019-September/144350.html
> > > Jerin claims 1.5-3% drop for introducing extra call via hiding eth_dev
> > > contents - I suppose we would have something similar here.
> > > I do realize that in majority of cases crypto is more expensive then
> > > RX/TX, but still.
> > >
> > > If it would be a really unavoidable tradeoff (support already existing
> > > API, or so) I wouldn't mind, but I don't see any real need for it right now.
> >
> > Calling session_udata->process(session->udata->sess, ...); from the
> > application and Application need to maintain for each PMD the process() API
> > in its memory will make the application not portable to other vendors.
> >
> > What we are doing here is defining another way to create sessions for the
> > same stuff that is already done. This make applications non-portable and
> > confusing for the application writer.
> >
> > I would say you should do some profiling first. As you also mentioned crypto
> > workload is more Cycle consuming, it will not impact this case.
> >
> >
> > >
> > > >
> > > > >
> > > > > > That is why a dev-ops would be a better option.
> > > > > >
> > > > > > >
> > > > > > > > When you would add more functionality to this sync
> > > > > > > > API/struct, it will end up being the same API/struct.
> > > > > > > >
> > > > > > > > Let us see how close/far we are from the existing APIs
> > > > > > > > when the actual implementation is done.
> > > > > > > >
> > > > > > > > > > I am not sure if that would be needed.
> > > > > > > > > > It would be internal to the driver that if synchronous
> > > > > > > > > > processing is supported (from feature flag) and
> > > > > > > > > > Have relevant fields in xform (the newly added ones which
> > > > > > > > > > are packed as per your suggestions) set,
> > > > > > > > > > It will create that type of session.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > + * Main points:
> > > > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > > > > + *   affecting existing one.
> > > > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > > > > + *   or spread it across several ones.
> > > > > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > > > > >
> > > > > > > > > > Which process function should be chosen is internal to
> > > > > > > > > > PMD, how would that info be visible to the application or
> > > > > > > > > > the library? These will get stored in the session private
> > > > > > > > > > data. It would be up to the PMD writer to store the per
> > > > > > > > > > session process function in the session private data.
> > > > > > > > > >
> > > > > > > > > > Process function would be a dev op just like enq/deq
> > > > > > > > > > operations and it should call the respective process API
> > > > > > > > > > stored in the session private data.
> > > > > > > > >
> > > > > > > > > That model (via devops) is possible, but has several
> > > > > > > > > drawbacks from my perspective:
> > > > > > > > >
> > > > > > > > > 1. It means we'll need to pass dev_id as a parameter to
> > > > > > > > > process() function.
> > > > > > > > > Though in fact dev_id is not a relevant information for us
> > > > > > > > > here (all we need is pointer to the session and pointer to
> > > > > > > > > the function to call) and I tried to avoid using it in
> > > > > > > > > data-path functions for that API.
> > > > > > > >
> > > > > > > > You have a single vdev, but someone may have multiple vdevs
> > > > > > > > for each thread, or may Have same dev with multiple queues
> > > > > > > > for each core.
> > > > > > >
> > > > > > > That's fine. As I said above it is a SW backed implementation.
> > > > > > > Each session has to be a separate entity that contains all
> > > > > > > necessary information (keys, alg/mode info, etc.) to process
> > > > > > > input buffers.
> > > > > > > Plus we need the actual function pointer to call.
> > > > > > > I just don't see what for we need a dev_id in that situation.
> > > > > >
> > > > > > To iterate the session private data in the session.
> > > > > >
> > > > > > > Again, here we don't need care about queues and their pinning
> > > > > > > to cores.
> > > > > > > If let say someone would like to process buffers from the same
> > > > > > > IPsec SA on 2 different cores in parallel, he can just create 2
> > > > > > > sessions for the same xform, give one to thread #1 and second
> > > > > > > to thread #2.
> > > > > > > After that both threads are free to call
> > > > > > > process(this_thread_ses, ...) at will.
> > > > > >
> > > > > > Say you have a 16core device to handle 100G of traffic on a
> > > > > > single tunnel.
> > > > > > Will we make 16 sessions with same parameters?
> > > > >
> > > > > Absolutely same question we can ask for current crypto-op API.
> > > > > You have lookaside crypto-dev with 16 HW queues, each queue is
> > > > > serviced by different CPU.
> > > > > For the same SA, do you need a separate session per queue, or is
> > > > > it ok to reuse current one?
> > > > > AFAIK, right now this is a grey area not clearly defined.
> > > > > For crypto-devs I am aware - user can reuse the same session (as
> > > > > PMD uses it read-only).
> > > > > But again, right now I think it is not clearly defined and is
> > > > > implementation specific.
> > > >
> > > > User can use the same session, that is what I am also insisting, but
> > > > it may have separate Session private data. Cryptodev session create
> > > > API provide that functionality and we can Leverage that.
> > >
> > > rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which
> > > means we can't use the same rte_cryptodev_sym_session to hold sessions
> > > for both sync and async mode for the same device. Off course we can
> > > add a hard requirement that any driver that wants to support process()
> > > has to create sessions that can handle both  process and
> > > enqueue/dequeue, but then again  what for to create such overhead?
> > >
> > > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > > construct for multiple device_ids:
> > > __extension__ struct {
> > >                 void *data;
> > >                 uint16_t refcnt;
> > >         } sess_data[0];
> > >         /**< Driver specific session material, variable size */
> > >
> > Yes I also feel the same. I was also not in favor of this when it was introduced.
> > Please go ahead and remove this. I have no issues with that.
> >
> > > as an advantage.
> > > It looks too error prone for me:
> > > 1. Simultaneous session initialization/de-initialization for devices
> > > with the same driver_id is not possible.
> > > 2. It assumes that all device driver will be loaded before we start to
> > > create session pools.
> > >
> > > Right now it seems ok, as no-one requires such functionality, but I
> > > don't know how it will be in future.
> > > For me rte_security session model, where for each security context
> > > user have to create new session looks much more robust.
> > Agreed
> >
> > >
> > > >
> > > > BTW, I can see a v2 to this RFC which is still based on security library.
> > >
> > > Yes, v2 was concentrated on fixing found issues, some code
> > > restructuring, i.e. - changes that would be needed anyway whatever API
> > > approach we'll choose.
> > >
> > > > When do you plan
> > > > To submit the patches for crypto based APIs. We have RC1 merge
> > > > deadline for this patchset on 21st Oct.
> > >
> > > We'd like to start working on it ASAP, but it seems we still have a
> > > major disagreement about how this crypto-dev API should look like.
> > > Which makes me think - should we return to our original proposal via
> > > rte_security?
> > > It still looks to me like clean and straightforward way to enable this
> > > new API, and probably wouldn't cause that much controversy.
> > > What do you think?
> >
> > I cannot spend more time discussing on this until RC1 date. I have some other
> > stuff pending.
> > You can send the patches early next week with the approach that I
> > mentioned or else we can discuss this post RC1(which would mean deferring
> > to 20.02).
> >
> > But moving back to security is not acceptable to me. The code should be put
> > where it is intended and not where it is easy to put. You are not doing any
> > rte_security stuff.
> >
> >
> > Regards,
> > Akhil


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-13 23:07                                           ` Zhang, Roy Fan
  2019-10-14 11:10                                             ` Ananyev, Konstantin
@ 2019-10-15 15:00                                             ` Akhil Goyal
  1 sibling, 0 replies; 87+ messages in thread
From: Akhil Goyal @ 2019-10-15 15:00 UTC (permalink / raw)
  To: Zhang, Roy Fan, Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Doherty, Declan
  Cc: 'Anoob Joseph'

Hi Fan,

> 
> Hi Akhil,
> 
> Thanks for the review and comments!
> Knowing you are extremely busy. Here is my point in brief:
> I think placing the CPU synchronous crypto in the rte_security make sense, as
> 
> 1. rte_security contains inline crypto and lookaside crypto action type already,
> adding cpu_crypto action type is reasonable.

The argument here is not about cpu-crypto; any SW PMD is nothing but cpu-crypto.
Hence cryptodev already supports that.
Here we are concerned only with synchronous processing for crypto workloads.

> 2. rte_security contains the security features may not supported by all devices,
> such as crypto, ipsec, and PDCP. cpu_crypto follow this category, again crypto.

I do not get the intent of this comment. Looking at your patchset, what I get is that
you need a synchronous API for crypto workloads.
If sync processing is required for security payloads, we can add a sync API there as well.
I have made that comment before also. We can have a sync API in both security and cryptodev.

> 3. placing CPU synchronous crypto API in rte_security is natural - as inline mode
> works synchronously, too. However cryptodev doesn't.

It is a valid use case for all the cryptodev SW PMDs
that there should be a synchronous API for crypto processing, and that is what your use case
needs.

> 4. placing CPU synchronous crypto API in rte_security helps boosting SW crypto
> performance, I have already provided a simple perf test inside the unit test in the
> patchset for the user to try out - just comparing its output against DPDK crypto
> perf app output.

I don't expect any performance difference while moving this from security to cryptodev.
Have you done any profiling?

> 5. placing CPU synchronous crypto API in cryptodev will never serve HW
> lookaside crypto PMDs, as making them to work synchronously have huge
> performance penalty. However Cryptodev framework's existing design is
> providing APIs that will work in all crypto PMDs (rte_cryptodev_enqueue_burst /
> dequeue_burst for example), this does not fit in cryptodev's principle.

Agreed that it is not for HW PMDs; however, the op may be NULL in those cases.
Why is it against the cryptodev principle? There are some ops which
PMDs may or may not support.
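The "op may be NULL" approach follows a familiar dispatch pattern: the generic layer checks the PMD's ops table and reports "not supported" instead of calling through a NULL pointer. A minimal sketch, assuming hypothetical names (sketch_crypto_ops, sketch_crypto_dev, sketch_sym_process, sketch_sw_process) rather than any real DPDK structure:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table (not DPDK's real struct): a PMD leaves any op
 * it does not implement as NULL. */
struct sketch_crypto_ops {
	int (*sym_process)(void *sess_priv, uint8_t *buf, uint32_t len);
};

struct sketch_crypto_dev {
	const struct sketch_crypto_ops *ops;
};

/* Generic library wrapper: return -ENOTSUP rather than calling through
 * a NULL pointer, so a HW PMD without a sync path simply opts out. */
static int
sketch_sym_process(const struct sketch_crypto_dev *dev, void *sess_priv,
		   uint8_t *buf, uint32_t len)
{
	assert(dev != NULL && dev->ops != NULL);
	if (dev->ops->sym_process == NULL)
		return -ENOTSUP;
	return dev->ops->sym_process(sess_priv, buf, len);
}

/* Toy op for a SW PMD; it only reports how many bytes it "processed". */
static int
sketch_sw_process(void *sess_priv, uint8_t *buf, uint32_t len)
{
	(void)sess_priv;
	(void)buf;
	return (int)len;
}
```

Note that this pattern still requires the caller to hold a device handle on every call, which is exactly the per-call dev_id cost the other side of the thread objects to.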

> 6. placing CPU synchronous crypto API in cryptodev confuses the user, as:
> 	- the session created for async mode may not work in sync mode

Why? The whole idea for my conversations on this patchset talks about that.
Same session should work for both sync and async processing.

> 	- both enqueue/dequeue and cpu_crypto_process does the same crypto
> processing, but one PMD may support only one API (set), the other may support
> another, and the third PMD supports both. We have to provide another API to
> let the user query which one to support which.

This should be based on a feature flag. It would be up to the application developer
to decide which (sync/async) processing is required for which type of flows that
it is configuring.

> 	- two completely different code paths for async/sync mode.
> 7. You said in the end of the email - placing CPU synchronous crypto API into
> rte_security is not acceptable as it does not do any rte_security stuff - crypto
> isn't? You may call this a quibble, but in my idea, in the patchset both PMDs'
> implementations did offload the work to the CPU's special circuit designed
> dedicated to accelerate the crypto processing.

This is specific to Intel SW PMDs only. IMO, if you are talking about SW PMDs,
openssl can also benefit from this.

> 
> To me cryptodev is the one CPU synchronous crypto API should not go into,
> rte_security is.
> 
> Regards,
> Fan
> 
Regards,
Akhil


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-14 11:10                                             ` Ananyev, Konstantin
@ 2019-10-15 15:02                                               ` Akhil Goyal
  2019-10-16 13:04                                                 ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-10-15 15:02 UTC (permalink / raw)
  To: Ananyev, Konstantin, Zhang, Roy Fan, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Doherty, Declan
  Cc: 'Anoob Joseph', Jerin Jacob, Hemant Agrawal



> 
> 
> > Hi Akhil,
> >
> > Thanks for the review and comments!
> > Knowing you are extremely busy. Here is my point in brief:
> > I think placing the CPU synchronous crypto in the rte_security make sense, as
> >
> > 1. rte_security contains inline crypto and lookaside crypto action type already,
> > adding cpu_crypto action type is reasonable.
> > 2. rte_security contains the security features may not supported by all devices,
> > such as crypto, ipsec, and PDCP. cpu_crypto follow this category, again crypto.
> > 3. placing CPU synchronous crypto API in rte_security is natural - as inline
> > mode works synchronously, too. However cryptodev doesn't.
> > 4. placing CPU synchronous crypto API in rte_security helps boosting SW
> > crypto performance, I have already provided a simple perf test inside the
> > unit test in the patchset for the user to try out - just comparing its
> > output against DPDK crypto perf app output.
> > 5. placing CPU synchronous crypto API in cryptodev will never serve HW
> > lookaside crypto PMDs, as making them to work synchronously have huge
> > performance penalty. However Cryptodev framework's existing design is
> > providing APIs that will work in all crypto PMDs
> > (rte_cryptodev_enqueue_burst / dequeue_burst for example), this does not
> > fit in cryptodev's principle.
> > 6. placing CPU synchronous crypto API in cryptodev confuses the user, as:
> > 	- the session created for async mode may not work in sync mode
> > 	- both enqueue/dequeue and cpu_crypto_process does the same crypto
> > processing, but one PMD may support only one API (set), the other may
> > support another, and the third PMD supports both. We have to provide
> > another API to let the user query which one to support which.
> > 	- two completely different code paths for async/sync mode.
> > 7. You said in the end of the email - placing CPU synchronous crypto API
> > into rte_security is not acceptable as it does not do any rte_security
> > stuff - crypto isn't? You may call this a quibble, but in my idea, in the
> > patchset both PMDs' implementations did offload the work to the CPU's
> > special circuit designed dedicated to accelerate the crypto processing.
> >
> > To me cryptodev is the one CPU synchronous crypto API should not go into,
> > rte_security is.
> 
> I also don't understand why rte_security is not an option here.
> We do have inline-crypto right now, why we can't have cpu-crypto with new
> process() API here?
> Actually would like to hear more opinions from the community here -
> what other interested parties think is the best way for introducing cpu-crypto
> specific API?

I have raised this concern in the weekly status meeting as well. But it looks like nobody
is interested.


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-15 15:02                                               ` Akhil Goyal
@ 2019-10-16 13:04                                                 ` Ananyev, Konstantin
  0 siblings, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-16 13:04 UTC (permalink / raw)
  To: Akhil Goyal, Zhang, Roy Fan, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Doherty, Declan
  Cc: 'Anoob Joseph', Jerin Jacob, Hemant Agrawal, dpdk-techboard



> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> Sent: Tuesday, October 15, 2019 4:02 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Zhang, Roy Fan <roy.fan.zhang@intel.com>; 'dev@dpdk.org' <dev@dpdk.org>;
> De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; 'Thomas Monjalon' <thomas@monjalon.net>; Doherty, Declan
> <declan.doherty@intel.com>
> Cc: 'Anoob Joseph' <anoobj@marvell.com>; Jerin Jacob <jerinj@marvell.com>; Hemant Agrawal <hemant.agrawal@nxp.com>
> Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
> 
> 
> 
> >
> >
> > > Hi Akhil,
> > >
> > > Thanks for the review and comments!
> > > Knowing you are extremely busy. Here is my point in brief:
> > > I think placing the CPU synchronous crypto in the rte_security make sense, as
> > >
> > > 1. rte_security contains inline crypto and lookaside crypto action type already,
> > > adding cpu_crypto action type is reasonable.
> > > 2. rte_security contains the security features may not supported by all devices,
> > > such as crypto, ipsec, and PDCP. cpu_crypto follow this category, again crypto.
> > > 3. placing CPU synchronous crypto API in rte_security is natural - as inline
> > > mode works synchronously, too. However cryptodev doesn't.
> > > 4. placing CPU synchronous crypto API in rte_security helps boosting SW
> > > crypto performance, I have already provided a simple perf test inside the
> > > unit test in the patchset for the user to try out - just comparing its
> > > output against DPDK crypto perf app output.
> > > 5. placing CPU synchronous crypto API in cryptodev will never serve HW
> > > lookaside crypto PMDs, as making them to work synchronously have huge
> > > performance penalty. However Cryptodev framework's existing design is
> > > providing APIs that will work in all crypto PMDs
> > > (rte_cryptodev_enqueue_burst / dequeue_burst for example), this does not
> > > fit in cryptodev's principle.
> > > 6. placing CPU synchronous crypto API in cryptodev confuses the user, as:
> > > 	- the session created for async mode may not work in sync mode
> > > 	- both enqueue/dequeue and cpu_crypto_process does the same crypto
> > > processing, but one PMD may support only one API (set), the other may
> > > support another, and the third PMD supports both. We have to provide
> > > another API to let the user query which one to support which.
> > > 	- two completely different code paths for async/sync mode.
> > > 7. You said in the end of the email - placing CPU synchronous crypto API
> > > into rte_security is not acceptable as it does not do any rte_security
> > > stuff - crypto isn't? You may call this a quibble, but in my idea, in the
> > > patchset both PMDs' implementations did offload the work to the CPU's
> > > special circuit designed dedicated to accelerate the crypto processing.
> > >
> > > To me cryptodev is the one CPU synchronous crypto API should not go into,
> > > rte_security is.
> >
> > I also don't understand why rte_security is not an option here.
> > We do have inline-crypto right now, why we can't have cpu-crypto with new
> > process() API here?
> > Actually would like to hear more opinions from the community here -
> > what other interested parties think is the best way for introducing cpu-crypto
> > specific API?
> 
> I have raised this concern in the weekly status meeting as well. But it looks like nobody
> is interested.

That's really a pity...
CC-ing it to TB members; hopefully someone will be interested,
or at least can forward it to an interested person.
Konstantin



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-11 13:23                                         ` Akhil Goyal
  2019-10-13 23:07                                           ` Zhang, Roy Fan
@ 2019-10-16 22:07                                           ` Ananyev, Konstantin
  2019-10-17 12:49                                             ` Ananyev, Konstantin
  2019-10-18 13:17                                             ` Akhil Goyal
  1 sibling, 2 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-16 22:07 UTC (permalink / raw)
  To: Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph'


Hi Akhil,

> > > User can use the same session, that is what I am also insisting, but it may have
> > > separate Session private data. Cryptodev session create API provide that
> > > functionality and we can Leverage that.
> >
> > rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which means
> > we can't use the same rte_cryptodev_sym_session to hold sessions for both
> > sync and async mode for the same device. Of course we can add a hard
> > requirement that any driver that wants to support process() has to create
> > sessions that can handle both process and enqueue/dequeue, but then again
> > what for to create such overhead?
> >
> > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > construct for multiple device_ids:
> > __extension__ struct {
> >                 void *data;
> >                 uint16_t refcnt;
> >         } sess_data[0];
> >         /**< Driver specific session material, variable size */
> >
> Yes I also feel the same. I was also not in favor of this when it was introduced.
> Please go ahead and remove this. I have no issues with that.

If you are not happy with that structure, and admit there are issues with it,
why do you push for reusing it for the cpu-crypto API?
Why not take a step back, take the current drawbacks into account,
and define something that (hopefully) would suit us better?
Again, the new API will be experimental for some time, so we'll
have an opportunity to see whether it works and, if not, fix it.

About removing data[] from the existing rte_cryptodev_sym_session:
personally I would like to do that, but the change seems too massive.
We are definitely not ready for such an effort right now.

> 
> > as an advantage.
> > It looks too error prone for me:
> > 1. Simultaneous session initialization/de-initialization for devices with the same
> > driver_id is not possible.
> > 2. It assumes that all device driver will be loaded before we start to create
> > session pools.
> >
> > Right now it seems ok, as no-one requires such functionality, but I don't know
> > how it will be in future.
> > For me rte_security session model, where for each security context user have to
> > create new session
> > looks much more robust.
> Agreed
> 
> >
> > >
> > > BTW, I can see a v2 to this RFC which is still based on security library.
> >
> > Yes, v2 was concentrated on fixing found issues, some code restructuring,
> > i.e. - changes that would be needed anyway whatever API approach we'll choose.
> >
> > > When do you plan
> > > To submit the patches for crypto based APIs. We have RC1 merge deadline for
> > this
> > > patchset on 21st Oct.
> >
> > We'd like to start working on it ASAP, but it seems we still have a major
> > disagreement
> > about how this crypto-dev API should look like.
> > Which makes me think - should we return to our original proposal via
> > rte_security?
> > It still looks to me like clean and straightforward way to enable this new API,
> > and probably wouldn't cause that much controversy.
> > What do you think?
> 
> I cannot spend more time discussing on this until RC1 date. I have some other stuff pending.
> You can send the patches early next week with the approach that I mentioned or else we
> can discuss this post RC1(which would mean deferring to 20.02).
> 
> But moving back to security is not acceptable to me. The code should be put where it is
> intended and not where it is easy to put. You are not doing any rte_security stuff.
> 

Ok, then my suggestion:
let's at least write down all the points about the crypto-dev approach where we
disagree, and then try to resolve them one by one.
If we fail to reach an agreement or make progress in the next week or so
(and there are no more reviews from the community),
we will have to bring this subject to the TB meeting to decide.
Sounds fair to you?

The list is below.
Please add to or correct it if I missed something.

Konstantin

1. extra input parameters to create/init rte_(cpu)_sym_session.

We will leverage the existing 6B gap inside rte_crypto_*_xform between the 'algo' and 'key' fields.
New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
For an lksd-crypto (lookaside) session, the PMD is free to ignore these fields.
No ABI breakage is required.

Hopefully no controversy here with #1.

2. cpu-crypto create/init.
    a) Our suggestion - introduce a new API for that:
        - rte_crypto_cpu_sym_init() that would init a completely opaque rte_crypto_cpu_sym_session.
        - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear)(...); /* whatever else we'll need */};
        - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
          that would return a const struct rte_crypto_cpu_sym_session_ops * based on the input xforms.
	Advantages:
	1) Totally opaque data structure (no ABI breakages in the future); the PMD writer is totally free
	     with its format and contents.
	2) Each session entity is self-contained; the user doesn't need to bring along dev_id etc.
	    dev_id is needed only at the init stage; after that the user will use session ops to perform
	    all operations on that session (process(), clear(), etc.).
	3) The user can decide whether to store the ops[] pointer on a per-session basis,
	    or per group of identical sessions, or...
	4) No mandatory mempools for private sessions. The user can allocate memory for a cpu-crypto
	    session whenever they like.
	Disadvantages:
	5) Extra changes in the control path.
	6) The user has to store the session_ops pointer explicitly.
     b) Your suggestion - reuse the existing rte_cryptodev_sym_session_init() and the existing rte_cryptodev_sym_session
      structure.
	Advantages:
	1) Allows reusing the same struct and init/create/clear() functions.
	    Probably fewer changes in the control path.
	Disadvantages:
	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
	    for both sync and async mode for the same device.
	    So the only option we have is to make the PMD devops->sym_session_configure()
	    always create a session that can work in both cpu and lksd modes.
	    For some implementations that would probably mean that under the hood the PMD would create
	    2 different session structs (sync/async) and then use one or the other depending on which API it was called from.
	    Seems doable, but...:
	    - it will contradict the statement from 1:
	      "New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
	      even if they don't plan to use that mode - i.e. a behavior change, and existing apps have to change.
	    - it might cause extra space overhead.
	3) It is not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
	    So probably minor compared to 2.b.2.

Actually, #3 follows from #2, but I decided to keep them separate.
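A minimal, self-contained sketch of option 2.a may make the comparison more concrete. It is hypothetical: the rte_crypto_cpu_sym_* names follow the draft above and are not existing DPDK API, struct toy_xform is a stand-in for rte_crypto_sym_xform, rte_crypto_cpu_sym_session_size() is an invented helper, and the XOR "cipher" is a toy, not real crypto. It also assumes get_ops() takes a dev_id, which the draft leaves implicit. The user only needs the forward declaration of the session, dev_id appears only on the control path, and session memory comes from wherever the caller likes (plain malloc(), no mempool).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ---- Visible to the user (hypothetical API, names from the draft) ---- */
struct rte_crypto_cpu_sym_session;      /* opaque: layout owned by the PMD */

struct rte_crypto_cpu_sym_session_ops {
	size_t (*process)(struct rte_crypto_cpu_sym_session *sess,
			  uint8_t *buf, size_t len);
	void (*clear)(struct rte_crypto_cpu_sym_session *sess);
	/* whatever else turns out to be needed */
};

struct toy_xform { uint8_t key; };      /* stand-in for rte_crypto_sym_xform */

/* ---- Inside a toy PMD serving dev_id 0 (would live in the PMD's .c file) ---- */
struct rte_crypto_cpu_sym_session {
	uint8_t key;                        /* toy XOR key, NOT real crypto */
};

static size_t
toy_process(struct rte_crypto_cpu_sym_session *sess, uint8_t *buf, size_t len)
{
	assert(sess != NULL && buf != NULL);
	for (size_t i = 0; i < len; i++)
		buf[i] ^= sess->key;
	return len;
}

static void
toy_clear(struct rte_crypto_cpu_sym_session *sess)
{
	memset(sess, 0, sizeof(*sess));     /* wipe key material */
}

static const struct rte_crypto_cpu_sym_session_ops toy_ops = {
	.process = toy_process,
	.clear = toy_clear,
};

/* ---- Control path: the only place dev_id is needed ---- */
static size_t
rte_crypto_cpu_sym_session_size(uint8_t dev_id)  /* invented helper */
{
	return dev_id == 0 ? sizeof(struct rte_crypto_cpu_sym_session) : 0;
}

static int
rte_crypto_cpu_sym_init(uint8_t dev_id, struct rte_crypto_cpu_sym_session *sess,
			const struct toy_xform *xf)
{
	if (dev_id != 0 || sess == NULL || xf == NULL)
		return -1;
	sess->key = xf->key;
	return 0;
}

static const struct rte_crypto_cpu_sym_session_ops *
rte_crypto_cpu_sym_get_ops(uint8_t dev_id, const struct toy_xform *xf)
{
	return (dev_id == 0 && xf != NULL) ? &toy_ops : NULL;
}
```

On the data path the caller then only needs ops->process(sess, buf, len); whether ops is cached per session or per group of sessions is left to the application, as in 2.a.3.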

3. process() parameters/behavior
    a) Our suggestion: the user stores a ptr to the session ops (or to (*process) itself) and just does:
        session_ops->process(sess, ...);
	Advantages:
	1) Fastest possible execution path.
	2) No need to carry dev_id on the data path.
	Disadvantages:
	3) The user has to carry the session_ops pointer explicitly.
    b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
        rte_crypto_cpu_sym_process(uint8_t dev_id, struct rte_cryptodev_sym_session *sess, /*data parameters*/) {...
                     rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
                     /* and then inside the PMD-specific process: */
                     pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
                     /* and then most likely either */
                     pmd_private_session->process(pmd_private_session, ...);
                     /* or jump based on session/input data */
	Advantages:
	1) Don't see any...
	Disadvantages:
	2) The user has to carry dev_id on the data path.
	3) Extra level of indirection (plus data dependency) - both for data and instructions.
	    Possible slowdown compared to a) (not measured).
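To make the indirection argument in 3.a/3.b concrete, here is a hypothetical model of both call paths. The demo_* and toy_* names are invented for illustration; demo_sym_session only mimics the driver_id-indexed sess_data[] array of rte_cryptodev_sym_session (fixed size instead of a flexible array), and the XOR transform is a toy. It shows the extra loads (dev_id to dev_ops, then sess_data[driver_id]) that 3.b adds before reaching the same function 3.a calls directly; it says nothing about how much that costs, which, as noted above, has not been measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DRIVERS 4
#define DEMO_MAX_DEVS    4

struct demo_priv;                       /* PMD-private session data */
typedef size_t (*demo_process_t)(struct demo_priv *p, uint8_t *buf, size_t n);

struct demo_priv {
	demo_process_t process;
	uint8_t key;                        /* toy XOR key, NOT real crypto */
};

struct demo_sym_session {               /* mimics rte_cryptodev_sym_session */
	struct {
		void *data;
		uint16_t refcnt;
	} sess_data[DEMO_MAX_DRIVERS];      /* indexed by driver_id, not dev_id */
};

struct demo_dev_ops {
	size_t (*cpu_process)(struct demo_sym_session *s, uint8_t *buf, size_t n);
};

struct demo_dev {
	const struct demo_dev_ops *dev_ops;
};

static struct demo_dev demo_devs[DEMO_MAX_DEVS];

/* ---- toy PMD ---- */
enum { TOY_DRIVER_ID = 1 };

static size_t
toy_xor(struct demo_priv *p, uint8_t *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		buf[i] ^= p->key;
	return n;
}

/* 3.b, PMD side: reached via dev_ops, then looks up its private data. */
static size_t
toy_cpu_process(struct demo_sym_session *s, uint8_t *buf, size_t n)
{
	struct demo_priv *p = s->sess_data[TOY_DRIVER_ID].data;  /* extra load */

	assert(p != NULL);
	return p->process(p, buf, n);       /* second indirect call */
}

static const struct demo_dev_ops toy_dev_ops = { .cpu_process = toy_cpu_process };

/* 3.b, generic side: the caller must carry dev_id on the data path. */
static size_t
demo_cpu_sym_process(uint8_t dev_id, struct demo_sym_session *s,
		     uint8_t *buf, size_t n)
{
	return demo_devs[dev_id].dev_ops->cpu_process(s, buf, n);
}
```

In this model, two devices served by the same driver would share one sess_data[] slot, which is the limitation described in 2.b.2.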
	 


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-16 22:07                                           ` Ananyev, Konstantin
@ 2019-10-17 12:49                                             ` Ananyev, Konstantin
  2019-10-18 13:17                                             ` Akhil Goyal
  1 sibling, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-17 12:49 UTC (permalink / raw)
  To: Ananyev, Konstantin, Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph'


> 
> > > > User can use the same session, that is what I am also insisting, but it may have
> > > separate
> > > > Session private data. Cryptodev session create API provide that functionality
> > > and we can
> > > > Leverage that.
> > >
> > > rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which means
> > > we can't use
> > > the same rte_cryptodev_sym_session to hold sessions for both sync and async
> > > mode
> > > for the same device. Off course we can add a hard requirement that any driver
> > > that wants to
> > > support process() has to create sessions that can handle both  process and
> > > enqueue/dequeue,
> > > but then again  what for to create such overhead?
> > >
> > > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > > construct for multiple device_ids:
> > > __extension__ struct {
> > >                 void *data;
> > >                 uint16_t refcnt;
> > >         } sess_data[0];
> > >         /**< Driver specific session material, variable size */
> > >
> > Yes I also feel the same. I was also not in favor of this when it was introduced.
> > Please go ahead and remove this. I have no issues with that.
> 
> If you are not happy with that structure, and admit there are issues with it,
> why do you push for reusing it for cpu-crypto API?
> Why  not to take step back, take into account current drawbacks
> and define something that (hopefully) would suit us better?
> Again new API will be experimental for some time, so we'll
> have some opportunity to see does it works and if not fix it.
> 
> About removing data[] from existing rte_cryptodev_sym_session -
> Personally would like to do that, but the change seems to be too massive.
> Definitely not ready for such effort right now.
> 
> >
> > > as an advantage.
> > > It looks too error prone for me:
> > > 1. Simultaneous session initialization/de-initialization for devices with the same
> > > driver_id is not possible.
> > > 2. It assumes that all device driver will be loaded before we start to create
> > > session pools.
> > >
> > > Right now it seems ok, as no-one requires such functionality, but I don't know
> > > how it will be in future.
> > > For me rte_security session model, where for each security context user have to
> > > create new session
> > > looks much more robust.
> > Agreed
> >
> > >
> > > >
> > > > BTW, I can see a v2 to this RFC which is still based on security library.
> > >
> > > Yes, v2 was concentrated on fixing found issues, some code restructuring,
> > > i.e. - changes that would be needed anyway whatever API approach we'll choose.
> > >
> > > > When do you plan
> > > > To submit the patches for crypto based APIs. We have RC1 merge deadline for
> > > this
> > > > patchset on 21st Oct.
> > >
> > > We'd like to start working on it ASAP, but it seems we still have a major
> > > disagreement
> > > about how this crypto-dev API should look like.
> > > Which makes me think - should we return to our original proposal via
> > > rte_security?
> > > It still looks to me like clean and straightforward way to enable this new API,
> > > and probably wouldn't cause that much controversy.
> > > What do you think?
> >
> > I cannot spend more time discussing on this until RC1 date. I have some other stuff pending.
> > You can send the patches early next week with the approach that I mentioned or else we
> > can discuss this post RC1(which would mean deferring to 20.02).
> >
> > But moving back to security is not acceptable to me. The code should be put where it is
> > intended and not where it is easy to put. You are not doing any rte_security stuff.
> >
> 
> Ok, then my suggestion:
> Let's at least write down all points about crypto-dev approach where we
> disagree and then probably try to resolve them one by one....
> If we fail to make an agreement/progress in next week or so,
> (and no more reviews from the community)
> will have bring that subject to TB meeting to decide.
> Sounds fair to you?
> 
> List is below.
> Please add/correct me, if I missed something.
> 
> Konstantin
> 
> 1. extra input parameters to create/init rte_(cpu)_sym_session.
> 
> Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and 'key' fields.
> New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
> For lksd-crypto session PMD is free to ignore these fields.
> No ABI breakage is required.
> 
> Hopefully no controversy here with #1.
> 
> 2. cpu-crypto create/init.
>     a) Our suggestion - introduce new API for that:
>         - rte_crypto_cpu_sym_init() that would init completely opaque  rte_crypto_cpu_sym_session.
>         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /* whatever else we'll need */};
>         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
>           that would return const struct rte_crypto_cpu_sym_session_ops *based on input xforms.
> 	Advantages:
> 	1)  totally opaque data structure (no ABI breakages in future), PMD writer is totally free
> 	     with it format and contents.
> 	2) each session entity is self-contained, user doesn't need to bring along dev_id etc.
> 	    dev_id is needed  only at init stage, after that user will use session ops to perform
> 	    all operations on that session (process(), clear(), etc.).
> 	3) User can decide does he wants to store ops[] pointer on a per session basis,
> 	    or on a per group of same sessions, or...
> 	4) No mandatory mempools for private sessions. User can allocate memory for cpu-crypto
> 	    session whenever he likes.
> 	Disadvantages:
> 	5) Extra changes in control path
> 	6) User has to store session_ops pointer explicitly.

On second thought, if 2.a.6 is really that big a deal, we can have a small shim layer on top:

struct rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops * const ops; }
OR even
struct rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }

And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).

Then process() can become a wrapper:
rte_crypto_cpu_sym_process(ses, ...) {return ses->ops->process(ses->ses, ...);}
OR
rte_crypto_cpu_sym_process(ses, ...) {return ses->ops.process(ses->ses, ...);}

If that would help us reach consensus, it works for me.
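A minimal sketch of the shim idea, assuming the first variant (a pointer to a shared ops table). The names follow the proposal in this thread and are not existing DPDK API; the init() signature, which takes caller-provided private memory and a one-byte toy key, is invented for illustration, and the XOR transform is a toy. One deliberate deviation: it uses a pointer to a const ops table rather than a const-qualified pointer member, so that init() can fill in a session object the user has already allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rte_crypto_cpu_sym_session_ops {
	size_t (*process)(void *priv, uint8_t *buf, size_t n);
	void (*clear)(void *priv);
};

/* Shim: PMD-private data plus the ops that know how to handle it. */
struct rte_crypto_cpu_sym_session {
	void *ses;
	const struct rte_crypto_cpu_sym_session_ops *ops;
};

/* ---- toy PMD: private data is a single XOR key byte ---- */
static size_t
toy_process(void *priv, uint8_t *buf, size_t n)
{
	const uint8_t key = *(const uint8_t *)priv;

	for (size_t i = 0; i < n; i++)
		buf[i] ^= key;
	return n;
}

static void
toy_clear(void *priv)
{
	*(uint8_t *)priv = 0;
}

static const struct rte_crypto_cpu_sym_session_ops toy_ops = {
	.process = toy_process,
	.clear = toy_clear,
};

/* Merged init + get_ops: fills in both shim fields in one call. */
static int
rte_crypto_cpu_sym_init(struct rte_crypto_cpu_sym_session *s, void *priv_mem,
			uint8_t key)
{
	if (s == NULL || priv_mem == NULL)
		return -1;
	*(uint8_t *)priv_mem = key;
	s->ses = priv_mem;
	s->ops = &toy_ops;
	return 0;
}

/* The data-path wrapper is a single inlineable indirect call. */
static inline size_t
rte_crypto_cpu_sym_process(struct rte_crypto_cpu_sym_session *s,
			   uint8_t *buf, size_t n)
{
	assert(s != NULL && s->ops != NULL);
	return s->ops->process(s->ses, buf, n);
}
```

With the second variant (ops embedded by value) the wrapper would read s->ops.process(...) instead, trading a larger session object for one less pointer dereference.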

>      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
>       structure.
> 	Advantages:
> 	1) allows to reuse same struct and init/create/clear() functions.
> 	    Probably less changes in control path.
> 	Disadvantages:
> 	2) rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which means that
> 	    we can't use the same rte_cryptodev_sym_session to hold private sessions pointers
> 	    for both sync and async mode  for the same device.
>                    So the only option we have - make PMD devops->sym_session_configure()
> 	    always create a session that can work in both cpu and lksd modes.
> 	    For some implementations that would probably mean that under the hood  PMD would create
> 	    2 different session structs (sync/async) and then use one or another depending on from what API been called.
> 	    Seems doable, but ...:
>                    - will contradict with statement from 1:
> 	      " New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
>                       Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> 	       even if they don't plan to use that mode - i.e. behavior change, existing app change.
>                      - might cause extra space overhead.
> 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> 	    So probably minor compared to 2.b.2.
> 
> Actually #3 follows from #2, but decided to have them separated.
> 
> 3. process() parameters/behavior
>     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
>         session_ops->process(sess, ...);
> 	Advantages:
> 	1) fastest possible execution path
> 	2) no need to carry on dev_id for data-path
> 	Disadvantages:
> 	3) user has to carry on session_ops pointer explicitly
>     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
>         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session  *sess, /*data parameters*/) {...
>                      rte_cryptodevs[dev_id].dev_ops->cpu_process(ses, ...);
>                       /*and then inside PMD specific process: */
>                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
>                      /* and then most likely either */
>                      pmd_private_session->process(pmd_private_session, ...);
>                      /* or jump based on session/input data */
> 	Advantages:
> 	1) don't see any...
> 	Disadvantages:
> 	2) User has to carry on dev_id inside data-path
> 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> 	    Possible slowdown compared to a) (not measured).
> 


* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-16 22:07                                           ` Ananyev, Konstantin
  2019-10-17 12:49                                             ` Ananyev, Konstantin
@ 2019-10-18 13:17                                             ` Akhil Goyal
  2019-10-21 13:47                                               ` Ananyev, Konstantin
  1 sibling, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-10-18 13:17 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph', Hemant Agrawal

Hi Konstantin,

Added my comments inline with your draft.
> 
> 
> Hi Akhil,
> 
> > > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > > construct for multiple device_ids:
> > > __extension__ struct {
> > >                 void *data;
> > >                 uint16_t refcnt;
> > >         } sess_data[0];
> > >         /**< Driver specific session material, variable size */
> > >
> > Yes I also feel the same. I was also not in favor of this when it was introduced.
> > Please go ahead and remove this. I have no issues with that.
> 
> If you are not happy with that structure, and admit there are issues with it,
> why do you push for reusing it for cpu-crypto API?
> Why  not to take step back, take into account current drawbacks
> and define something that (hopefully) would suit us better?
> Again new API will be experimental for some time, so we'll
> have some opportunity to see does it works and if not fix it.

[Akhil] This structure serves a use case which was agreed upon in the
community; we cannot just remove a feature altogether. Rather, it is Intel's
use case only.

> 
> About removing data[] from existing rte_cryptodev_sym_session -
> Personally would like to do that, but the change seems to be too massive.
> Definitely not ready for such effort right now.
> 

[snip]..

> 
> Ok, then my suggestion:
> Let's at least write down all points about crypto-dev approach where we
> disagree and then probably try to resolve them one by one....
> If we fail to make an agreement/progress in next week or so,
> (and no more reviews from the community)
> will have bring that subject to TB meeting to decide.
> Sounds fair to you?
Agreed
> 
> List is below.
> Please add/correct me, if I missed something.
> 
> Konstantin

Before going into the comparison, we should define the requirement as well.
What I understood from the patchset is:
"You need a synchronous API to perform crypto operations on raw data using SW PMDs"
So,
- no crypto-ops,
- no separate enq-deq, only a single process API for the data path,
- no need for any value addition to the session parameters
  (you would need some parameters from the crypto-op which
   are constant per session, and since you won't use crypto-ops,
   you need some place to store them).

Now, as per your mail, the comparison:
1. extra input parameters to create/init rte_(cpu)_sym_session.

We will leverage the existing 6B gap inside rte_crypto_*_xform between the 'algo' and 'key' fields.
New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
For an lksd-crypto (lookaside) session, the PMD is free to ignore these fields.
No ABI breakage is required.

[Akhil] Agreed, no issues.

2. cpu-crypto create/init.
    a) Our suggestion - introduce a new API for that:
        - rte_crypto_cpu_sym_init() that would init a completely opaque rte_crypto_cpu_sym_session.
        - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear)(...); /* whatever else we'll need */};
        - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
          that would return a const struct rte_crypto_cpu_sym_session_ops * based on the input xforms.
	Advantages:
	1) Totally opaque data structure (no ABI breakages in the future); the PMD writer is totally free
	     with its format and contents.

[Akhil] It will have breakage at some point till we don't hit the union size.
Rather, I don't expect more parameters to be added.
Or do we really care about ABI breakage when the argument is about
the correct place to add a piece of code? Or do we really agree to add code
anywhere just to avoid that breakage?

	2) Each session entity is self-contained; the user doesn't need to bring along dev_id etc.
	    dev_id is needed only at the init stage; after that the user will use session ops to perform
	    all operations on that session (process(), clear(), etc.).

[Akhil] There is nothing called session ops in current DPDK. What you are proposing
is a new concept which doesn't have any extra benefit; rather, it adds the complexity
of two different code paths for session creation.


	3) The user can decide whether to store the ops[] pointer on a per-session basis,
	    or per group of identical sessions, or...

[Akhil] Will the user really care which process API should be called from the PMD?
Rather, it should be the driver's responsibility to store that in the session private data,
which would be opaque to the user. As per my suggestion, the same process function can
be added to multiple sessions, or a single session can be managed inside the PMD.


	4) No mandatory mempools for private sessions. The user can allocate memory for a cpu-crypto
	    session whenever they like.

[Akhil] You mean session private data? You would need that memory anyway; the user will be
allocating it already. You do not need to manage that.

	Disadvantages:
	5) Extra changes in the control path.
	6) The user has to store the session_ops pointer explicitly.

[Akhil] More disadvantages:
- All supporting PMDs will need to maintain TWO types of session for the
same crypto processing. If a fix or a new feature (or algo) is added, the PMD owner
will need to add code in both session create APIs. Hence more maintenance and
more error prone.
- Stacks using these new APIs also need to maintain two
code paths for the same processing while doing session initialization
for sync and async.


     b) Your suggestion - reuse the existing rte_cryptodev_sym_session_init() and the existing rte_cryptodev_sym_session
      structure.
	Advantages:
	1) Allows reusing the same struct and init/create/clear() functions.
	    Probably fewer changes in the control path.
	Disadvantages:
	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
	    for both sync and async mode for the same device.
	    So the only option we have is to make the PMD devops->sym_session_configure()
	    always create a session that can work in both cpu and lksd modes.
	    For some implementations that would probably mean that under the hood the PMD would create
	    2 different session structs (sync/async) and then use one or the other depending on which API it was called from.
	    Seems doable, but...:
	    - it will contradict the statement from 1:
	      "New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
	      even if they don't plan to use that mode - i.e. a behavior change, and existing apps have to change.
	    - it might cause extra space overhead.

[Akhil] It will not contradict #1; you will only have a few checks in the session init of PMDs
which support this mode, to find the appropriate values and set the appropriate process() in it.
The user should be able to call the legacy enq-deq as well as the new process() without any issue.
The user would be able to change the datapath at runtime.
So this is not a disadvantage; it would be additional flexibility for the user.


	3) It is not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
	    So probably minor compared to 2.b.2.

[Akhil] So let's omit this from the current discussion. I hope we can find some way to deal with it.


Actually, #3 follows from #2, but I decided to keep them separate.
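Akhil's counter-proposal in 2.b (one session init in the PMD that also selects the process() handler, with both data paths usable on the same session) could look roughly like the sketch below. It is hypothetical: all toy_* names, the toy_xform stand-in and the two toy "algorithms" are invented, and toy_async_process() merely stands in for the legacy enqueue/dequeue path. It also contrasts a handler chosen once at configure time with a branch taken on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_algo { TOY_ALGO_XOR, TOY_ALGO_ADD };   /* toys, NOT real ciphers */

struct toy_xform {                  /* stand-in for rte_crypto_sym_xform */
	enum toy_algo algo;
	uint8_t key;
};

struct toy_priv;
typedef size_t (*toy_process_t)(const struct toy_priv *p, uint8_t *buf, size_t n);

struct toy_priv {                   /* PMD-private session data */
	uint8_t key;                    /* shared by both data paths */
	enum toy_algo algo;             /* used by the async path to branch */
	toy_process_t process;          /* chosen once, used by the sync path */
};

static size_t
toy_xor(const struct toy_priv *p, uint8_t *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		buf[i] ^= p->key;
	return n;
}

static size_t
toy_add(const struct toy_priv *p, uint8_t *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		buf[i] = (uint8_t)(buf[i] + p->key);
	return n;
}

/* Single configure path: fills shared state and selects the sync handler. */
static int
toy_sym_session_configure(const struct toy_xform *xf, struct toy_priv *p)
{
	assert(xf != NULL && p != NULL);
	p->key = xf->key;
	p->algo = xf->algo;
	switch (xf->algo) {
	case TOY_ALGO_XOR: p->process = toy_xor; return 0;
	case TOY_ALGO_ADD: p->process = toy_add; return 0;
	}
	return -1;                      /* unsupported algorithm */
}

/* Stand-in for the legacy enqueue/dequeue path: branches on every call. */
static size_t
toy_async_process(const struct toy_priv *p, uint8_t *buf, size_t n)
{
	return p->algo == TOY_ALGO_XOR ? toy_xor(p, buf, n) : toy_add(p, buf, n);
}
```

Both entry points operate on the same private data, which is the flexibility Akhil describes; the cost is that every supporting PMD's configure callback has to prepare state for both paths.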

3. process() parameters/behavior
    a) Our suggestion: the user stores a ptr to the session ops (or to (*process) itself) and just does:
        session_ops->process(sess, ...);
	Advantages:
	1) Fastest possible execution path.
	2) No need to carry dev_id on the data path.

[Akhil] I don't see any overhead in carrying dev_id; at least it would be in line with the
current DPDK methodology.
What you are suggesting is a new way to get things done without much benefit.
Also, I don't see any performance difference, as the crypto workload is much heavier than
the extra code cycles, so that won't matter.
So IMO, there is no advantage in your suggestion either.


	Disadvantages:
	3) The user has to carry the session_ops pointer explicitly.
    b) Your suggestion: add (*cpu_process) inside rte_cryptodev_ops and then:
        rte_crypto_cpu_sym_process(uint8_t dev_id, struct rte_cryptodev_sym_session *sess, /*data parameters*/) {...
                     rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
                     /* and then inside the PMD-specific process: */
                     pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
                     /* and then most likely either */
                     pmd_private_session->process(pmd_private_session, ...);
                     /* or jump based on session/input data */
	Advantages:
	1) Don't see any...
	Disadvantages:
	2) The user has to carry dev_id on the data path.
	3) Extra level of indirection (plus data dependency) - both for data and instructions.
	    Possible slowdown compared to a) (not measured).
	 
Having said all this, if the disagreements cannot be resolved, you can go for a PMD-specific API
for your PMDs, because as per my understanding the solution doesn't look scalable to other PMDs.
Your approach is aligned only to Intel and will not benefit others like openssl, which is used by all
vendors.

Regards,
Akhil




* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-18 13:17                                             ` Akhil Goyal
@ 2019-10-21 13:47                                               ` Ananyev, Konstantin
  2019-10-22 13:31                                                 ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-21 13:47 UTC (permalink / raw)
  To: Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph', Hemant Agrawal


Hi Akhil,

 
> Added my comments inline with your draft.
> >
> >
> > Hi Akhil,
> >
> > > > BTW, to be honest, I don't consider current rte_cryptodev_sym_session
> > > > construct for multiple device_ids:
> > > > __extension__ struct {
> > > >                 void *data;
> > > >                 uint16_t refcnt;
> > > >         } sess_data[0];
> > > >         /**< Driver specific session material, variable size */
> > > >
> > > Yes I also feel the same. I was also not in favor of this when it was introduced.
> > > Please go ahead and remove this. I have no issues with that.
> >
> > If you are not happy with that structure, and admit there are issues with it,
> > why do you push for reusing it for cpu-crypto API?
> > Why  not to take step back, take into account current drawbacks
> > and define something that (hopefully) would suit us better?
> > Again new API will be experimental for some time, so we'll
> > have some opportunity to see does it works and if not fix it.
> 
> [Akhil] This structure is serving some use case which is agreed upon in the
> Community, we cannot just remove a feature altogether.

I understand that, but we don't suggest removing anything that is already here.
We are talking about extending the existing API / adding a new one.
All our debates are about how much we can reuse from the existing one and what
needs to be added anew.

> Rather it is Intel's  Use case only.
> 
> >
> > About removing data[] from existing rte_cryptodev_sym_session -
> > Personally would like to do that, but the change seems to be too massive.
> > Definitely not ready for such effort right now.
> >
> 
> [snip]..
> 
> >
> > Ok, then my suggestion:
> > Let's at least write down all points about crypto-dev approach where we
> > disagree and then probably try to resolve them one by one....
> > If we fail to make an agreement/progress in next week or so,
> > (and no more reviews from the community)
> > will have bring that subject to TB meeting to decide.
> > Sounds fair to you?
> Agreed
> >
> > List is below.
> > Please add/correct me, if I missed something.
> >
> > Konstantin
> 
> Before going into comparison, we should define the requirement as well.

Good point.

> What I understood from the patchset,
> "You need a synchronous API to perform crypto operations on raw data using SW PMDs"
> So,
> - no crypto-ops,
> - no separate enq-deq, only single process API for data path
> - Do not need any value addition to the session parameters.
>   (You would need some parameters from the crypto-op which
>    Are constant per session and since you wont use crypto-op,
>    You need some place to store that)

Yes, this is correct, I think.

> 
> Now as per your mail, the comparison
> 1. extra input parameters to create/init rte_(cpu)_sym_session.
> 
> Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and 'key' fields.
> New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
> For lksd-crypto session PMD is free to ignore these fields.
> No ABI breakage is required.
> 
> [Akhil] Agreed, no issues.
> 
> 2. cpu-crypto create/init.
>     a) Our suggestion - introduce new API for that:
>         - rte_crypto_cpu_sym_init() that would init completely opaque  rte_crypto_cpu_sym_session.
>         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /* whatever else we'll need */};
>         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
>           that would return const struct rte_crypto_cpu_sym_session_ops *based on input xforms.
> 	Advantages:
> 	1)  totally opaque data structure (no ABI breakages in future), PMD writer is totally free
> 	     with it format and contents.
> 
> [Akhil] It will have breakage at some point till we don't hit the union size.

Not sure what union you are talking about?

> Rather I don't suspect there will be more parameters added.
> Or do we really care about the ABI breakage when the argument is about
> the correct place to add a piece of code or do we really agree to add code
> anywhere just to avoid that breakage.

I am talking about maintaining it in the future.
If your struct is not visible externally, there is no chance of introducing an ABI breakage.
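The ABI argument relies on the usual C opaque-handle pattern. Applications only see an incomplete type and pointers to it, so the library can change the layout freely. Below is a minimal sketch in which all names are hypothetical and are not DPDK APIs. In a real library the "header" and "implementation" parts would live in separate files.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ---- "public header": only an incomplete type is visible ---- */
struct cpu_sym_session;   /* opaque to applications */
struct cpu_sym_session *cpu_sym_session_create(const uint8_t *key, size_t key_len);
size_t cpu_sym_session_key_len(const struct cpu_sym_session *s);
void cpu_sym_session_free(struct cpu_sym_session *s);

/* ---- "library implementation": layout may change without ABI impact ---- */
struct cpu_sym_session {
	size_t key_len;
	uint8_t key[32];
	/* new private fields can be added here in a later release */
};

struct cpu_sym_session *cpu_sym_session_create(const uint8_t *key, size_t key_len)
{
	struct cpu_sym_session *s;

	if (key == NULL || key_len > sizeof(s->key))
		return NULL;
	s = calloc(1, sizeof(*s));
	if (s == NULL)
		return NULL;
	memcpy(s->key, key, key_len);
	s->key_len = key_len;
	return s;
}

size_t cpu_sym_session_key_len(const struct cpu_sym_session *s)
{
	return s->key_len;
}

void cpu_sym_session_free(struct cpu_sym_session *s)
{
	free(s);
}
```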

> 
> 	2) each session entity is self-contained, user doesn't need to bring along dev_id etc.
> 	    dev_id is needed  only at init stage, after that user will use session ops to perform
> 	    all operations on that session (process(), clear(), etc.).
> 
> [Akhil] There is nothing called as session ops in current DPDK.

True, but it doesn't mean we can't/shouldn't have it.

> What you are proposing
> is a new concept which doesn't have any extra benefit, rather it is adding complexity
> to have two different code paths for session create.
> 
> 
> 	3) User can decide does he wants to store ops[] pointer on a per session basis,
> 	    or on a per group of same sessions, or...
> 
> [Akhil] Will the user really care which process API should be called from the PMD.
> Rather it should be driver's responsibility to store that in the session private data
> which would be opaque to the user. As per my suggestion same process function can
> be added to multiple sessions or a single session can be managed inside the PMD.

In that case we either need to have a function per session (stored internally),
or make a decision (branches) at run-time.
But as I said in another mail, I am OK with adding a small shim structure here:
either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
and merging rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).
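Here is a minimal sketch of that shim structure, assuming the pointer-to-ops variant with a merged init. All names are hypothetical, and the (*process) signature is only a placeholder because the real one was still under discussion. The "PMD" supplies a static const ops table, and the data path calls through it without any dev_id. The XOR is a toy stand-in, not encryption.

```c
#include <stddef.h>
#include <stdint.h>

struct cpu_sym_session_ops {
	/* process n buffers in place; returns how many were processed */
	uint32_t (*process)(void *ses, uint8_t *buf[], const size_t len[], uint32_t n);
	void (*clear)(void *ses);
};

struct cpu_sym_session {
	void *ses;                              /* PMD-private, opaque */
	const struct cpu_sym_session_ops *ops;  /* chosen once at init */
};

/* Toy "PMD" private session: XOR with a one-byte key (not real crypto). */
struct toy_priv { uint8_t key; };

static uint32_t toy_process(void *ses, uint8_t *buf[], const size_t len[], uint32_t n)
{
	const struct toy_priv *p = ses;
	uint32_t i;
	size_t j;

	for (i = 0; i < n; i++)
		for (j = 0; j < len[i]; j++)
			buf[i][j] ^= p->key;
	return n;
}

static void toy_clear(void *ses)
{
	((struct toy_priv *)ses)->key = 0;
}

static const struct cpu_sym_session_ops toy_ops = {
	.process = toy_process,
	.clear = toy_clear,
};

/* Merged init: fills in both the private session and the ops pointer. */
int toy_cpu_sym_init(struct cpu_sym_session *s, struct toy_priv *priv, uint8_t key)
{
	if (s == NULL || priv == NULL)
		return -1;
	priv->key = key;
	s->ses = priv;
	s->ops = &toy_ops;
	return 0;
}
```

On the data path the caller then only needs `s.ops->process(s.ses, bufs, lens, n);`.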

> 
> 
> 	4) No mandatory mempools for private sessions. User can allocate memory for cpu-crypto
> 	    session whenever he likes.
> 
> [Akhil] you mean session private data? 

Yes.

> You would need that memory anyways, user will be
> allocating that already.  You do not need to manage that.

What I am saying is that right now the user has no choice but to allocate it via a mempool,
which is probably not the best option for all cases.
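Here is what "no mandatory mempool" could look like if the API exposed the private-session size. The application could then place the session anywhere: on the stack, in a static array, via rte_malloc, or in a mempool if it wants one. This is a minimal sketch with hypothetical function names, using plain C storage.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical PMD-private session layout (hidden from the application). */
struct priv_session {
	uint8_t key[32];
	uint16_t key_len;
};

/* Hypothetical query: how many bytes the caller must provide. */
size_t cpu_sym_session_size(void)
{
	return sizeof(struct priv_session);
}

/* Hypothetical init into caller-owned memory; no mempool involved.
 * Returns 0 on success, -1 if the buffer is too small or misaligned,
 * or if the key is too long. */
int cpu_sym_session_init_in(void *mem, size_t mem_len,
			    const uint8_t *key, uint16_t key_len)
{
	struct priv_session *p = mem;

	if (mem == NULL || key == NULL || mem_len < sizeof(*p) ||
	    (uintptr_t)mem % alignof(struct priv_session) != 0 ||
	    key_len > sizeof(p->key))
		return -1;
	memset(p, 0, sizeof(*p));
	memcpy(p->key, key, key_len);
	p->key_len = key_len;
	return 0;
}
```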

> 
> 	Disadvantages:
> 	5) Extra changes in control path
> 	6) User has to store session_ops pointer explicitly.
> 
> [Akhil] More disadvantages:
> - All supporting PMDs will need to maintain TWO types of session for the
> same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD owner
> will need to add code in both the session create APIs. Hence more maintenance and
> error prone.

I think the majority of the code for both paths will be common; plus, even if we reuse the current sym_session_init(),
changes in the PMD session_init() code will be unavoidable.
But yes, it will be a new entry in devops that the PMD will have to support.
OK to add it as 7) to the list.

> - Stacks which will be using these new APIs also need to maintain two
> code path for the same processing while doing session initialization
> for sync and async

That's the same as #5 above, I think.

> 
> 
>      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
>       structure.
> 	Advantages:
> 	1) allows to reuse same struct and init/create/clear() functions.
> 	    Probably less changes in control path.
> 	Disadvantages:
> 	2) rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which means that
> 	    we can't use the same rte_cryptodev_sym_session to hold private sessions pointers
> 	    for both sync and async mode  for the same device.
>                    So the only option we have - make PMD devops->sym_session_configure()
> 	    always create a session that can work in both cpu and lksd modes.
> 	    For some implementations that would probably mean that under the hood  PMD would create
> 	    2 different session structs (sync/async) and then use one or another depending on from what API been called.
> 	    Seems doable, but ...:
>                    - will contradict with statement from 1:
> 	      " New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
>                       Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> 	       even if they don't plan to use that mode - i.e. behavior change, existing app change.
>                      - might cause extra space overhead.
> 
> [Akhil] It will not contradict with #1, you will only have few checks in the session init PMD
> Which support this mode, find appropriate values and set the appropriate process() in it.
> User should be able to call, legacy enq-deq as well as the new process() without any issue.
> User would be at runtime will be able to change the datapath.
> So this is not a disadvantage, it would be additional flexibility for the user.

OK, but that's what I am saying: if the PMD would *always* have to create a session that can handle
both modes (sync/async), then the user would *always* have to provide parameters for both modes too.
Otherwise, if, let's say, the user didn't set up sync-specific parameters at all, what should the PMD do?
  - return with an error?
  - init a session that can be used with the async path only?
My current assumption is #1.
If #2, then how will the user be able to tell whether the session is valid for both modes, or only for one?
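Here is one possible answer to the "how will the user tell" question under option #2, sketched with hypothetical names that are not part of any DPDK release. Session init records which modes it could set up, and the application queries that before choosing a data path.

```c
#include <stddef.h>
#include <stdint.h>

#define SESS_MODE_ASYNC (1u << 0)   /* enqueue/dequeue (lookaside) path */
#define SESS_MODE_SYNC  (1u << 1)   /* synchronous cpu-crypto process() */

/* Hypothetical init parameters: sync-specific fields are optional. */
struct sess_params {
	int has_sync_params;   /* did the app fill in the cpu-crypto fields? */
	uint16_t sync_param;   /* hypothetical sync-only parameter */
};

struct sess {
	uint32_t modes;        /* bitmask of usable SESS_MODE_* values */
	uint16_t sync_param;
};

/* Always enables async; enables sync only when its parameters are present. */
int sess_init(struct sess *s, const struct sess_params *p)
{
	if (s == NULL || p == NULL)
		return -1;
	s->modes = SESS_MODE_ASYNC;
	s->sync_param = 0;
	if (p->has_sync_params) {
		s->sync_param = p->sync_param;
		s->modes |= SESS_MODE_SYNC;
	}
	return 0;
}

/* Hypothetical query the application would call after init. */
int sess_supports(const struct sess *s, uint32_t mode)
{
	return (s->modes & mode) == mode;
}
```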


> 
> 
> 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> 	    So probably minor compared to 2.b.2.
> 
> [Akhil] So lets omit this for current discussion. And I hope we can find some way to deal with it.

I don't think there is an easy way to fix that with the existing API.

> 
> 
> Actually #3 follows from #2, but decided to have them separated.
> 
> 3. process() parameters/behavior
>     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
>         session_ops->process(sess, ...);
> 	Advantages:
> 	1) fastest possible execution path
> 	2) no need to carry on dev_id for data-path
> 
> [Akhil] I don't see any overhead of carrying dev id, at least it would be inline with the
> current DPDK methodology.

If we add process() into rte_cryptodev itself (as we have for enqueue_burst/dequeue_burst),
then it will be an ABI breakage.
Also, there are discussions about getting rid of that approach completely:
http://mails.dpdk.org/archives/dev/2019-September/144674.html
So I am not sure this is the recommended way these days.
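`offsetof` shows why adding a member to a public, application-visible struct breaks the ABI. Code compiled against the old layout, including inline fast-path functions built into the application, keeps using the old offsets and size. The structs below are hypothetical stand-ins, not the real struct rte_cryptodev.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*burst_fn)(void *qp, void **ops, uint16_t n);

/* Hypothetical public device struct as shipped in release N. */
struct dev_v1 {
	burst_fn enqueue_burst;
	burst_fn dequeue_burst;
	void *data;              /* read by code compiled into applications */
};

/* Release N+1 inserts a new fast-path pointer before 'data'. */
struct dev_v2 {
	burst_fn enqueue_burst;
	burst_fn dequeue_burst;
	burst_fn cpu_process;    /* hypothetical new member */
	void *data;
};

/* Nonzero when old binaries would read the wrong field, or index a global
 * array of these structs with the wrong stride. Even appending at the end
 * changes sizeof, which breaks array indexing. */
int layout_changed(void)
{
	return offsetof(struct dev_v1, data) != offsetof(struct dev_v2, data) ||
	       sizeof(struct dev_v1) != sizeof(struct dev_v2);
}
```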

> What you are suggesting is a new way to get the things done without much benefit.

It would help with ABI stability and give better performance; isn't that enough?
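The two data-path shapes under comparison (3.a and 3.b) can be sketched side by side. Everything here is hypothetical, and the byte increment is a toy stand-in for crypto. In 3.b the extra work is the device-table lookup by dev_id plus the per-driver slot lookup inside the session. Both are dependent loads that happen before the final indirect call.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS    4
#define MAX_DRIVERS 4

typedef int (*process_fn)(void *priv, uint8_t *buf, size_t len);

/* Toy process: increments each byte (stands in for real crypto). */
static int toy_process(void *priv, uint8_t *buf, size_t len)
{
	size_t i;

	(void)priv;
	for (i = 0; i < len; i++)
		buf[i]++;
	return 0;
}

/* ---- 3.a: caller already holds the private session and its handler ---- */
struct direct_sess { void *priv; process_fn process; };

int process_direct(const struct direct_sess *s, uint8_t *buf, size_t len)
{
	return s->process(s->priv, buf, len);        /* one indirect call */
}

/* ---- 3.b: global device table, then per-driver slot in the session ---- */
struct dev_ops {
	int (*cpu_process)(void *sess, uint8_t driver_id, uint8_t *buf, size_t len);
};
struct dev { const struct dev_ops *ops; uint8_t driver_id; };
struct sym_sess { void *sess_data[MAX_DRIVERS]; };   /* indexed by driver_id */
struct pmd_priv { process_fn process; };

struct dev devices[MAX_DEVS];

static int pmd_cpu_process(void *sess, uint8_t driver_id, uint8_t *buf, size_t len)
{
	struct pmd_priv *p = ((struct sym_sess *)sess)->sess_data[driver_id];

	return p->process(p, buf, len);              /* second indirect call */
}

const struct dev_ops pmd_ops = { .cpu_process = pmd_cpu_process };

int process_via_dev(uint8_t dev_id, struct sym_sess *s, uint8_t *buf, size_t len)
{
	const struct dev *d = &devices[dev_id];

	return d->ops->cpu_process(s, d->driver_id, buf, len);
}
```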

> Also I don't see any performance difference as crypto workload is heavier than
> Code cycles, so that wont matter.

It depends.
Suppose a function call costs you ~30 cycles.
If you have a burst of big packets (let's say crypto for each takes ~2K cycles) that belong
to the same session, then yes, you wouldn't notice these extra 30 cycles at all.
If you have a burst of small packets (let's say crypto for each takes ~300 cycles) and each
belongs to a different session, then it will cost you ~10% extra.
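The estimate is the extra call cost, multiplied by the number of calls a burst needs, divided by the total crypto work in the burst. The sketch below plugs in the same illustrative numbers (~30 cycles per extra call), which are assumptions rather than measurements.

```c
#include <stdint.h>

/* Extra call cost as a percentage of the total crypto work in a burst.
 * n_calls: process() calls the burst needs (1 if all packets share one
 * session and go in one call, n_pkts if every packet has its own session). */
double call_overhead_pct(uint32_t n_calls, double call_cycles,
			 uint32_t n_pkts, double crypto_cycles_per_pkt)
{
	if (n_pkts == 0 || crypto_cycles_per_pkt <= 0.0)
		return 0.0;
	return 100.0 * (n_calls * call_cycles) / (n_pkts * crypto_cycles_per_pkt);
}
```

For a 32-packet burst, `call_overhead_pct(1, 30.0, 32, 2000.0)` is about 0.047% (big packets, one session). `call_overhead_pct(32, 30.0, 32, 300.0)` is 10% (small packets, one session each).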

> So IMO, there is no advantage in your suggestion as well.
> 
> 
> 	Disadvantages:
> 	3) user has to carry on session_ops pointer explicitly
>     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
>         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session  *sess, /*data parameters*/) {...
>                      rte_cryptodevs[dev_id].dev_ops->cpu_process(ses, ...);
>                       /*and then inside PMD specifc process: */
>                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
>                      /* and then most likely either */
>                      pmd_private_session->process(pmd_private_session, ...);
>                      /* or jump based on session/input data */
> 	Advantages:
> 	1) don't see any...
> 	Disadvantages:
> 	2) User has to carry on dev_id inside data-path
> 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> 	    Possible slowdown compared to a) (not measured).
> 
> Having said all this, if the disagreements cannot be resolved, you can go for a pmd API specific
> to your PMDs,

I don't think it is a good idea.
A PMD-specific API is a sort of deprecated path; also, there is no clean way to use it within the libraries.

> because as per my understanding the solution doesn't look scalable to other PMDs.
> Your approach is aligned only to Intel , will not benefit others like openssl which is used by all
> vendors.

I feel quite the opposite: from my perspective, the majority of SW-backed PMDs will benefit from it.
And I don't see anything Intel-specific in my proposals above.
About the openssl PMD: I am not an expert here, but looking at the code, I think it will fit really well.
Look for yourself at its internal functions: process_openssl_auth_op/process_openssl_crypto_op.
I think they are doing exactly the same: they use a sync API underneath, and they are session based
(AFAIK you don't need any device/queue data; everything needed for crypto/auth is stored inside the session).

Konstantin 
 



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-21 13:47                                               ` Ananyev, Konstantin
@ 2019-10-22 13:31                                                 ` Akhil Goyal
  2019-10-22 17:44                                                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-10-22 13:31 UTC
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph', Hemant Agrawal


Hi Konstantin,
> 
> 
> Hi Akhil,
> 
> 
> > Added my comments inline with your draft.
> > [snip]..
> >
> > >
> > > Ok, then my suggestion:
> > > Let's at least write down all points about crypto-dev approach where we
> > > disagree and then probably try to resolve them one by one....
> > > If we fail to make an agreement/progress in next week or so,
> > > (and no more reviews from the community)
> > > will have bring that subject to TB meeting to decide.
> > > Sounds fair to you?
> > Agreed
> > >
> > > List is below.
> > > Please add/correct me, if I missed something.
> > >
> > > Konstantin
> >
> > Before going into comparison, we should define the requirement as well.
> 
> Good point.
> 
> > What I understood from the patchset,
> > "You need a synchronous API to perform crypto operations on raw data using
> SW PMDs"
> > So,
> > - no crypto-ops,
> > - no separate enq-deq, only single process API for data path
> > - Do not need any value addition to the session parameters.
> >   (You would need some parameters from the crypto-op which
> >    Are constant per session and since you wont use crypto-op,
> >    You need some place to store that)
> 
> Yes, this is correct, I think.
> 
> >
> > Now as per your mail, the comparison
> > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> >
> > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and
> 'key' fields.
> > New fields will be optional and would be used by PMD only when cpu-crypto
> session is requested.
> > For lksd-crypto session PMD is free to ignore these fields.
> > No ABI breakage is required.
> >
> > [Akhil] Agreed, no issues.
> >
> > 2. cpu-crypto create/init.
> >     a) Our suggestion - introduce new API for that:
> >         - rte_crypto_cpu_sym_init() that would init completely opaque
> rte_crypto_cpu_sym_session.
> >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear);
> /*whatever else we'll need *'};
> >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform
> *xforms)
> >           that would return const struct rte_crypto_cpu_sym_session_ops *based
> on input xforms.
> > 	Advantages:
> > 	1)  totally opaque data structure (no ABI breakages in future), PMD
> writer is totally free
> > 	     with it format and contents.
> >
> > [Akhil] It will have breakage at some point till we don't hit the union size.
> 
> Not sure, what union you are talking about?

The union of xforms in rte_security_session_conf.
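The union point can be shown directly. Adding a member to a union inside a public struct is ABI-neutral only while the new member fits within the union's existing size and alignment. Once it is larger, the union grows and every field after it moves. The types below are hypothetical, not the real rte_security_session_conf.

```c
#include <stddef.h>
#include <stdint.h>

struct ipsec_xform { uint32_t spi; uint8_t opts[28]; };   /* 32 bytes, align 4 */
struct small_new_xform { uint32_t a, b, c, d; };          /* 16 bytes: fits */
struct big_new_xform { uint8_t blob[48]; };               /* 48 bytes: too big */

/* Original public struct. */
struct conf_v1 {
	int action;
	union { struct ipsec_xform ipsec; } u;
	void *userdata;
};

/* New member fits inside the existing union: layout unchanged. */
struct conf_v2_fits {
	int action;
	union { struct ipsec_xform ipsec; struct small_new_xform small; } u;
	void *userdata;
};

/* New member is larger than the union: 'userdata' moves, ABI breaks. */
struct conf_v2_grows {
	int action;
	union { struct ipsec_xform ipsec; struct big_new_xform big; } u;
	void *userdata;
};
```

A new member with a stricter alignment, such as a uint64_t, could also shift the union's offset even if it is small. That is why the "fits" example uses only 32-bit fields.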

> 
> > Rather I don't suspect there will be more parameters added.
> > Or do we really care about the ABI breakage when the argument is about
> > the correct place to add a piece of code or do we really agree to add code
> > anywhere just to avoid that breakage.
> 
> I am talking about maintaining it in future.
> if your struct is not seen externally, no chances to introduce ABI breakage.
> 
> >
> > 	2) each session entity is self-contained, user doesn't need to bring along
> dev_id etc.
> > 	    dev_id is needed  only at init stage, after that user will use session ops
> to perform
> > 	    all operations on that session (process(), clear(), etc.).
> >
> > [Akhil] There is nothing called as session ops in current DPDK.
> 
> True, but it doesn't mean we can't/shouldn't have it.

We can have it if it does not add complexity for the user. Creating 2 different code
paths for the user is not desirable for stack developers.

> 
> > What you are proposing
> > is a new concept which doesn't have any extra benefit, rather it is adding
> complexity
> > to have two different code paths for session create.
> >
> >
> > 	3) User can decide does he wants to store ops[] pointer on a per session
> basis,
> > 	    or on a per group of same sessions, or...
> >
> > [Akhil] Will the user really care which process API should be called from the
> PMD.
> > Rather it should be driver's responsibility to store that in the session private
> data
> > which would be opaque to the user. As per my suggestion same process
> function can
> > be added to multiple sessions or a single session can be managed inside the
> PMD.
> 
> In that case we either need to have a function per session (stored internally),
> or make decision (branches) at run-time.
> But as I said in other mail - I am ok to add small shim structure here:
> either rte_crypto_cpu_sym_session { void *ses; struct
> rte_crypto_cpu_sym_session_ops ops; }
> or rte_crypto_cpu_sym_session { void *ses; struct
> rte_crypto_cpu_sym_session_ops *ops; }
> And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into
> one (init).

Again, that will be a separate API call from the user's perspective, which is not good.

> 
> >
> >
> > 	4) No mandatory mempools for private sessions. User can allocate
> memory for cpu-crypto
> > 	    session whenever he likes.
> >
> > [Akhil] you mean session private data?
> 
> Yes.
> 
> > You would need that memory anyways, user will be
> > allocating that already.  You do not need to manage that.
> 
> What I am saying - right now user has no choice but to allocate it via mempool.
> Which is probably not the best options for all cases.
> 
> >
> > 	Disadvantages:
> > 	5) Extra changes in control path
> > 	6) User has to store session_ops pointer explicitly.
> >
> > [Akhil] More disadvantages:
> > - All supporting PMDs will need to maintain TWO types of session for the
> > same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD
> owner
> > will need to add code in both the session create APIs. Hence more
> maintenance and
> > error prone.
> 
> I think majority of code for both paths will be common, plus even we'll reuse
> current sym_session_init() -
> changes in PMD session_init() code will be unavoidable.
> But yes, it will be new entry in devops, that PMD will have to support.
> Ok to add it as 7) to the list.
> 
> > - Stacks which will be using these new APIs also need to maintain two
> > code path for the same processing while doing session initialization
> > for sync and async
> 
> That's the same as #5 above, I think.
> 
> >
> >
> >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and
> existing rte_cryptodev_sym_session
> >       structure.
> > 	Advantages:
> > 	1) allows to reuse same struct and init/create/clear() functions.
> > 	    Probably less changes in control path.
> > 	Disadvantages:
> > 	2) rte_cryptodev_sym_session. sess_data[] is indexed by driver_id,
> which means that
> > 	    we can't use the same rte_cryptodev_sym_session to hold private
> sessions pointers
> > 	    for both sync and async mode  for the same device.
> >                    So the only option we have - make PMD devops->sym_session_configure()
> > 	    always create a session that can work in both cpu and lksd modes.
> > 	    For some implementations that would probably mean that under the
> hood  PMD would create
> > 	    2 different session structs (sync/async) and then use one or another
> depending on from what API been called.
> > 	    Seems doable, but ...:
> >                    - will contradict with statement from 1:
> > 	      " New fields will be optional and would be used by PMD only when
> cpu-crypto session is requested."
> >                       Now it becomes mandatory for all apps to specify cpu-crypto
> related parameters too,
> > 	       even if they don't plan to use that mode - i.e. behavior change,
> existing app change.
> >                      - might cause extra space overhead.
> >
> > [Akhil] It will not contradict with #1, you will only have few checks in the
> session init PMD
> > Which support this mode, find appropriate values and set the appropriate
> process() in it.
> > User should be able to call, legacy enq-deq as well as the new process()
> without any issue.
> > User would be at runtime will be able to change the datapath.
> > So this is not a disadvantage, it would be additional flexibility for the user.
> 
> Ok, but that's what I am saying - if PMD would *always* have to create a
> session that can handle
> both modes (sync/async), then user would *always* have to provide parameters
> for both modes too.
> Otherwise if let say user didn't setup sync specific parameters at all, what PMD
> should do?
>   - return with error?
>   - init session that can be used with async path only?
> My current assumption is #1.
> If #2, then how user will be able to distinguish is that session valid for both
> modes, or only for one?

I would say there is a 3rd option: do nothing if sync params are not set.
Probably have a debug print in the PMD (which supports sync mode) to indicate that the
session is not configured properly for sync mode.
Internally the PMD will not store the process() API in the session private data,
and when the first packet is processed, devops->process will raise an assert that the session
is not configured for sync mode. The session validation would be done in any case,
with your suggestion or mine. So there is no extra overhead at runtime.
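Here is a minimal sketch of this "3rd option" with hypothetical names. Init always succeeds, but it leaves the sync handler unset and logs a debug message when sync parameters are missing. The sync entry point then rejects such a session. The sketch returns an error instead of calling assert(), so the check also works in release builds where asserts are compiled out. The XOR is a toy transform, not crypto.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef int (*sync_process_fn)(void *priv, uint8_t *buf, size_t len);

struct pmd_sess {
	sync_process_fn process;   /* NULL => usable with enq/deq only */
	uint8_t key;
};

static int xor_process(void *priv, uint8_t *buf, size_t len)
{
	const struct pmd_sess *s = priv;
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] ^= s->key;     /* toy transform, not real crypto */
	return 0;
}

/* Session init: sync support is enabled only if sync params were supplied. */
int pmd_sess_init(struct pmd_sess *s, uint8_t key, int sync_params_set)
{
	if (s == NULL)
		return -EINVAL;
	s->key = key;
	s->process = NULL;
	if (sync_params_set)
		s->process = xor_process;
	else
		fprintf(stderr, "debug: session not configured for sync mode\n");
	return 0;
}

/* Sync entry point: the NULL check is the per-call session validation. */
int pmd_sync_process(struct pmd_sess *s, uint8_t *buf, size_t len)
{
	if (s == NULL || s->process == NULL)
		return -ENOTSUP;
	return s->process(s, buf, len);
}
```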

> 
> 
> >
> >
> > 	3) not possible to store device (not driver) specific data within the
> session, but I think it is not really needed right now.
> > 	    So probably minor compared to 2.b.2.
> >
> > [Akhil] So lets omit this for current discussion. And I hope we can find some
> way to deal with it.
> 
> I don't think there is an easy way to fix that with existing API.
> 
> >
> >
> > Actually #3 follows from #2, but decided to have them separated.
> >
> > 3. process() parameters/behavior
> >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and
> just does:
> >         session_ops->process(sess, ...);
> > 	Advantages:
> > 	1) fastest possible execution path
> > 	2) no need to carry on dev_id for data-path
> >
> > [Akhil] I don't see any overhead of carrying dev id, at least it would be inline
> with the
> > current DPDK methodology.
> 
> If we'll add process() into rte_cryptodev itself (same as we have
> enqueue_burst/dequeue_burst),
> then it will be an ABI breakage.
> Also there are discussions to get rid of that approach completely:
> http://mails.dpdk.org/archives/dev/2019-September/144674.html
> So I am not sure this is a recommended way these days.

We can have it either in rte_cryptodev or in rte_cryptodev_ops, whichever
is good for you.

Whether it is an ABI breakage or not, as per your requirements this is the correct
approach. Do you agree with this or not?

Now, handling the API/ABI breakage is a separate story. For the 19.11 release we
are not much concerned about ABI breakages; this was discussed in the
community. So adding a new dev_ops wouldn't have been an issue.
Now, since we are so close to the RC1 deadline, we should come up with some
other solution for the next release. Maybe have a PMD API in 20.02 and
convert it into a formal one in 20.11.


> 
> > What you are suggesting is a new way to get the things done without much
> benefit.
> 
> Would help with ABI stability plus better performance, isn't it enough?
> 
> > Also I don't see any performance difference as crypto workload is heavier than
> > Code cycles, so that wont matter.
> 
> It depends.
> Suppose function call costs you ~30 cycles.
> If you have burst of big packets (let say crypto for each will take ~2K cycles) that
> belong
> to the same session, then yes you wouldn't notice these extra 30 cycles at all.
> If you have burst of small packets (let say crypto for each will take ~300 cycles)
> each
> belongs to different session, then it will cost you ~10% extra.

Let us do some profiling on openssl with both approaches and find out the
difference.
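Below is a rough sketch of how the two call shapes could be timed in isolation, using standard C plus POSIX clock_gettime(). Both paths are toy stand-ins, not the openssl PMD. Results from a micro-benchmark like this are indicative at best, because inlining, branch prediction and cache state all affect them.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*work_fn)(uint64_t x);

/* Toy per-packet "crypto" work: a few rounds of xorshift. */
static uint64_t toy_work(uint64_t x)
{
	int r;

	for (r = 0; r < 8; r++) {
		x ^= x << 13;
		x ^= x >> 7;
		x ^= x << 17;
	}
	return x;
}

struct direct { work_fn fn; };
struct level2 { const struct direct *slot[4]; };   /* per-driver slots */
struct table  { const struct level2 *sess; uint8_t drv; };

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Path a: one indirect call through a pointer the caller already holds. */
uint64_t run_direct(const struct direct *d, uint64_t iters, uint64_t *ns)
{
	uint64_t acc = 1, i, t0 = now_ns();

	for (i = 0; i < iters; i++)
		acc = d->fn(acc + i);
	*ns = now_ns() - t0;
	return acc;   /* returned so the work is observable */
}

/* Path b: dependent loads (table -> session -> driver slot) before the
 * call; 'volatile' forces the table fields to be re-read every iteration. */
uint64_t run_indirect(volatile const struct table *t, uint64_t iters, uint64_t *ns)
{
	uint64_t acc = 1, i, t0 = now_ns();

	for (i = 0; i < iters; i++)
		acc = t->sess->slot[t->drv]->fn(acc + i);
	*ns = now_ns() - t0;
	return acc;
}
```

Both paths compute the same result, so any difference in the reported nanoseconds comes from the call shape alone.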

> 
> > So IMO, there is no advantage in your suggestion as well.
> >
> >
> > 	Disadvantages:
> > 	3) user has to carry on session_ops pointer explicitly
> >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session
> *sess, /*data parameters*/) {...
> >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(ses, ...);
> >                       /*and then inside PMD specifc process: */
> >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> >                      /* and then most likely either */
> >                      pmd_private_session->process(pmd_private_session, ...);
> >                      /* or jump based on session/input data */
> > 	Advantages:
> > 	1) don't see any...
> > 	Disadvantages:
> > 	2) User has to carry on dev_id inside data-path
> > 	3) Extra level of indirection (plus data dependency) - both for data and
> instructions.
> > 	    Possible slowdown compared to a) (not measured).
> >
> > Having said all this, if the disagreements cannot be resolved, you can go for a
> pmd API specific
> > to your PMDs,
> 
> I don't think it is good idea.
> PMD specific API is sort of deprecated path, also there is no clean way to use it
> within the libraries.

I know that this is a deprecated path; we can use it until we are allowed
to break the ABI/API.

> 
> > because as per my understanding the solution doesn't look scalable to other
> PMDs.
> > Your approach is aligned only to Intel , will not benefit others like openssl
> which is used by all
> > vendors.
> 
> I feel quite opposite, from my perspective majority of SW backed PMDs will
> benefit from it.
> And I don't see anything Intel specific in my proposals above.
> About openssl PMD: I am not an expert here, but looking at the code, I think it
> will fit really well.
> Look yourself at its internal functions:
> process_openssl_auth_op/process_openssl_crypto_op,
> I think they doing exactly the same - they use sync API underneath, and they are
> session based
> (AFAIK you don't need any device/queue data, everything that needed for
> crypto/auth is stored inside session).
> 
By vendor specific, I mean:
- no PMD would like to have 2 different variants of session init APIs for doing the same thing.
- stacks will become vendor specific if they use 2 separate session create APIs. No stack would
like to support 2 variants of session create, one for HW PMDs and one for SW PMDs.

-Akhil



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-22 13:31                                                 ` Akhil Goyal
@ 2019-10-22 17:44                                                   ` Ananyev, Konstantin
  2019-10-22 22:21                                                     ` Ananyev, Konstantin
  2019-10-23 10:05                                                     ` Akhil Goyal
  0 siblings, 2 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-22 17:44 UTC
  To: Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph', Hemant Agrawal


Hi Akhil,


> > > Added my comments inline with your draft.
> > > [snip]..
> > >
> > > >
> > > > Ok, then my suggestion:
> > > > Let's at least write down all points about crypto-dev approach where we
> > > > disagree and then probably try to resolve them one by one....
> > > > If we fail to make an agreement/progress in next week or so,
> > > > (and no more reviews from the community)
> > > > will have bring that subject to TB meeting to decide.
> > > > Sounds fair to you?
> > > Agreed
> > > >
> > > > List is below.
> > > > Please add/correct me, if I missed something.
> > > >
> > > > Konstantin
> > >
> > > Before going into comparison, we should define the requirement as well.
> >
> > Good point.
> >
> > > What I understood from the patchset,
> > > "You need a synchronous API to perform crypto operations on raw data using
> > SW PMDs"
> > > So,
> > > - no crypto-ops,
> > > - no separate enq-deq, only single process API for data path
> > > - Do not need any value addition to the session parameters.
> > >   (You would need some parameters from the crypto-op which
> > >    Are constant per session and since you wont use crypto-op,
> > >    You need some place to store that)
> >
> > Yes, this is correct, I think.
> >
> > >
> > > Now as per your mail, the comparison
> > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > >
> > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and
> > 'key' fields.
> > > New fields will be optional and would be used by PMD only when cpu-crypto
> > session is requested.
> > > For lksd-crypto session PMD is free to ignore these fields.
> > > No ABI breakage is required.
> > >
> > > [Akhil] Agreed, no issues.
> > >
> > > 2. cpu-crypto create/init.
> > >     a) Our suggestion - introduce new API for that:
> > >         - rte_crypto_cpu_sym_init() that would init completely opaque
> > rte_crypto_cpu_sym_session.
> > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear);
> > /*whatever else we'll need *'};
> > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform
> > *xforms)
> > >           that would return const struct rte_crypto_cpu_sym_session_ops *based
> > on input xforms.
> > > 	Advantages:
> > > 	1)  totally opaque data structure (no ABI breakages in future), PMD
> > writer is totally free
> > > 	     with it format and contents.
> > >
> > > [Akhil] It will have breakage at some point till we don't hit the union size.
> >
> > Not sure, what union you are talking about?
> 
> Union of xforms in rte_security_session_conf

Hmm, how does it relate here?
I thought we were discussing pure rte_cryptodev_sym_session, no?

> 
> >
> > > Rather I don't suspect there will be more parameters added.
> > > Or do we really care about the ABI breakage when the argument is about
> > > the correct place to add a piece of code or do we really agree to add code
> > > anywhere just to avoid that breakage.
> >
> > I am talking about maintaining it in future.
> > if your struct is not seen externally, no chances to introduce ABI breakage.
> >
> > >
> > > 	2) each session entity is self-contained, user doesn't need to bring along
> > dev_id etc.
> > > 	    dev_id is needed  only at init stage, after that user will use session ops
> > to perform
> > > 	    all operations on that session (process(), clear(), etc.).
> > >
> > > [Akhil] There is nothing called as session ops in current DPDK.
> >
> > True, but it doesn't mean we can't/shouldn't have it.
> 
> We can have it if it is not adding complexity for the user. Creating 2 different code
> Paths for user is not desirable for the stack developers.
> 
> >
> > > What you are proposing
> > > is a new concept which doesn't have any extra benefit, rather it is adding
> > complexity
> > > to have two different code paths for session create.
> > >
> > >
> > > 	3) User can decide does he wants to store ops[] pointer on a per session
> > basis,
> > > 	    or on a per group of same sessions, or...
> > >
> > > [Akhil] Will the user really care which process API should be called from the
> > PMD.
> > > Rather it should be driver's responsibility to store that in the session private
> > data
> > > which would be opaque to the user. As per my suggestion same process
> > function can
> > > be added to multiple sessions or a single session can be managed inside the
> > PMD.
> >
> > In that case we either need to have a function per session (stored internally),
> > or make decision (branches) at run-time.
> > But as I said in other mail - I am ok to add small shim structure here:
> > either rte_crypto_cpu_sym_session { void *ses; struct
> > rte_crypto_cpu_sym_session_ops ops; }
> > or rte_crypto_cpu_sym_session { void *ses; struct
> > rte_crypto_cpu_sym_session_ops *ops; }
> > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into
> > one (init).
> 
> Again that will be a separate API call from the user perspective which is not good.
> 
> >
> > >
> > >
> > > 	4) No mandatory mempools for private sessions. User can allocate
> > memory for cpu-crypto
> > > 	    session whenever he likes.
> > >
> > > [Akhil] you mean session private data?
> >
> > Yes.
> >
> > > You would need that memory anyways, user will be
> > > allocating that already.  You do not need to manage that.
> >
> > What I am saying - right now user has no choice but to allocate it via mempool.
> > Which is probably not the best options for all cases.
> >
> > >
> > > 	Disadvantages:
> > > 	5) Extra changes in control path
> > > 	6) User has to store session_ops pointer explicitly.
> > >
> > > [Akhil] More disadvantages:
> > > - All supporting PMDs will need to maintain TWO types of session for the
> > > same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD
> > owner
> > > will need to add code in both the session create APIs. Hence more
> > maintenance and
> > > error prone.
> >
> > I think majority of code for both paths will be common, plus even we'll reuse
> > current sym_session_init() -
> > changes in PMD session_init() code will be unavoidable.
> > But yes, it will be new entry in devops, that PMD will have to support.
> > Ok to add it as 7) to the list.
> >
> > > - Stacks which will be using these new APIs also need to maintain two
> > > code path for the same processing while doing session initialization
> > > for sync and async
> >
> > That's the same as #5 above, I think.
> >
> > >
> > >
> > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and
> > existing rte_cryptodev_sym_session
> > >       structure.
> > > 	Advantages:
> > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > 	    Probably less changes in control path.
> > > 	Disadvantages:
> > > 	2) rte_cryptodev_sym_session. sess_data[] is indexed by driver_id,
> > which means that
> > > 	    we can't use the same rte_cryptodev_sym_session to hold private
> > sessions pointers
> > > 	    for both sync and async mode  for the same device.
> > >                    So the only option we have - make PMD devops->sym_session_configure()
> > > 	    always create a session that can work in both cpu and lksd modes.
> > > 	    For some implementations that would probably mean that under the
> > hood  PMD would create
> > > 	    2 different session structs (sync/async) and then use one or another
> > depending on from what API been called.
> > > 	    Seems doable, but ...:
> > >                    - will contradict with statement from 1:
> > > 	      " New fields will be optional and would be used by PMD only when
> > cpu-crypto session is requested."
> > >                       Now it becomes mandatory for all apps to specify cpu-crypto
> > related parameters too,
> > > 	       even if they don't plan to use that mode - i.e. behavior change,
> > existing app change.
> > >                      - might cause extra space overhead.
> > >
> > > [Akhil] It will not contradict with #1, you will only have few checks in the
> > > session init PMD which support this mode, find appropriate values and set the
> > > appropriate process() in it.
> > > User should be able to call, legacy enq-deq as well as the new process()
> > > without any issue.
> > > At runtime, the user will be able to change the datapath.
> > > So this is not a disadvantage, it would be additional flexibility for the user.
> >
> > Ok, but that's what I am saying - if PMD would *always* have to create a
> > session that can handle
> > both modes (sync/async), then user would *always* have to provide parameters
> > for both modes too.
> > Otherwise if let say user didn't setup sync specific parameters at all, what PMD
> > should do?
> >   - return with error?
> >   - init session that can be used with async path only?
> > My current assumption is #1.
> > If #2, then how user will be able to distinguish is that session valid for both
> > modes, or only for one?
> 
> I would say a 3rd option, do nothing if sync params are not set.
> Probably have a debug print in the PMD(which support sync mode) to specify that
> session is not configured properly for sync mode.

So, just print warning and proceed with init session that can be used with async path only?
Then it sounds the same as #2 above.
Which actually means that sync mode parameters for sym_session_init() become optional.
Then we need an API to provide the user with information on what modes
(sync+async/async only) are supported by that session for a given dev_id.
And user would have to query/retain this information at control-path,
and store it somewhere in user-space together with session pointer and dev_ids
to use later at data-path (same as we do now for session type).
That definitely requires changes in control-path to start using it.
Plus the fact that this value can differ for different dev_ids for the same session -
doesn't make things easier here. 

> Internally the PMD will not store the process() API in the session priv data
> And while calling the first packet, devops->process will give an assert that session
> Is not configured for sync mode. The session validation would be done in any case
> your suggestion or mine. So no extra overhead at runtime.

I believe that after session_init() user should get either an error or
a valid session handle that he can use at runtime.
Pushing session validation to runtime doesn't seem like a good idea.
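A minimal sketch of the two init behaviours discussed above (option #2 versus a fail-fast init), under a toy model where the only thing that matters is whether the cpu-crypto fields were filled in. All names (`init_lenient`, `init_strict`, `struct user_sa`, the `MODE_*` flags) are hypothetical and not part of the cryptodev API; the point is only to show where the extra per-(session, dev_id) state and the data-path branch end up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode flags, not a real cryptodev enum. */
#define MODE_ASYNC (1u << 0)
#define MODE_SYNC  (1u << 1)

/* Stand-in for the xform: only records whether cpu-crypto fields were set. */
struct xform_params {
	bool has_sync_params;
};

/* Option #2: init never fails for missing sync params, it only reports
 * which modes the resulting session supports on this device. */
static uint32_t
init_lenient(const struct xform_params *x, bool dev_supports_sync)
{
	uint32_t modes = MODE_ASYNC;

	if (dev_supports_sync && x->has_sync_params)
		modes |= MODE_SYNC;
	return modes;
}

/* Fail-fast alternative: if sync mode was requested, init either yields a
 * session usable in that mode or returns an error right away. */
static int
init_strict(const struct xform_params *x, bool want_sync, bool dev_supports_sync)
{
	if (want_sync && !(dev_supports_sync && x->has_sync_params))
		return -1;
	return 0;
}

/* What the application must keep per (session, dev_id) under option #2. */
struct user_sa {
	void *sess;
	uint8_t dev_id;
	uint32_t modes; /* result of init_lenient() for this dev_id */
};

/* ...and the branch it then needs on the data path. */
static int
use_sync_path(const struct user_sa *sa)
{
	return (sa->modes & MODE_SYNC) != 0;
}
```

With a strict init the application learns about a missing sync configuration when it creates the session; with the lenient variant it has to store `modes` next to every session/dev_id pair and test it on the data path.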

> 
> >
> >
> > >
> > >
> > > 	3) not possible to store device (not driver) specific data within the
> > > 	    session, but I think it is not really needed right now.
> > > 	    So probably minor compared to 2.b.2.
> > >
> > > [Akhil] So lets omit this for current discussion. And I hope we can find some
> > > way to deal with it.
> >
> > I don't think there is an easy way to fix that with existing API.
> >
> > >
> > >
> > > Actually #3 follows from #2, but decided to have them separated.
> > >
> > > 3. process() parameters/behavior
> > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and
> > >        just does:
> > >         session_ops->process(sess, ...);
> > > 	Advantages:
> > > 	1) fastest possible execution path
> > > 	2) no need to carry on dev_id for data-path
> > >
> > > [Akhil] I don't see any overhead of carrying dev id, at least it would be inline
> > > with the current DPDK methodology.
> >
> > If we'll add process() into rte_cryptodev itself (same as we have
> > enqueue_burst/dequeue_burst),
> > then it will be an ABI breakage.
> > Also there are discussions to get rid of that approach completely:
> > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > So I am not sure this is a recommended way these days.
> 
> We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> is good for you.
> 
> Whether it is ABI breakage or not, as per your requirements, this is the correct
> approach. Do you agree with this or not?

I think it is a possible approach, but not the best one:
it looks quite flaky to me (see all the uncertainty with sym_session_init above),
plus it introduces extra overhead at data-path.
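To make the data-path difference concrete, a minimal sketch of the two dispatch styles from item 3: a) calling through per-session ops, b) going through dev_id, dev_ops and sess_data[driver_id]. All structs and functions here are hypothetical stand-ins, not the real rte_cryptodev definitions, and the toy "PMD" just XORs buffers with a per-session key so the code can run.

```c
#include <assert.h>
#include <stdint.h>

/* ---- option a): user holds a session plus an ops pointer ---- */
typedef uint32_t (*cpu_process_t)(void *priv, uint8_t *buf[], uint32_t num);

struct cpu_sym_session_ops {
	cpu_process_t process;
};

struct cpu_sym_session {
	void *priv;                            /* opaque PMD-private data */
	const struct cpu_sym_session_ops *ops; /* picked once, at init */
};

/* one indirect call; no dev_id needed on the data path */
static inline uint32_t
process_a(const struct cpu_sym_session *s, uint8_t *buf[], uint32_t num)
{
	return s->ops->process(s->priv, buf, num);
}

/* ---- option b): dispatch through the device, then sess_data[] ---- */
#define NB_DRIVERS 4
#define NB_DEVS 4

struct sym_session {
	struct { void *data; } sess_data[NB_DRIVERS]; /* indexed by driver_id */
};

struct dev_ops {
	uint32_t (*cpu_process)(struct sym_session *s, uint8_t driver_id,
				uint8_t *buf[], uint32_t num);
};

struct device {
	uint8_t driver_id;
	const struct dev_ops *ops;
};

static struct device devices[NB_DEVS];

/* dev_id -> device -> dev_ops -> (inside PMD) sess_data[driver_id] -> work */
static inline uint32_t
process_b(uint8_t dev_id, struct sym_session *s, uint8_t *buf[], uint32_t num)
{
	const struct device *d = &devices[dev_id];

	return d->ops->cpu_process(s, d->driver_id, buf, num);
}

/* ---- toy "PMD": XOR each buffer with a per-session key ---- */
struct toy_priv {
	uint8_t key;
	uint32_t len;
};

static uint32_t
toy_process(void *priv, uint8_t *buf[], uint32_t num)
{
	const struct toy_priv *p = priv;

	for (uint32_t i = 0; i < num; i++)
		for (uint32_t j = 0; j < p->len; j++)
			buf[i][j] ^= p->key;
	return num;
}

static uint32_t
toy_dev_process(struct sym_session *s, uint8_t driver_id, uint8_t *buf[],
		uint32_t num)
{
	/* extra lookup to find the private session before doing any work */
	return toy_process(s->sess_data[driver_id].data, buf, num);
}

static const struct cpu_sym_session_ops toy_ops = { .process = toy_process };
static const struct dev_ops toy_dev_ops = { .cpu_process = toy_dev_process };
```

Option b) needs dev_id on the data path and adds a level of indirection (the device entry and the sess_data[] lookup) before any crypto work starts; whether that is measurable depends on the per-packet crypto cost.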

> 
> Now handling the API/ABI breakage is a separate story. In 19.11 release we
> Are not much concerned about the ABI breakages, this was discussed in
> community. So adding a new dev_ops wouldn't have been an issue.
> Now since we are so close to RC1 deadline, we should come up with some
> other solution for next release. May be having a pmd API in 20.02 and
> converting it into formal one in 20.11
> 
> 
> >
> > > What you are suggesting is a new way to get the things done without much
> > > benefit.
> >
> > Would help with ABI stability plus better performance, isn't it enough?
> >
> > > Also I don't see any performance difference as crypto workload is heavier than
> > > Code cycles, so that wont matter.
> >
> > It depends.
> > Suppose function call costs you ~30 cycles.
> > If you have burst of big packets (let say crypto for each will take ~2K cycles) that
> > belong
> > to the same session, then yes you wouldn't notice these extra 30 cycles at all.
> > If you have burst of small packets (let say crypto for each will take ~300 cycles)
> > each
> > belongs to different session, then it will cost you ~10% extra.
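The estimate above, written down as a tiny helper. The ~30-cycle call cost and the ~300 / ~2K cycle per-packet crypto costs are the rough figures quoted in this thread, not measurements, and `dispatch_overhead_pct` is a hypothetical name. When one call covers a burst of packets from the same session, its cost is spread over the whole burst.

```c
#include <assert.h>

/*
 * Extra cost of one dispatch call, as a percentage of the crypto work it
 * covers. call_cycles: cost of the extra indirection (rough figure ~30).
 * crypto_cycles: crypto cost per packet. pkts_per_call: how many packets
 * share one call (1 when every packet belongs to a different session).
 */
static double
dispatch_overhead_pct(double call_cycles, double crypto_cycles,
		      unsigned int pkts_per_call)
{
	assert(crypto_cycles > 0 && pkts_per_call > 0);
	return 100.0 * call_cycles / (crypto_cycles * pkts_per_call);
}
```

That gives 10% for small packets that each hit a different session, 1.5% for large packets if the call is paid per packet, and well under 0.1% for a same-session burst of 32 large packets.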
> 
> Let us do some profiling on openssl with both the approaches and find out the
> difference.
> 
> >
> > > So IMO, there is no advantage in your suggestion as well.
> > >
> > >
> > > 	Disadvantages:
> > > 	3) user has to carry on session_ops pointer explicitly
> > >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess,
> > >                                    /*data parameters*/) {...
> > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
> > >                       /*and then inside PMD specific process: */
> > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > >                      /* and then most likely either */
> > >                      pmd_private_session->process(pmd_private_session, ...);
> > >                      /* or jump based on session/input data */
> > > 	Advantages:
> > > 	1) don't see any...
> > > 	Disadvantages:
> > > 	2) User has to carry on dev_id inside data-path
> > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > 	    Possible slowdown compared to a) (not measured).
> > >
> > > Having said all this, if the disagreements cannot be resolved, you can go for a
> > > pmd API specific to your PMDs,
> >
> > I don't think it is good idea.
> > PMD specific API is sort of deprecated path, also there is no clean way to use it
> > within the libraries.
> 
> I know that this is a deprecated path, we can use it until we are not allowed
> to break ABI/API
> 
> >
> > > because as per my understanding the solution doesn't look scalable to other
> > > PMDs.
> > > Your approach is aligned only to Intel, will not benefit others like openssl
> > > which is used by all vendors.
> >
> > I feel quite opposite, from my perspective majority of SW backed PMDs will
> > benefit from it.
> > And I don't see anything Intel specific in my proposals above.
> > About openssl PMD: I am not an expert here, but looking at the code, I think it
> > will fit really well.
> > Look yourself at its internal functions:
> > process_openssl_auth_op/process_openssl_crypto_op,
> > I think they doing exactly the same - they use sync API underneath, and they are
> > session based
> > (AFAIK you don't need any device/queue data, everything that needed for
> > crypto/auth is stored inside session).
> >
> By vendor specific, I mean,
> - no PMD would like to have 2 different variants of session Init APIs for doing the same stuff.
> - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> Like to support 2 variants of session create- one for HW PMDs and one for SW PMDs.

I think what you are referring to has nothing to do with 'vendor specific'.
I would call it 'extra overhead for PMD and stack writers'.
Yes, for sure there is extra overhead (as always with a new API) -
for both producer (PMD writer) and consumer (stack writer):
new function(s) to support, probably more tests to create/run, etc.
Though this API is optional - if a PMD/stack maintainer doesn't see
value in it, they are free not to support it.
On the other hand, re-using rte_cryptodev_sym_session_init()
wouldn't help anyway - both data-path and control-path would differ
from async mode.
BTW, right now to support different HW flavors
we do have 4 different control and data-paths for both
ipsec-secgw and librte_ipsec:
lksd-none/lksd-proto/inline-crypto/inline-proto.
And that is considered to be ok.
Honestly, I don't understand why SW backed implementations
can't have their own path that would suit them best.
Konstantin 



 



^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-22 17:44                                                   ` Ananyev, Konstantin
@ 2019-10-22 22:21                                                     ` Ananyev, Konstantin
  2019-10-23 10:05                                                     ` Akhil Goyal
  1 sibling, 0 replies; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-22 22:21 UTC (permalink / raw)
  To: 'Akhil Goyal', 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph', 'Hemant Agrawal'

> > > > Added my comments inline with your draft.
> > > > [snip]..
> > > >
> > > > >
> > > > > Ok, then my suggestion:
> > > > > Let's at least write down all points about crypto-dev approach where we
> > > > > disagree and then probably try to resolve them one by one....
> > > > > If we fail to make an agreement/progress in next week or so,
> > > > > (and no more reviews from the community)
> > > > > will have bring that subject to TB meeting to decide.
> > > > > Sounds fair to you?
> > > > Agreed
> > > > >
> > > > > List is below.
> > > > > Please add/correct me, if I missed something.
> > > > >
> > > > > Konstantin
> > > >
> > > > Before going into comparison, we should define the requirement as well.
> > >
> > > Good point.
> > >
> > > > What I understood from the patchset,
> > > > "You need a synchronous API to perform crypto operations on raw data using
> > > > SW PMDs"
> > > > So,
> > > > - no crypto-ops,
> > > > - no separate enq-deq, only single process API for data path
> > > > - Do not need any value addition to the session parameters.
> > > >   (You would need some parameters from the crypto-op which
> > > >    Are constant per session and since you wont use crypto-op,
> > > >    You need some place to store that)
> > >
> > > Yes, this is correct, I think.
> > >
> > > >
> > > > Now as per your mail, the comparison
> > > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > > >
> > > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and
> > > > 'key' fields.
> > > > New fields will be optional and would be used by PMD only when cpu-crypto
> > > > session is requested.
> > > > For lksd-crypto session PMD is free to ignore these fields.
> > > > No ABI breakage is required.
> > > >
> > > > [Akhil] Agreed, no issues.
> > > >
> > > > 2. cpu-crypto create/init.
> > > >     a) Our suggestion - introduce new API for that:
> > > >         - rte_crypto_cpu_sym_init() that would init completely opaque
> > > >           rte_crypto_cpu_sym_session.
> > > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear);
> > > >           /*whatever else we'll need */};
> > > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > > >           that would return const struct rte_crypto_cpu_sym_session_ops *based
> > > >           on input xforms.
> > > > 	Advantages:
> > > > 	1)  totally opaque data structure (no ABI breakages in future), PMD
> > > > 	     writer is totally free with its format and contents.
> > > >
> > > > [Akhil] It will have breakage at some point till we don't hit the union size.
> > >
> > > Not sure, what union you are talking about?
> >
> > Union of xforms in rte_security_session_conf
> 
> Hmm, how does it relate here?
> I thought we discussing pure rte_cryptodev_sym_session, no?
> 
> >
> > >
> > > > Rather I don't suspect there will be more parameters added.
> > > > Or do we really care about the ABI breakage when the argument is about
> > > > the correct place to add a piece of code or do we really agree to add code
> > > > anywhere just to avoid that breakage.
> > >
> > > I am talking about maintaining it in future.
> > > if your struct is not seen externally, no chances to introduce ABI breakage.
> > >
> > > >
> > > > 	2) each session entity is self-contained, user doesn't need to bring along
> > > > 	    dev_id etc.
> > > > 	    dev_id is needed only at init stage, after that user will use session ops
> > > > 	    to perform all operations on that session (process(), clear(), etc.).
> > > >
> > > > [Akhil] There is nothing called as session ops in current DPDK.
> > >
> > > True, but it doesn't mean we can't/shouldn't have it.
> >
> > We can have it if it is not adding complexity for the user. Creating 2 different code
> > Paths for user is not desirable for the stack developers.
> >
> > >
> > > > What you are proposing
> > > > is a new concept which doesn't have any extra benefit, rather it is adding
> > > > complexity to have two different code paths for session create.
> > > >
> > > >
> > > > 	3) User can decide does he wants to store ops[] pointer on a per session
> > > > 	    basis, or on a per group of same sessions, or...
> > > >
> > > > [Akhil] Will the user really care which process API should be called from the
> > > > PMD.
> > > > Rather it should be driver's responsibility to store that in the session private
> > > > data which would be opaque to the user. As per my suggestion same process
> > > > function can be added to multiple sessions or a single session can be managed
> > > > inside the PMD.
> > >
> > > In that case we either need to have a function per session (stored internally),
> > > or make decision (branches) at run-time.
> > > But as I said in other mail - I am ok to add small shim structure here:
> > > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into
> > > one (init).
> >
> > Again that will be a separate API call from the user perspective which is not good.
> >
> > >
> > > >
> > > >
> > > > 	4) No mandatory mempools for private sessions. User can allocate
> > > > 	    memory for cpu-crypto session whenever he likes.
> > > >
> > > > [Akhil] you mean session private data?
> > >
> > > Yes.
> > >
> > > > You would need that memory anyways, user will be
> > > > allocating that already.  You do not need to manage that.
> > >
> > > What I am saying - right now user has no choice but to allocate it via mempool.
> > > Which is probably not the best options for all cases.
> > >
> > > >
> > > > 	Disadvantages:
> > > > 	5) Extra changes in control path
> > > > 	6) User has to store session_ops pointer explicitly.
> > > >
> > > > [Akhil] More disadvantages:
> > > > - All supporting PMDs will need to maintain TWO types of session for the
> > > > same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD
> > > > owner will need to add code in both the session create APIs. Hence more
> > > > maintenance and error prone.
> > >
> > > I think majority of code for both paths will be common, plus even we'll reuse
> > > current sym_session_init() -
> > > changes in PMD session_init() code will be unavoidable.
> > > But yes, it will be new entry in devops, that PMD will have to support.
> > > Ok to add it as 7) to the list.
> > >
> > > > - Stacks which will be using these new APIs also need to maintain two
> > > > code path for the same processing while doing session initialization
> > > > for sync and async
> > >
> > > That's the same as #5 above, I think.
> > >
> > > >
> > > >
> > > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and
> > > >       existing rte_cryptodev_sym_session structure.
> > > > 	Advantages:
> > > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > > 	    Probably less changes in control path.
> > > > 	Disadvantages:
> > > > 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> > > > 	    we can't use the same rte_cryptodev_sym_session to hold private sessions pointers
> > > > 	    for both sync and async mode for the same device.
> > > > 	    So the only option we have - make PMD devops->sym_session_configure()
> > > > 	    always create a session that can work in both cpu and lksd modes.
> > > > 	    For some implementations that would probably mean that under the hood PMD would create
> > > > 	    2 different session structs (sync/async) and then use one or another
> > > > 	    depending on from what API been called.
> > > > 	    Seems doable, but ...:
> > > > 	    - will contradict with statement from 1:
> > > > 	      " New fields will be optional and would be used by PMD only when
> > > > 	      cpu-crypto session is requested."
> > > > 	      Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> > > > 	      even if they don't plan to use that mode - i.e. behavior change, existing app change.
> > > > 	    - might cause extra space overhead.
> > > >
> > > > [Akhil] It will not contradict with #1, you will only have few checks in the
> > > > session init PMD which support this mode, find appropriate values and set the
> > > > appropriate process() in it.
> > > > User should be able to call, legacy enq-deq as well as the new process()
> > > > without any issue.
> > > > At runtime, the user will be able to change the datapath.
> > > > So this is not a disadvantage, it would be additional flexibility for the user.
> > >
> > > Ok, but that's what I am saying - if PMD would *always* have to create a
> > > session that can handle
> > > both modes (sync/async), then user would *always* have to provide parameters
> > > for both modes too.
> > > Otherwise if let say user didn't setup sync specific parameters at all, what PMD
> > > should do?
> > >   - return with error?
> > >   - init session that can be used with async path only?
> > > My current assumption is #1.
> > > If #2, then how user will be able to distinguish is that session valid for both
> > > modes, or only for one?
> >
> > I would say a 3rd option, do nothing if sync params are not set.
> > Probably have a debug print in the PMD(which support sync mode) to specify that
> > session is not configured properly for sync mode.
> 
> So, just print warning and proceed with init session that can be used with async path only?
> Then it sounds the same as #2 above.
> Which actually means that sync mode parameters for sym_session_init() become optional.
> Then we need an API to provide the user with information on what modes
> (sync+async/async only) are supported by that session for a given dev_id.
> And user would have to query/retain this information at control-path,
> and store it somewhere in user-space together with session pointer and dev_ids
> to use later at data-path (same as we do now for session type).
> That definitely requires changes in control-path to start using it.
> Plus the fact that this value can differ for different dev_ids for the same session -
> doesn't make things easier here.
> 
> > Internally the PMD will not store the process() API in the session priv data
> > And while calling the first packet, devops->process will give an assert that session
> > Is not configured for sync mode. The session validation would be done in any case
> > your suggestion or mine. So no extra overhead at runtime.
> 
> I believe that after session_init() user should get either an error or
> a valid session handle that he can use at runtime.
> Pushing session validation to runtime doesn't seem like a good idea.
> 
> >
> > >
> > >
> > > >
> > > >
> > > > 	3) not possible to store device (not driver) specific data within the
> > > > 	    session, but I think it is not really needed right now.
> > > > 	    So probably minor compared to 2.b.2.
> > > >
> > > > [Akhil] So lets omit this for current discussion. And I hope we can find some
> > > > way to deal with it.
> > >
> > > I don't think there is an easy way to fix that with existing API.
> > >
> > > >
> > > >
> > > > Actually #3 follows from #2, but decided to have them separated.
> > > >
> > > > 3. process() parameters/behavior
> > > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and
> > > >        just does:
> > > >         session_ops->process(sess, ...);
> > > > 	Advantages:
> > > > 	1) fastest possible execution path
> > > > 	2) no need to carry on dev_id for data-path
> > > >
> > > > [Akhil] I don't see any overhead of carrying dev id, at least it would be inline
> > > > with the current DPDK methodology.
> > >
> > > If we'll add process() into rte_cryptodev itself (same as we have
> > > enqueue_burst/dequeue_burst),
> > > then it will be an ABI breakage.
> > > Also there are discussions to get rid of that approach completely:
> > > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > > So I am not sure this is a recommended way these days.
> >
> > We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> > is good for you.
> >
> > Whether it is ABI breakage or not, as per your requirements, this is the correct
> > approach. Do you agree with this or not?
> 
> I think it is a possible approach, but not the best one:
> it looks quite flaky to me (see all the uncertainty with sym_session_init above),
> plus it introduces extra overhead at data-path.
> 
> >
> > Now handling the API/ABI breakage is a separate story. In 19.11 release we
> > Are not much concerned about the ABI breakages, this was discussed in
> > community. So adding a new dev_ops wouldn't have been an issue.
> > Now since we are so close to RC1 deadline, we should come up with some
> > other solution for next release. May be having a pmd API in 20.02 and
> > converting it into formal one in 20.11
> >
> >
> > >
> > > > What you are suggesting is a new way to get the things done without much
> > > > benefit.
> > >
> > > Would help with ABI stability plus better performance, isn't it enough?
> > >
> > > > Also I don't see any performance difference as crypto workload is heavier than
> > > > Code cycles, so that wont matter.
> > >
> > > It depends.
> > > Suppose function call costs you ~30 cycles.
> > > If you have burst of big packets (let say crypto for each will take ~2K cycles) that
> > > belong
> > > to the same session, then yes you wouldn't notice these extra 30 cycles at all.
> > > If you have burst of small packets (let say crypto for each will take ~300 cycles)
> > > each
> > > belongs to different session, then it will cost you ~10% extra.
> >
> > Let us do some profiling on openssl with both the approaches and find out the
> > difference.
> >
> > >
> > > > So IMO, there is no advantage in your suggestion as well.
> > > >
> > > >
> > > > 	Disadvantages:
> > > > 	3) user has to carry on session_ops pointer explicitly
> > > >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> > > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess,
> > > >                                    /*data parameters*/) {...
> > > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(sess, ...);
> > > >                       /*and then inside PMD specific process: */
> > > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > > >                      /* and then most likely either */
> > > >                      pmd_private_session->process(pmd_private_session, ...);
> > > >                      /* or jump based on session/input data */
> > > > 	Advantages:
> > > > 	1) don't see any...
> > > > 	Disadvantages:
> > > > 	2) User has to carry on dev_id inside data-path
> > > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > > 	    Possible slowdown compared to a) (not measured).
> > > >
> > > > Having said all this, if the disagreements cannot be resolved, you can go for a
> > > > pmd API specific to your PMDs,
> > >
> > > I don't think it is good idea.
> > > PMD specific API is sort of deprecated path, also there is no clean way to use it
> > > within the libraries.
> >
> > I know that this is a deprecated path, we can use it until we are not allowed
> > to break ABI/API
> >
> > >
> > > > because as per my understanding the solution doesn't look scalable to other
> > > > PMDs.
> > > > Your approach is aligned only to Intel, will not benefit others like openssl
> > > > which is used by all vendors.
> > >
> > > I feel quite opposite, from my perspective majority of SW backed PMDs will
> > > benefit from it.
> > > And I don't see anything Intel specific in my proposals above.
> > > About openssl PMD: I am not an expert here, but looking at the code, I think it
> > > will fit really well.
> > > Look yourself at its internal functions:
> > > process_openssl_auth_op/process_openssl_crypto_op,
> > > I think they doing exactly the same - they use sync API underneath, and they are
> > > session based
> > > (AFAIK you don't need any device/queue data, everything that needed for
> > > crypto/auth is stored inside session).

Looked at drivers/crypto/armv8 - same story here, I believe.
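A minimal sketch of what "session based" means here, assuming a toy SW PMD (hypothetical names, XOR in place of a real cipher): the per-buffer work needs only the session, so a synchronous process() can call it directly, while the async API has to wrap the same work in a software ring just to emulate enqueue/dequeue. Where the work actually happens (enqueue or dequeue) varies between real PMDs; this toy does it on enqueue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy session: everything the "cipher" needs lives here. */
struct sw_session {
	uint8_t key;
};

/* Toy async op, standing in for a crypto-op. */
struct sw_op {
	struct sw_session *sess;
	uint8_t *data;
	size_t len;
	int status;
};

/* Synchronous, session-based work: no device or queue state involved. */
static void
sw_process(const struct sw_session *s, uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		data[i] ^= s->key;
}

/* Async emulation: a per-queue-pair software ring (size is a power of 2,
 * so unsigned wrap-around of head/tail stays consistent with the modulo). */
#define SW_RING_SZ 8u

struct sw_qp {
	struct sw_op *ring[SW_RING_SZ];
	unsigned int head; /* next slot to dequeue */
	unsigned int tail; /* next slot to enqueue */
};

static unsigned int
sw_enqueue(struct sw_qp *qp, struct sw_op **ops, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n && qp->tail - qp->head < SW_RING_SZ; i++) {
		/* the same call a sync API would make directly */
		sw_process(ops[i]->sess, ops[i]->data, ops[i]->len);
		ops[i]->status = 0;
		qp->ring[qp->tail++ % SW_RING_SZ] = ops[i];
	}
	return i;
}

static unsigned int
sw_dequeue(struct sw_qp *qp, struct sw_op **ops, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n && qp->head != qp->tail; i++)
		ops[i] = qp->ring[qp->head++ % SW_RING_SZ];
	return i;
}
```

The ring, the op structure and the status field exist only to fit the async model; a sync process() would be just sw_process().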

> > >
> > By vendor specific, I mean,
> > - no PMD would like to have 2 different variants of session Init APIs for doing the same stuff.
> > - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> > Like to support 2 variants of session create- one for HW PMDs and one for SW PMDs.
> 
> I think what you are referring to has nothing to do with 'vendor specific'.
> I would call it 'extra overhead for PMD and stack writers'.
> Yes, for sure there is extra overhead (as always with a new API) -
> for both producer (PMD writer) and consumer (stack writer):
> new function(s) to support, probably more tests to create/run, etc.
> Though this API is optional - if a PMD/stack maintainer doesn't see
> value in it, they are free not to support it.
> On the other hand, re-using rte_cryptodev_sym_session_init()
> wouldn't help anyway - both data-path and control-path would differ
> from async mode.
> BTW, right now to support different HW flavors
> we do have 4 different control and data-paths for both
> ipsec-secgw and librte_ipsec:
> lksd-none/lksd-proto/inline-crypto/inline-proto.
> And that is considered to be ok.
> Honestly, I don't understand why SW backed implementations
> can't have their own path that would suit them best.
> Konstantin
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-22 17:44                                                   ` Ananyev, Konstantin
  2019-10-22 22:21                                                     ` Ananyev, Konstantin
@ 2019-10-23 10:05                                                     ` Akhil Goyal
  2019-10-30 14:23                                                       ` Ananyev, Konstantin
  1 sibling, 1 reply; 87+ messages in thread
From: Akhil Goyal @ 2019-10-23 10:05 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph', Hemant Agrawal


Hi Konstantin,
> 
> Hi Akhil,
> 
> 
> > > > Added my comments inline with your draft.
> > > > [snip]..
> > > >
> > > > >
> > > > > Ok, then my suggestion:
> > > > > Let's at least write down all points about crypto-dev approach where we
> > > > > disagree and then probably try to resolve them one by one....
> > > > > If we fail to make an agreement/progress in next week or so,
> > > > > (and no more reviews from the community)
> > > > > will have bring that subject to TB meeting to decide.
> > > > > Sounds fair to you?
> > > > Agreed
> > > > >
> > > > > List is below.
> > > > > Please add/correct me, if I missed something.
> > > > >
> > > > > Konstantin
> > > >
> > > > Before going into comparison, we should define the requirement as well.
> > >
> > > Good point.
> > >
> > > > What I understood from the patchset,
> > > > "You need a synchronous API to perform crypto operations on raw data using
> > > > SW PMDs"
> > > > So,
> > > > - no crypto-ops,
> > > > - no separate enq-deq, only single process API for data path
> > > > - Do not need any value addition to the session parameters.
> > > >   (You would need some parameters from the crypto-op which
> > > >    Are constant per session and since you wont use crypto-op,
> > > >    You need some place to store that)
> > >
> > > Yes, this is correct, I think.
> > >
> > > >
> > > > Now as per your mail, the comparison
> > > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > > >
> > > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and
> > > > 'key' fields.
> > > > New fields will be optional and would be used by PMD only when cpu-crypto
> > > > session is requested.
> > > > For lksd-crypto session PMD is free to ignore these fields.
> > > > No ABI breakage is required.
> > > >
> > > > [Akhil] Agreed, no issues.
> > > >
> > > > 2. cpu-crypto create/init.
> > > >     a) Our suggestion - introduce new API for that:
> > > >         - rte_crypto_cpu_sym_init() that would init completely opaque
> > > >           rte_crypto_cpu_sym_session.
> > > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear);
> > > >           /*whatever else we'll need */};
> > > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > > >           that would return const struct rte_crypto_cpu_sym_session_ops *based
> > > >           on input xforms.
> > > > 	Advantages:
> > > > 	1)  totally opaque data structure (no ABI breakages in future), PMD
> > > > 	     writer is totally free with its format and contents.
> > > >
> > > > [Akhil] It will have breakage at some point till we don't hit the union size.
> > >
> > > Not sure, what union you are talking about?
> >
> > Union of xforms in rte_security_session_conf
> 
> Hmm, how does it relate here?
> I thought we discussing pure rte_cryptodev_sym_session, no?
> 
> >
> > >
> > > > Rather I don't suspect there will be more parameters added.
> > > > Or do we really care about the ABI breakage when the argument is about
> > > > the correct place to add a piece of code or do we really agree to add code
> > > > anywhere just to avoid that breakage.
> > >
> > > I am talking about maintaining it in future.
> > > if your struct is not seen externally, no chances to introduce ABI breakage.
> > >
> > > >
> > > > 	2) each session entity is self-contained, user doesn't need to bring along
> > > > 	    dev_id etc.
> > > > 	    dev_id is needed only at init stage, after that user will use session ops
> > > > 	    to perform all operations on that session (process(), clear(), etc.).
> > > >
> > > > [Akhil] There is no such thing as session ops in current DPDK.
> > >
> > > True, but it doesn't mean we can't/shouldn't have it.
> >
> > We can have it if it does not add complexity for the user. Creating 2 different code
> > paths is not desirable for stack developers.
> >
> > >
> > > > What you are proposing is a new concept which doesn't have any extra benefit; rather, it adds
> > > > complexity by having two different code paths for session create.
> > > >
> > > >
> > > > 	3) User can decide whether to store the ops[] pointer on a per-session basis,
> > > > 	    or per group of identical sessions, or...
> > > >
> > > > [Akhil] Will the user really care which process API should be called from the PMD?
> > > > Rather, it should be the driver's responsibility to store that in the session private data,
> > > > which would be opaque to the user. As per my suggestion, the same process function can
> > > > be added to multiple sessions, or a single session can be managed inside the PMD.
> > >
> > > In that case we either need to have a function per session (stored internally),
> > > or make decisions (branches) at run-time.
> > > But as I said in another mail - I am ok to add a small shim structure here:
> > > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).
> >
> > Again that will be a separate API call from the user perspective which is not good.
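A minimal sketch of the shim discussed in option 2.a, assuming the `{ void *ses; ... *ops; }` layout quoted above. The names `cpu_sym_session`, `cpu_sym_session_ops` and `toy_cpu_sym_init` are hypothetical stand-ins for the proposed `rte_crypto_cpu_sym_*` names (none of which exist in DPDK), and a one-byte XOR replaces real crypto. It illustrates the two properties claimed for this design: private memory can come from any allocator (no mandatory mempool), and the data path needs no dev_id.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ops table modeled on the proposed
 * rte_crypto_cpu_sym_session_ops; not an existing DPDK type. */
struct cpu_sym_session_ops {
	/* Process n buffers in place; returns the number processed. */
	uint32_t (*process)(void *ses, uint8_t *buf[], const size_t len[],
			uint32_t n);
	/* Release the PMD-private state. */
	void (*clear)(void *ses);
};

/* Shim from the thread: opaque PMD state plus ops chosen at init. */
struct cpu_sym_session {
	void *ses;
	const struct cpu_sym_session_ops *ops;
};

/* Toy PMD-private state: a one-byte XOR key stands in for real crypto. */
struct toy_priv {
	uint8_t key;
};

static uint32_t toy_process(void *ses, uint8_t *buf[], const size_t len[],
		uint32_t n)
{
	const struct toy_priv *p = ses;

	for (uint32_t i = 0; i < n; i++)
		for (size_t j = 0; j < len[i]; j++)
			buf[i][j] ^= p->key;
	return n;
}

static void toy_clear(void *ses)
{
	free(ses);
}

static const struct cpu_sym_session_ops toy_ops = {
	.process = toy_process,
	.clear = toy_clear,
};

/* Merged init(): a real version would also take dev_id and the xforms.
 * Memory comes from malloc() here, i.e. no mandatory mempool. */
static int toy_cpu_sym_init(struct cpu_sym_session *s, uint8_t key)
{
	struct toy_priv *p = malloc(sizeof(*p));

	if (p == NULL)
		return -1;
	p->key = key;
	s->ses = p;
	s->ops = &toy_ops;
	return 0;
}
```

At run time the caller only does `s.ops->process(s.ses, bufs, lens, n)`; whether `ops` is kept per session or shared by a group of sessions is left to the caller, as in point 3) above.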
> >
> > >
> > > >
> > > >
> > > > 	4) No mandatory mempools for private sessions. User can allocate memory for cpu-crypto
> > > > 	    session whenever he likes.
> > > >
> > > > [Akhil] you mean session private data?
> > >
> > > Yes.
> > >
> > > > You would need that memory anyways, user will be
> > > > allocating that already.  You do not need to manage that.
> > >
> > > What I am saying is that right now the user has no choice but to allocate it via mempool,
> > > which is probably not the best option for all cases.
> > >
> > > >
> > > > 	Disadvantages:
> > > > 	5) Extra changes in control path
> > > > 	6) User has to store session_ops pointer explicitly.
> > > >
> > > > [Akhil] More disadvantages:
> > > > - All supporting PMDs will need to maintain TWO types of session for the
> > > > same crypto processing. Suppose a fix or a new feature (or algo) is added: the PMD owner
> > > > will need to add code in both session create APIs. Hence more maintenance, and more
> > > > error prone.
> > >
> > > I think the majority of code for both paths will be common; plus, even if we reuse the
> > > current sym_session_init(), changes in PMD session_init() code will be unavoidable.
> > > But yes, it will be a new entry in devops that the PMD will have to support.
> > > Ok to add it as 7) to the list.
> > >
> > > > - Stacks which will be using these new APIs also need to maintain two
> > > > code paths for the same processing while doing session initialization
> > > > for sync and async
> > >
> > > That's the same as #5 above, I think.
> > >
> > > >
> > > >
> > > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
> > > >       structure.
> > > > 	Advantages:
> > > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > > 	    Probably less changes in control path.
> > > > 	Disadvantages:
> > > > 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> > > > 	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
> > > > 	    for both sync and async mode for the same device.
> > > >                    So the only option we have - make PMD devops->sym_session_configure()
> > > > 	    always create a session that can work in both cpu and lksd modes.
> > > > 	    For some implementations that would probably mean that under the hood PMD would create
> > > > 	    2 different session structs (sync/async) and then use one or another depending on which API has been called.
> > > > 	    Seems doable, but ...:
> > > >                    - will contradict the statement from 1:
> > > > 	      " New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
> > > >                       Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> > > > 	       even if they don't plan to use that mode - i.e. behavior change, existing app change.
> > > >                      - might cause extra space overhead.
> > > >
> > > > [Akhil] It will not contradict #1; you will only have a few checks in the session init of PMDs
> > > > which support this mode, to find the appropriate values and set the appropriate process() in it.
> > > > The user should be able to call legacy enq-deq as well as the new process() without any issue.
> > > > At runtime the user will be able to change the datapath.
> > > > So this is not a disadvantage; it would be additional flexibility for the user.
> > >
> > > Ok, but that's what I am saying - if PMD would *always* have to create a session that can handle
> > > both modes (sync/async), then user would *always* have to provide parameters for both modes too.
> > > Otherwise, if, let's say, the user didn't set up sync specific parameters at all, what should the PMD do?
> > >   - return with error?
> > >   - init session that can be used with async path only?
> > > My current assumption is #1.
> > > If #2, then how will the user be able to distinguish whether that session is valid for both
> > > modes, or only for one?
> >
> > I would say a 3rd option: do nothing if sync params are not set.
> > Probably have a debug print in the PMD (which supports sync mode) to specify that the
> > session is not configured properly for sync mode.
> 
> So, just print a warning and proceed with init of a session that can be used with the async path only?
> Then it sounds the same as #2 above.
> Which actually means that sync mode parameters for sym_session_init() become optional.
> Then we need an API to provide the user information on what modes
> (sync+async/async only) are supported by that session for a given dev_id.
> And the user would have to query/retain this information at control-path,
> and store it somewhere in user-space together with session pointer and dev_ids
> to use later at data-path (same as we do now for session type).
> That definitely requires changes in control-path to start using it.
> Plus the fact that this value can differ for different dev_ids for the same session
> doesn't make things easier here.

An API won't be required to specify that. A feature flag will be sufficient; it is not a big change
from the application perspective.

Here is some pseudo code just to elaborate my understanding. This will need some

From the application:
if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC) {
	/* set additional params in crypto xform */
}

Now in the driver:
pmd_sym_session_configure(dev, xform, sess, mempool) {
	...
	if ((dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC)
		&& xform->/*sync params are set*/) {
		/* Assign process function pointer in sess->priv_data */
	} /* It may return error if FF_SYNC is set and params are not correct.
	        It would be up to the driver whether it supports both SYNC and ASYNC. */
}

Now the new sync API:

pmd_process(...) {
	if ((dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC)
			 && sess_priv->process != NULL)
		sess_priv->process(...);
	else
		ASSERT("sync mode not configured properly or not supported");
}

In the data path, there is no extra processing happening.
Even in the case of your suggestion, you should have these types of error checks;
you cannot blindly trust the application that the pointers are correct.
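The pseudocode above can be modeled as compilable C under a few stated assumptions: `FF_SYNC` is a hypothetical stand-in for `RTE_CRYPTODEV_FF_SYNC` (with an arbitrary bit value), the xform carries an optional `sync_params_set` marker, XOR replaces real crypto, and `pmd_process()` returns `-ENOTSUP` instead of asserting. Because `process` is only installed when the flag was present at configure time, the data-path check reduces to a NULL test.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_CRYPTODEV_FF_SYNC; bit position arbitrary. */
#define FF_SYNC (UINT64_C(1) << 0)

struct dev_info {
	uint64_t feature_flags;
};

/* Hypothetical xform: the sync-only parameters are optional. */
struct sym_xform {
	bool sync_params_set;
	uint8_t key;
};

struct sess_priv;
typedef int (*sync_process_fn)(const struct sess_priv *sp, uint8_t *data,
		size_t len);

struct sess_priv {
	uint8_t key;
	sync_process_fn process; /* NULL: session is usable for async only */
};

/* Toy "crypto": XOR with the session key. */
static int xor_process(const struct sess_priv *sp, uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		data[i] ^= sp->key;
	return 0;
}

/* Control path: install process() only when the device advertises the
 * flag AND the xform carries the sync parameters. */
static int pmd_sym_session_configure(const struct dev_info *di,
		const struct sym_xform *xf, struct sess_priv *sp)
{
	sp->key = xf->key;
	sp->process = NULL;
	if ((di->feature_flags & FF_SYNC) && xf->sync_params_set)
		sp->process = xor_process;
	return 0; /* chosen policy: silently fall back to async-only */
}

/* Data path: the NULL check replaces the feature-flag test, since
 * process is only ever set when FF_SYNC was present at configure time. */
static int pmd_process(const struct sess_priv *sp, uint8_t *data, size_t len)
{
	if (sp->process == NULL)
		return -ENOTSUP;
	return sp->process(sp, data, len);
}
```

With this policy both configure calls succeed, and only `pmd_process()` reveals that a session configured without sync parameters is async-only. That is the behavior debated in this thread; the alternative is to return an error from configure.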

> 
> > Internally the PMD will not store the process() API in the session priv data,
> > and while processing the first packet, devops->process will give an assert that the session
> > is not configured for sync mode. The session validation would be done in any case,
> > your suggestion or mine. So no extra overhead at runtime.
> 
> I believe that after session_init() user should get either an error or
> valid  session handler that he can use at runtime.
> Pushing session validation to runtime doesn't seem like a good idea.
> 
It may get a warning from the PMD that FF_SYNC is set but params are not
correct/available. See above.

> >
> > >
> > >
> > > >
> > > >
> > > > 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> > > > 	    So probably minor compared to 2.b.2.
> > > >
> > > > [Akhil] So let's omit this for the current discussion. And I hope we can find some way to deal with it.
> > >
> > > I don't think there is an easy way to fix that with existing API.
> > >
> > > >
> > > >
> > > > Actually #3 follows from #2, but decided to have them separated.
> > > >
> > > > 3. process() parameters/behavior
> > > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
> > > >         session_ops->process(sess, ...);
> > > > 	Advantages:
> > > > 	1) fastest possible execution path
> > > > 	2) no need to carry on dev_id for data-path
> > > >
> > > > [Akhil] I don't see any overhead of carrying dev id; at least it would be in line with the
> > > > current DPDK methodology.
> > >
> > > If we add process() into rte_cryptodev itself (same as we have enqueue_burst/dequeue_burst),
> > > then it will be an ABI breakage.
> > > Also there are discussions to get rid of that approach completely:
> > >
> > > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > > So I am not sure this is a recommended way these days.
> >
> > We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> > is good for you.
> >
> > Whether it is ABI breakage or not, as per your requirements, this is the correct
> > approach. Do you agree with this or not?
> 
> I think it is a possible approach, but not the best one:
> it looks quite flaky to me (see all the uncertainty with sym_session_init above),
> plus it introduces extra overhead at data-path.

Uncertainties can be handled appropriately using a feature flag,
and as per my understanding there is no extra overhead in the data path.

> 
> >
> > Now handling the API/ABI breakage is a separate story. In the 19.11 release we
> > are not much concerned about ABI breakages; this was discussed in the
> > community. So adding a new dev_ops wouldn't have been an issue.
> > Now since we are so close to the RC1 deadline, we should come up with some
> > other solution for the next release. Maybe having a PMD API in 20.02 and
> > converting it into a formal one in 20.11.
> >
> >
> > >
> > > > What you are suggesting is a new way to get the things done without much benefit.
> > >
> > > Would help with ABI stability plus better performance, isn't it enough?
> > >
> > > > Also I don't see any performance difference, as the crypto workload is heavier than the
> > > > code cycles, so that won't matter.
> > >
> > > It depends.
> > > Suppose a function call costs you ~30 cycles.
> > > If you have a burst of big packets (let's say crypto for each will take ~2K cycles) that belong
> > > to the same session, then yes, you wouldn't notice these extra 30 cycles at all.
> > > If you have a burst of small packets (let's say crypto for each will take ~300 cycles), each
> > > belonging to a different session, then it will cost you ~10% extra.
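The ~10% figure is simply the ratio of the assumed extra call cost to the assumed per-packet crypto cost: 30/300 = 10%, versus 30/2000 = 1.5% for the large-packet case. A trivial helper makes the arithmetic explicit; the inputs are the rough estimates quoted above, not measurements.

```c
/* Per-packet overhead, in percent, of a fixed extra cost per call,
 * relative to the crypto work done for that packet. */
static double overhead_pct(double extra_cycles, double crypto_cycles)
{
	return 100.0 * extra_cycles / crypto_cycles;
}
```

For example, `overhead_pct(30.0, 300.0)` gives 10 and `overhead_pct(30.0, 2000.0)` gives 1.5.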
> >
> > Let us do some profiling on openssl with both the approaches and find out the difference.
> >
> > >
> > > > So IMO, there is no advantage in your suggestion as well.
> > > >
> > > >
> > > > 	Disadvantages:
> > > > 	3) user has to carry on session_ops pointer explicitly
> > > >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> > > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess, /*data parameters*/) {...
> > > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(ses, ...);
> > > >                       /*and then inside PMD specific process: */
> > > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > > >                      /* and then most likely either */
> > > >                      pmd_private_session->process(pmd_private_session, ...);
> > > >                      /* or jump based on session/input data */
> > > > 	Advantages:
> > > > 	1) don't see any...
> > > > 	Disadvantages:
> > > > 	2) User has to carry on dev_id inside data-path
> > > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > > 	    Possible slowdown compared to a) (not measured).
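To make the 3.a vs 3.b indirection argument concrete, here is a simplified, hypothetical model of both call chains. `devs[]` stands in for `rte_cryptodevs[]`, and the session's `sess_data[]` is reduced to a plain array of pointers indexed by driver_id; neither is the real DPDK definition. In 3.b each call loads the device entry, its dev_ops table and the session's per-driver slot before the final indirect jump; in 3.a the caller already holds the function pointer and the private session. The model says nothing about actual cycle counts, which, as noted, were not measured.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DRIVERS 4 /* hypothetical number of registered drivers */

struct pmd_priv;
typedef int (*process_fn)(struct pmd_priv *p, uint8_t *data, size_t len);

/* PMD-private session that keeps its own process() pointer. */
struct pmd_priv {
	process_fn process;
	uint8_t key;
};

/* Simplified generic session: private data indexed by driver_id. */
struct sym_session {
	void *sess_data[MAX_DRIVERS];
};

struct cryptodev;
struct dev_ops {
	int (*cpu_process)(const struct cryptodev *dev, struct sym_session *s,
			uint8_t *data, size_t len);
};

struct cryptodev {
	const struct dev_ops *dev_ops;
	uint8_t driver_id;
};

/* Toy "crypto": XOR with the session key. */
static int xor_process(struct pmd_priv *p, uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		data[i] ^= p->key;
	return 0;
}

/* Inside the PMD: find this driver's private session, then jump again. */
static int pmd_cpu_process(const struct cryptodev *dev, struct sym_session *s,
		uint8_t *data, size_t len)
{
	struct pmd_priv *p = s->sess_data[dev->driver_id];

	return p->process(p, data, len);
}

static const struct dev_ops pmd_dev_ops = { .cpu_process = pmd_cpu_process };

/* Hypothetical device table standing in for rte_cryptodevs[]. */
static const struct cryptodev devs[] = { { &pmd_dev_ops, 2 } };

/* 3.b: dev_id -> device -> dev_ops -> sess_data[driver_id] -> process(). */
static int process_by_dev(uint8_t dev_id, struct sym_session *s,
		uint8_t *data, size_t len)
{
	const struct cryptodev *dev = &devs[dev_id];

	return dev->dev_ops->cpu_process(dev, s, data, len);
}

/* 3.a: the caller already holds process() and the private session. */
static int process_direct(process_fn process, struct pmd_priv *p,
		uint8_t *data, size_t len)
{
	return process(p, data, len);
}
```

Both paths end in the same `xor_process()`; the difference is only the chain of dependent loads in front of it.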
> > > >
> > > > Having said all this, if the disagreements cannot be resolved, you can go for a PMD API specific
> > > > to your PMDs,
> > >
> > > I don't think it is a good idea.
> > > A PMD-specific API is sort of a deprecated path; also there is no clean way to use it
> > > within the libraries.
> >
> > I know that this is a deprecated path; we can use it as long as we are not allowed
> > to break ABI/API.
> >
> > >
> > > > because as per my understanding the solution doesn't look scalable to other PMDs.
> > > > Your approach is aligned only to Intel, and will not benefit others like openssl, which is used by all
> > > > vendors.
> > >
> > > I feel quite the opposite: from my perspective the majority of SW backed PMDs will benefit from it.
> > > And I don't see anything Intel specific in my proposals above.
> > > About openssl PMD: I am not an expert here, but looking at the code, I think it will fit really well.
> > > Look yourself at its internal functions:
> > > process_openssl_auth_op/process_openssl_crypto_op,
> > > I think they are doing exactly the same - they use sync API underneath, and they are session based
> > > (AFAIK you don't need any device/queue data, everything that is needed for
> > > crypto/auth is stored inside session).
> > >
> > By vendor specific, I mean:
> > - no PMD would like to have 2 different variants of session init APIs for doing the same stuff.
> > - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> > like to support 2 variants of session create - one for HW PMDs and one for SW PMDs.
> 
> I think what you refer to has nothing to do with 'vendor specific'.
> I would name it 'extra overhead for PMD and stack writers'.
> Yes, for sure there is extra overhead (as always with a new API) -
> for both producer (PMD writer) and consumer (stack writer):
> new function(s) to support, probably more tests to create/run, etc.
> Though this API is optional - if a PMD/stack maintainer doesn't see
> value in it, they are free not to support it.
> On the other side, re-using rte_cryptodev_sym_session_init()
> wouldn't help anyway - both data-path and control-path would differ
> from async mode anyway.
> BTW, right now to support different HW flavors
> we do have 4 different control and data-paths for both
> ipsec-secgw and librte_ipsec:
> lksd-none/lksd-proto/inline-crypto/inline-proto.
> And that is considered to be ok.

No, that is not ok. We cannot add new paths for every other case.
Those 4 are controlled using 2 sets of APIs. We should try our best to
have minimum overhead for the application writer. This pain was also discussed
at one of the DPDK conferences.
DPDK is not a standalone entity; there are always stacks running over it.
We should not add an API for every other use case when we have an alternative
approach with the existing API set.

Now introducing another one would add to that pain and a lot of work for
both producer and consumer.
It would be interesting to see how much performance difference there will be between the
two approaches. As per my understanding it won't be much compared to the
extra work that you will be inducing.

-Akhil

> Honestly, I don't understand why SW backed implementations
> can't have their own path that would suit them best.
> Konstantin

* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-23 10:05                                                     ` Akhil Goyal
@ 2019-10-30 14:23                                                       ` Ananyev, Konstantin
  2019-11-01 13:53                                                         ` Akhil Goyal
  0 siblings, 1 reply; 87+ messages in thread
From: Ananyev, Konstantin @ 2019-10-30 14:23 UTC (permalink / raw)
  To: Akhil Goyal, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph', Hemant Agrawal


Hi Akhil,

> > > > > Added my comments inline with your draft.
> > > > > [snip]..
> > > > >
> > > > > >
> > > > > > Ok, then my suggestion:
> > > > > > Let's at least write down all points about crypto-dev approach where we
> > > > > > disagree and then probably try to resolve them one by one....
> > > > > > If we fail to make an agreement/progress in the next week or so,
> > > > > > (and no more reviews from the community)
> > > > > > we will have to bring that subject to the TB meeting to decide.
> > > > > > Sounds fair to you?
> > > > > Agreed
> > > > > >
> > > > > > List is below.
> > > > > > Please add/correct me, if I missed something.
> > > > > >
> > > > > > Konstantin
> > > > >
> > > > > Before going into comparison, we should define the requirement as well.
> > > >
> > > > Good point.
> > > >
> > > > > What I understood from the patchset,
> > > > > "You need a synchronous API to perform crypto operations on raw data using SW PMDs"
> > > > > So,
> > > > > - no crypto-ops,
> > > > > - no separate enq-deq, only single process API for data path
> > > > > - Do not need any value addition to the session parameters.
> > > > >   (You would need some parameters from the crypto-op which
> > > > >    are constant per session, and since you won't use crypto-op,
> > > > >    you need some place to store that)
> > > >
> > > > Yes, this is correct, I think.
> > > >
> > > > >
> > > > > Now as per your mail, the comparison
> > > > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > > > >
> > > > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and 'key' fields.
> > > > > New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
> > > > > For lksd-crypto session PMD is free to ignore these fields.
> > > > > No ABI breakage is required.
> > > > >
> > > > > [Akhil] Agreed, no issues.
> > > > >
> > > > > 2. cpu-crypto create/init.
> > > > >     a) Our suggestion - introduce new API for that:
> > > > >         - rte_crypto_cpu_sym_init() that would init completely opaque rte_crypto_cpu_sym_session.
> > > > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
> > > > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > > > >           that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
> > > > > 	Advantages:
> > > > > 	1)  totally opaque data structure (no ABI breakages in future), PMD writer is totally free
> > > > > 	     with its format and contents.
> > > > >
> > > > > [Akhil] It will have breakage at some point till we don't hit the union size.
> > > >
> > > > Not sure, what union you are talking about?
> > >
> > > Union of xforms in rte_security_session_conf
> >
> > Hmm, how does it relate here?
> > I thought we discussing pure rte_cryptodev_sym_session, no?
> >
> > >
> > > >
> > > > > Rather I don't suspect there will be more parameters added.
> > > > > Or do we really care about the ABI breakage when the argument is about
> > > > > the correct place to add a piece of code or do we really agree to add code
> > > > > anywhere just to avoid that breakage.
> > > >
> > > > I am talking about maintaining it in future.
> > > > If your struct is not seen externally, there is no chance of introducing ABI breakage.
> > > >
> > > > >
> > > > > 	2) each session entity is self-contained, user doesn't need to bring along dev_id etc.
> > > > > 	    dev_id is needed only at init stage, after that user will use session ops to perform
> > > > > 	    all operations on that session (process(), clear(), etc.).
> > > > >
> > > > > [Akhil] There is no such thing as session ops in current DPDK.
> > > >
> > > > True, but it doesn't mean we can't/shouldn't have it.
> > >
> > > We can have it if it does not add complexity for the user. Creating 2 different code
> > > paths is not desirable for stack developers.
> > >
> > > >
> > > > > What you are proposing is a new concept which doesn't have any extra benefit; rather, it adds
> > > > > complexity by having two different code paths for session create.
> > > > >
> > > > >
> > > > > 	3) User can decide whether to store the ops[] pointer on a per-session basis,
> > > > > 	    or per group of identical sessions, or...
> > > > >
> > > > > [Akhil] Will the user really care which process API should be called from the PMD?
> > > > > Rather, it should be the driver's responsibility to store that in the session private data,
> > > > > which would be opaque to the user. As per my suggestion, the same process function can
> > > > > be added to multiple sessions, or a single session can be managed inside the PMD.
> > > >
> > > > In that case we either need to have a function per session (stored internally),
> > > > or make decisions (branches) at run-time.
> > > > But as I said in another mail - I am ok to add a small shim structure here:
> > > > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > > > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > > > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into one (init).
> > >
> > > Again that will be a separate API call from the user perspective which is not good.
> > >
> > > >
> > > > >
> > > > >
> > > > > 	4) No mandatory mempools for private sessions. User can allocate memory for cpu-crypto
> > > > > 	    session whenever he likes.
> > > > >
> > > > > [Akhil] you mean session private data?
> > > >
> > > > Yes.
> > > >
> > > > > You would need that memory anyways, user will be
> > > > > allocating that already.  You do not need to manage that.
> > > >
> > > > What I am saying is that right now the user has no choice but to allocate it via mempool,
> > > > which is probably not the best option for all cases.
> > > >
> > > > >
> > > > > 	Disadvantages:
> > > > > 	5) Extra changes in control path
> > > > > 	6) User has to store session_ops pointer explicitly.
> > > > >
> > > > > [Akhil] More disadvantages:
> > > > > - All supporting PMDs will need to maintain TWO types of session for the
> > > > > same crypto processing. Suppose a fix or a new feature (or algo) is added: the PMD owner
> > > > > will need to add code in both session create APIs. Hence more maintenance, and more
> > > > > error prone.
> > > >
> > > > I think the majority of code for both paths will be common; plus, even if we reuse the
> > > > current sym_session_init(), changes in PMD session_init() code will be unavoidable.
> > > > But yes, it will be a new entry in devops that the PMD will have to support.
> > > > Ok to add it as 7) to the list.
> > > >
> > > > > - Stacks which will be using these new APIs also need to maintain two
> > > > > code paths for the same processing while doing session initialization
> > > > > for sync and async
> > > >
> > > > That's the same as #5 above, I think.
> > > >
> > > > >
> > > > >
> > > > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
> > > > >       structure.
> > > > > 	Advantages:
> > > > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > > > 	    Probably less changes in control path.
> > > > > 	Disadvantages:
> > > > > 	2) rte_cryptodev_sym_session.sess_data[] is indexed by driver_id, which means that
> > > > > 	    we can't use the same rte_cryptodev_sym_session to hold private session pointers
> > > > > 	    for both sync and async mode for the same device.
> > > > >                    So the only option we have - make PMD devops->sym_session_configure()
> > > > > 	    always create a session that can work in both cpu and lksd modes.
> > > > > 	    For some implementations that would probably mean that under the hood PMD would create
> > > > > 	    2 different session structs (sync/async) and then use one or another depending on which API has been called.
> > > > > 	    Seems doable, but ...:
> > > > >                    - will contradict the statement from 1:
> > > > > 	      " New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
> > > > >                       Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> > > > > 	       even if they don't plan to use that mode - i.e. behavior change, existing app change.
> > > > >                      - might cause extra space overhead.
> > > > >
> > > > > [Akhil] It will not contradict #1; you will only have a few checks in the session init of PMDs
> > > > > which support this mode, to find the appropriate values and set the appropriate process() in it.
> > > > > The user should be able to call legacy enq-deq as well as the new process() without any issue.
> > > > > At runtime the user will be able to change the datapath.
> > > > > So this is not a disadvantage; it would be additional flexibility for the user.
> > > >
> > > > Ok, but that's what I am saying - if PMD would *always* have to create a session that can handle
> > > > both modes (sync/async), then user would *always* have to provide parameters for both modes too.
> > > > Otherwise, if, let's say, the user didn't set up sync specific parameters at all, what should the PMD do?
> > > >   - return with error?
> > > >   - init session that can be used with async path only?
> > > > My current assumption is #1.
> > > > If #2, then how will the user be able to distinguish whether that session is valid for both
> > > > modes, or only for one?
> > >
> > > I would say a 3rd option: do nothing if sync params are not set.
> > > Probably have a debug print in the PMD (which supports sync mode) to specify that the
> > > session is not configured properly for sync mode.
> >
> > So, just print a warning and proceed with init of a session that can be used with the async path only?
> > Then it sounds the same as #2 above.
> > Which actually means that sync mode parameters for sym_session_init() become optional.
> > Then we need an API to provide the user information on what modes
> > (sync+async/async only) are supported by that session for a given dev_id.
> > And the user would have to query/retain this information at control-path,
> > and store it somewhere in user-space together with session pointer and dev_ids
> > to use later at data-path (same as we do now for session type).
> > That definitely requires changes in control-path to start using it.
> > Plus the fact that this value can differ for different dev_ids for the same session
> > doesn't make things easier here.
> 
> An API won't be required to specify that. A feature flag will be sufficient; it is not a big change
> from the application perspective.
> 
> Here is some pseudo code just to elaborate my understanding. This will need some
> 
> From the application:
> if (dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC) {
> 	/* set additional params in crypto xform */
> }
> 
> Now in the driver:
> pmd_sym_session_configure(dev, xform, sess, mempool) {
> 	...
> 	if ((dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC)
> 		&& xform->/*sync params are set*/) {
> 		/* Assign process function pointer in sess->priv_data */
> 	} /* It may return error if FF_SYNC is set and params are not correct.

Then all apps will always *have to* set up sync parameters in xform.
What you suggest is *mandatory* sync mode: the user always has to set up sync
mode params if the PMD does support it (no matter whether he plans to use sync mode or not).
Which means a behavior change in existing apps.

> 	        It would be up to the driver whether it supports both SYNC and ASYNC. */
> }
> 
> Now the new sync API:
> 
> pmd_process(...) {
> 	if ((dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC)
> 			 && sess_priv->process != NULL)
> 		sess_priv->process(...);
> 	else
> 		ASSERT("sync mode not configured properly or not supported");
> }
> 
> In the data path, there is no extra processing happening.
> Even in the case of your suggestion, you should have these types of error checks;
> you cannot blindly trust the application that the pointers are correct.
> 
> >
> > > Internally the PMD will not store the process() API in the session priv data,
> > > and while processing the first packet, devops->process will give an assert that the session
> > > is not configured for sync mode. The session validation would be done in any case,
> > > your suggestion or mine. So no extra overhead at runtime.
> >
> > I believe that after session_init() user should get either an error or
> > valid  session handler that he can use at runtime.
> > Pushing session validation to runtime doesn't seem like a good idea.
> >
> It may get a warning from the PMD that FF_SYNC is set but params are not
> correct/available. See above.

I think a warning is not enough.
There should be a clear way (API) for the developer to determine whether the created session
can be used by the sync API data-path or not.
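One hypothetical shape for such an API, sketched only to make the request concrete: the caller declares at init which modes it needs, gets `-EINVAL` immediately if the sync parameters are missing, and can query the result afterwards on the control path. All names here (`session_init`, `session_supports`, `SESS_MODE_*`, `struct sync_params`) are illustrative and do not exist in DPDK.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode bits a caller could request at session init. */
enum sess_mode {
	SESS_MODE_ASYNC = 1u << 0,
	SESS_MODE_SYNC = 1u << 1,
};

/* Hypothetical sync-only parameters (optional in the xform). */
struct sync_params {
	bool set;
};

struct session {
	unsigned int modes; /* modes this session was initialized for */
};

/* Fail at init time instead of deferring the check to the first process(). */
static int session_init(struct session *s, unsigned int requested,
		const struct sync_params *sp)
{
	if ((requested & SESS_MODE_SYNC) && (sp == NULL || !sp->set))
		return -EINVAL;
	s->modes = requested;
	return 0;
}

/* Control-path query the application can run once and cache. */
static bool session_supports(const struct session *s, enum sess_mode m)
{
	return (s->modes & (unsigned int)m) != 0;
}
```

The design choice here is that a session either fails to initialize or is valid for exactly the modes that were requested, so nothing needs to be validated on the data path.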

> 
> > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> > > > > 	    So probably minor compared to 2.b.2.
> > > > >
> > > > > [Akhil] So let's omit this for the current discussion. And I hope we can find some way to deal with it.
> > > >
> > > > I don't think there is an easy way to fix that with existing API.
> > > >
> > > > >
> > > > >
> > > > > Actually #3 follows from #2, but decided to have them separated.
> > > > >
> > > > > 3. process() parameters/behavior
> > > > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
> > > > >         session_ops->process(sess, ...);
> > > > > 	Advantages:
> > > > > 	1) fastest possible execution path
> > > > > 	2) no need to carry on dev_id for data-path
> > > > >
> > > > > [Akhil] I don't see any overhead of carrying dev id; at least it would be in line with the
> > > > > current DPDK methodology.
> > > >
> > > > If we add process() into rte_cryptodev itself (same as we have enqueue_burst/dequeue_burst),
> > > > then it will be an ABI breakage.
> > > > Also there are discussions to get rid of that approach completely:
> > > >
> > > > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > > > So I am not sure this is a recommended way these days.
> > >
> > > We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> > > is good for you.
> > >
> > > Whether it is ABI breakage or not, as per your requirements, this is the correct
> > > approach. Do you agree with this or not?
> >
> > I think it is possible approach, but not the best one:
> > it looks quite flakey to me (see all these uncertainty with sym_session_init
> > above),
> > plus introduces extra overhead at data-path.
> 
> Uncertainties can be handled appropriately using a feature flag
> 
> >
> > >
> > > Now handling the API/ABI breakage is a separate story. In 19.11 release we
> > > Are not much concerned about the ABI breakages, this was discussed in
> > > community. So adding a new dev_ops wouldn't have been an issue.
> > > Now since we are so close to RC1 deadline, we should come up with some
> > > other solution for next release. May be having a pmd API in 20.02 and
> > > converting it into formal one in 20.11
> > >
> > >
> > > >
> > > > > What you are suggesting is a new way to get the things done without much benefit.
> > > >
> > > > Would help with ABI stability plus better performance, isn't it enough?
> > > >
> > > > > Also I don't see any performance difference as crypto workload is heavier than
> > > > > Code cycles, so that wont matter.
> > > >
> > > > It depends.
> > > > Suppose function call costs you ~30 cycles.
> > > > If you have burst of big packets (let say crypto for each will take ~2K cycles) that belong
> > > > to the same session, then yes you wouldn't notice these extra 30 cycles at all.
> > > > If you have burst of small packets (let say crypto for each will take ~300 cycles) each
> > > > belongs to different session, then it will cost you ~10% extra.
> > >
> > > Let us do some profiling on openssl with both the approaches and find out the
> > > difference.
> > >
> > > >
> > > > > So IMO, there is no advantage in your suggestion as well.
> > > > >
> > > > >
> > > > > 	Disadvantages:
> > > > > 	3) user has to carry on session_ops pointer explicitly
> > > > >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> > > > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess, /*data parameters*/) {...
> > > > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(ses, ...);
> > > > >                       /*and then inside PMD specific process: */
> > > > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > > > >                      /* and then most likely either */
> > > > >                      pmd_private_session->process(pmd_private_session, ...);
> > > > >                      /* or jump based on session/input data */
> > > > > 	Advantages:
> > > > > 	1) don't see any...
> > > > > 	Disadvantages:
> > > > > 	2) User has to carry on dev_id inside data-path
> > > > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > > > 	    Possible slowdown compared to a) (not measured).
> > > > >
> > > > > Having said all this, if the disagreements cannot be resolved, you can go for a pmd API specific
> > > > > to your PMDs,
> > > >
> > > > I don't think it is good idea.
> > > > PMD specific API is sort of deprecated path, also there is no clean way to use it
> > > > within the libraries.
> > >
> > > I know that this is a deprecated path, we can use it until we are not allowed
> > > to break ABI/API
> > >
> > > >
> > > > > because as per my understanding the solution doesn't look scalable to other PMDs.
> > > > > Your approach is aligned only to Intel , will not benefit others like openssl which is used by all
> > > > > vendors.
> > > >
> > > > I feel quite opposite, from my perspective majority of SW backed PMDs will
> > > > benefit from it.
> > > > And I don't see anything Intel specific in my proposals above.
> > > > About openssl PMD: I am not an expert here, but looking at the code, I think it
> > > > will fit really well.
> > > > Look yourself at its internal functions:
> > > > process_openssl_auth_op/process_openssl_crypto_op,
> > > > I think they doing exactly the same - they use sync API underneath, and they are
> > > > session based
> > > > (AFAIK you don't need any device/queue data, everything that needed for
> > > > crypto/auth is stored inside session).
> > > >
> > > By vendor specific, I mean,
> > > - no PMD would like to have 2 different variants of session Init APIs for doing the same stuff.
> > > - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> > > Like to support 2 variants of session create- one for HW PMDs and one for SW PMDs.
> >
> > I think what you refer on has nothing to do with 'vendor specific'.
> > I would name it 'extra overhead for PMD and stack writers'.
> > Yes, for sure there is extra overhead (as always with new API) -
> > for both producer (PMD writer) and consumer (stack writer):
> > New function(s) to support,  probably more tests to create/run, etc.
> > Though this API is optional - if PMD/stack maintainer doesn't see
> > value in it, they are free not to support it.
> > From other side, re-using  rte_cryptodev_sym_session_init()
> > wouldn't help anyway - both data-path and control-path would differ
> > from async mode anyway.
> > BTW, right now to support different HW flavors
> > we do have 4 different control and data-paths for both
> > ipsec-secgw and librte_ipsec:
> > lkds-none/lksd-proto/inline-crypto/inline-proto.
> > And that is considered to be ok.
> 
> No that is not ok. We cannot add new paths for every other case.

What I am saying is: if, let's say, lookaside-proto/inline-crypto/inline-proto
deserve their own cases in the rte_security/rte_crypto API,
I don't understand why cpu-crypto doesn't.

> Those 4 are controlled using 2 set of APIs.

Yes, there are 2 API sets (rte_cryptodev/rte_security),
but in fact if you look at ipsec-secgw and librte_ipsec, we have 4 different code paths.
For both create_session() and ipsec_enqueue() we have a big switch() with 4 different cases.
It is nearly the same for librte_ipsec: we have different prepare/process
function pointers for each security type.
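The two dispatch styles described here, a `switch` evaluated per call versus a function pointer resolved once per session type, look roughly like this. It is a hypothetical sketch: the `demo_*` enum and functions only mirror the four session types named above and return a placeholder id instead of doing real work; they are not the ipsec-secgw or librte_ipsec code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the four session types discussed in the thread. */
enum demo_sec_type {
	DEMO_LKSD_NONE,
	DEMO_LKSD_PROTO,
	DEMO_INLINE_CRYPTO,
	DEMO_INLINE_PROTO,
	DEMO_SEC_TYPE_MAX
};

/* Style 1: a switch evaluated on every call. */
static int
demo_enqueue_switch(enum demo_sec_type t)
{
	switch (t) {
	case DEMO_LKSD_NONE:     return 0; /* placeholder for lookaside-none path */
	case DEMO_LKSD_PROTO:    return 1; /* placeholder for lookaside-proto path */
	case DEMO_INLINE_CRYPTO: return 2; /* placeholder for inline-crypto path */
	case DEMO_INLINE_PROTO:  return 3; /* placeholder for inline-proto path */
	default:                 return -1;
	}
}

/* Style 2: resolve a per-type function pointer once, at session setup. */
typedef int (*demo_process_fn)(void);

static int demo_proc_lksd_none(void)     { return 0; }
static int demo_proc_lksd_proto(void)    { return 1; }
static int demo_proc_inline_crypto(void) { return 2; }
static int demo_proc_inline_proto(void)  { return 3; }

static demo_process_fn
demo_select_process(enum demo_sec_type t)
{
	static const demo_process_fn table[DEMO_SEC_TYPE_MAX] = {
		[DEMO_LKSD_NONE]     = demo_proc_lksd_none,
		[DEMO_LKSD_PROTO]    = demo_proc_lksd_proto,
		[DEMO_INLINE_CRYPTO] = demo_proc_inline_crypto,
		[DEMO_INLINE_PROTO]  = demo_proc_inline_proto,
	};
	return (unsigned)t < (unsigned)DEMO_SEC_TYPE_MAX ? table[t] : NULL;
}
```

Under either style, supporting cpu-crypto as its own case would mean one more enum value plus one more `case` or table entry.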

> We should try our best to
> Have minimum overhead to the application writer. This pain was also discussed
> In the one of DPDK conference as well.
> DPDK is not a standalone entity, there are stacks running over it always.
> We should not add API for every other use case when we have an alternative
> Approach with the existing API set.
> 
> Now introducing another one would add to that pain and a lot of work for
> Both producer and consumer.

If I saw a clean approach to implement the desired functionality
without introducing a new API, I would definitely support it.
The problem is that, from my perspective,
what you are suggesting with the existing API will bring more drawbacks than positives.
BTW, our first approach (via rte_security) does reuse the existing API,
so if adding a new API is the main concern, let's reconsider that path.

> It would be interesting to see how much performance difference will be there in the
> Two approaches. As per my understanding it wont be much as compared to the
> Extra work that you will be inducing.
> 
> -Akhil
> 
> > Honestly, I don't understand why SW backed implementations
> > can't have their own path that would suite them most.
> > Konstantin
> >
> >
> >
> >
> >



* Re: [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API
  2019-10-30 14:23                                                       ` Ananyev, Konstantin
@ 2019-11-01 13:53                                                         ` Akhil Goyal
  0 siblings, 0 replies; 87+ messages in thread
From: Akhil Goyal @ 2019-11-01 13:53 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org',
	De Lara Guarch, Pablo, 'Thomas Monjalon',
	Zhang, Roy Fan, Doherty, Declan
  Cc: 'Anoob Joseph', Hemant Agrawal

Hi Konstantin,

> 
> 
> Hi Akhil,
> 
> > > > > > Added my comments inline with your draft.
> > > > > > [snip]..
> > > > > >
> > > > > > >
> > > > > > > Ok, then my suggestion:
> > > > > > > Let's at least write down all points about crypto-dev approach where we
> > > > > > > disagree and then probably try to resolve them one by one....
> > > > > > > If we fail to make an agreement/progress in next week or so,
> > > > > > > (and no more reviews from the community)
> > > > > > > will have bring that subject to TB meeting to decide.
> > > > > > > Sounds fair to you?
> > > > > > Agreed
> > > > > > >
> > > > > > > List is below.
> > > > > > > Please add/correct me, if I missed something.
> > > > > > >
> > > > > > > Konstantin
> > > > > >
> > > > > > Before going into comparison, we should define the requirement as well.
> > > > >
> > > > > Good point.
> > > > >
> > > > > > What I understood from the patchset,
> > > > > > "You need a synchronous API to perform crypto operations on raw data using SW PMDs"
> > > > > > So,
> > > > > > - no crypto-ops,
> > > > > > - no separate enq-deq, only single process API for data path
> > > > > > - Do not need any value addition to the session parameters.
> > > > > >   (You would need some parameters from the crypto-op which
> > > > > >    Are constant per session and since you wont use crypto-op,
> > > > > >    You need some place to store that)
> > > > >
> > > > > Yes, this is correct, I think.
> > > > >
> > > > > >
> > > > > > Now as per your mail, the comparison
> > > > > > 1. extra input parameters to create/init rte_(cpu)_sym_session.
> > > > > >
> > > > > > Will leverage existing 6B gap inside rte_crypto_*_xform between 'algo' and 'key' fields.
> > > > > > New fields will be optional and would be used by PMD only when cpu-crypto session is requested.
> > > > > > For lksd-crypto session PMD is free to ignore these fields.
> > > > > > No ABI breakage is required.
> > > > > >
> > > > > > [Akhil] Agreed, no issues.
> > > > > >
> > > > > > 2. cpu-crypto create/init.
> > > > > >     a) Our suggestion - introduce new API for that:
> > > > > >         - rte_crypto_cpu_sym_init() that would init completely opaque rte_crypto_cpu_sym_session.
> > > > > >         - struct rte_crypto_cpu_sym_session_ops {(*process)(...); (*clear); /*whatever else we'll need */};
> > > > > >         - rte_crypto_cpu_sym_get_ops(const struct rte_crypto_sym_xform *xforms)
> > > > > >           that would return const struct rte_crypto_cpu_sym_session_ops * based on input xforms.
> > > > > > 	Advantages:
> > > > > > 	1)  totally opaque data structure (no ABI breakages in future), PMD writer is totally free
> > > > > > 	     with it format and contents.
> > > > > >
> > > > > > [Akhil] It will have breakage at some point till we don't hit the union size.
> > > > >
> > > > > Not sure, what union you are talking about?
> > > >
> > > > Union of xforms in rte_security_session_conf
> > >
> > > Hmm, how does it relates here?
> > > I thought we discussing pure rte_cryptodev_sym_session, no?
> > >
> > > >
> > > > >
> > > > > > Rather I don't suspect there will be more parameters added.
> > > > > > Or do we really care about the ABI breakage when the argument is about
> > > > > > the correct place to add a piece of code or do we really agree to add code
> > > > > > anywhere just to avoid that breakage.
> > > > >
> > > > > I am talking about maintaining it in future.
> > > > > if your struct is not seen externally, no chances to introduce ABI breakage.
> > > > >
> > > > > >
> > > > > > 	2) each session entity is self-contained, user doesn't need to bring along dev_id etc.
> > > > > > 	    dev_id is needed  only at init stage, after that user will use session ops to perform
> > > > > > 	    all operations on that session (process(), clear(), etc.).
> > > > > >
> > > > > > [Akhil] There is nothing called as session ops in current DPDK.
> > > > >
> > > > > True, but it doesn't mean we can't/shouldn't have it.
> > > >
> > > > We can have it if it is not adding complexity for the user. Creating 2 different code
> > > > Paths for user is not desirable for the stack developers.
> > > >
> > > > >
> > > > > > What you are proposing
> > > > > > is a new concept which doesn't have any extra benefit, rather it is adding complexity
> > > > > > to have two different code paths for session create.
> > > > > >
> > > > > >
> > > > > > 	3) User can decide does he wants to store ops[] pointer on a per session basis,
> > > > > > 	    or on a per group of same sessions, or...
> > > > > >
> > > > > > [Akhil] Will the user really care which process API should be called from the PMD.
> > > > > > Rather it should be driver's responsibility to store that in the session private data
> > > > > > which would be opaque to the user. As per my suggestion same process function can
> > > > > > be added to multiple sessions or a single session can be managed inside the PMD.
> > > > >
> > > > > In that case we either need to have a function per session (stored internally),
> > > > > or make decision (branches) at run-time.
> > > > > But as I said in other mail - I am ok to add small shim structure here:
> > > > > either rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops ops; }
> > > > > or rte_crypto_cpu_sym_session { void *ses; struct rte_crypto_cpu_sym_session_ops *ops; }
> > > > > And merge rte_crypto_cpu_sym_init() and rte_crypto_cpu_sym_get_ops() into
> > > > > one (init).
> > > > Again that will be a separate API call from the user perspective which is not good.
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > 	4) No mandatory mempools for private sessions. User can allocate memory for cpu-crypto
> > > > > > 	    session whenever he likes.
> > > > > >
> > > > > > [Akhil] you mean session private data?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > You would need that memory anyways, user will be
> > > > > > allocating that already.  You do not need to manage that.
> > > > >
> > > > > What I am saying - right now user has no choice but to allocate it via mempool.
> > > > > Which is probably not the best options for all cases.
> > > > >
> > > > > >
> > > > > > 	Disadvantages:
> > > > > > 	5) Extra changes in control path
> > > > > > 	6) User has to store session_ops pointer explicitly.
> > > > > >
> > > > > > [Akhil] More disadvantages:
> > > > > > - All supporting PMDs will need to maintain TWO types of session for the
> > > > > > same crypto processing. Suppose a fix or a new feature(or algo) is added, PMD owner
> > > > > > will need to add code in both the session create APIs. Hence more maintenance and
> > > > > > error prone.
> > > > >
> > > > > I think majority of code for both paths will be common, plus even we'll reuse
> > > > > current sym_session_init() -
> > > > > changes in PMD session_init() code will be unavoidable.
> > > > > But yes, it will be new entry in devops, that PMD will have to support.
> > > > > Ok to add it as 7) to the list.
> > > > >
> > > > > > - Stacks which will be using these new APIs also need to maintain two
> > > > > > code path for the same processing while doing session initialization
> > > > > > for sync and async
> > > > >
> > > > > That's the same as #5 above, I think.
> > > > >
> > > > > >
> > > > > >
> > > > > >      b) Your suggestion - reuse existing rte_cryptodev_sym_session_init() and existing rte_cryptodev_sym_session
> > > > > >       structure.
> > > > > > 	Advantages:
> > > > > > 	1) allows to reuse same struct and init/create/clear() functions.
> > > > > > 	    Probably less changes in control path.
> > > > > > 	Disadvantages:
> > > > > > 	2) rte_cryptodev_sym_session. sess_data[] is indexed by driver_id, which means that
> > > > > > 	    we can't use the same rte_cryptodev_sym_session to hold private sessions pointers
> > > > > > 	    for both sync and async mode  for the same device.
> > > > > >                    So the only option we have - make PMD devops->sym_session_configure()
> > > > > > 	    always create a session that can work in both cpu and lksd modes.
> > > > > > 	    For some implementations that would probably mean that under the hood  PMD would create
> > > > > > 	    2 different session structs (sync/async) and then use one or another depending on from what API been called.
> > > > > > 	    Seems doable, but ...:
> > > > > >                    - will contradict with statement from 1:
> > > > > > 	      " New fields will be optional and would be used by PMD only when cpu-crypto session is requested."
> > > > > >                       Now it becomes mandatory for all apps to specify cpu-crypto related parameters too,
> > > > > > 	       even if they don't plan to use that mode - i.e. behavior change, existing app change.
> > > > > >                      - might cause extra space overhead.
> > > > > >
> > > > > > [Akhil] It will not contradict with #1, you will only have few checks in the session init PMD
> > > > > > Which support this mode, find appropriate values and set the appropriate process() in it.
> > > > > > User should be able to call, legacy enq-deq as well as the new process() without any issue.
> > > > > > User would be at runtime will be able to change the datapath.
> > > > > > So this is not a disadvantage, it would be additional flexibility for the user.
> > > > >
> > > > > Ok, but that's what I am saying - if PMD would *always* have to create a
> > > > > session that can handle
> > > > > both modes (sync/async), then user would *always* have to provide parameters
> > > > > for both modes too.
> > > > > Otherwise if let say user didn't setup sync specific parameters at all, what PMD
> > > > > should do?
> > > > >   - return with error?
> > > > >   - init session that can be used with async path only?
> > > > > My current assumption is #1.
> > > > > If #2, then how user will be able to distinguish is that session valid for both
> > > > > modes, or only for one?
> > > >
> > > > I would say a 3rd option, do nothing if sync params are not set.
> > > > Probably have a debug print in the PMD(which support sync mode) to specify that
> > > > session is not configured properly for sync mode.
> > >
> > > So, just print warning and proceed with init session that can be used with async
> > > path only?
> > > Then it sounds the same as #2 above.
> > > Which actually means that sync mode parameters for sym_session_init()
> > > becomes optional.
> > > Then we need an API to provide to the user information what modes
> > > (sync+async/async only) is supported by that session for given dev_id.
> > > And user would have to query/retain this information at control-path,
> > > and store it somewhere in user-space together with session pointer and dev_ids
> > > to use later at data-path (same as we do now for session type).
> > > That definitely requires changes in control-path to start using it.
> > > Plus the fact that this value can differ for different dev_ids for the same session -
> > > doesn't make things easier here.
> >
> > API wont be required to specify that. Feature flag will be sufficient, not a big change
> > From the application perspective.
> >
> > Here is some pseudo code just to elaborate my understanding. This will need some
> >
> > From application,
> > If(dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC) {
> > 	/* set additional params in crypto xform */
> > }
> >
> > Now in the driver,
> > pmd_sym_session_configure(dev,xform,sess,mempool) {
> > 	...
> > 	If(dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC
> > 		&& xform->/*sync params are set*/) {
> > 		/*Assign process function pointer in sess->priv_data*/
> > 	} /* It may return error if FF_SYNC is set and params are not correct.
> 
> Then all apps will always *have to* setup  sync parameters in xform.
> What you suggest is *mandatory* sync mode: user always has to setup sync
> mode params if PMD does support it (no matter does he plan to use sync mode
> or not).
> Which means behavior change in existing apps.

We are adding new params in the xform; the user may not fill those params, in which case they default
to 0. Or we can pack a flag in the xform that is set when all sync params are set.
It can be dealt with when we write the code.

I am not saying the user will always have to set the params when sync mode is supported.
The PMD will print a warning, and the user may ignore it if he doesn't want to use sync mode.
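The zero-default idea relies on a C guarantee: members omitted from a designated initializer are zero-initialized, so existing callers that never mention the new fields leave them at 0 (this holds for initialized objects; an uninitialized local or a malloc'd struct is not zeroed). A minimal sketch with a hypothetical xform layout and flag name (`demo_aead_xform`, `DEMO_XFORM_SYNC_PARAMS_SET`), not the real `rte_crypto_*_xform`; the three new 16-bit fields total 6 bytes, in line with the 6B gap between 'algo' and 'key' mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit; not an existing DPDK definition. */
#define DEMO_XFORM_SYNC_PARAMS_SET (1u << 0)

/* Simplified stand-in for an xform with new, optional sync fields. */
struct demo_aead_xform {
	int algo;            /* existing field */
	/* New optional fields: zero unless the application fills them in. */
	uint16_t sync_flags; /* DEMO_XFORM_SYNC_PARAMS_SET once all are valid */
	uint16_t aad_len;
	uint16_t digest_len;
	/* existing key/iv fields would follow here */
};

/* What a PMD could check before installing a sync process() handler. */
static int
demo_sync_params_set(const struct demo_aead_xform *x)
{
	return (x->sync_flags & DEMO_XFORM_SYNC_PARAMS_SET) != 0;
}
```

An existing application that writes `{ .algo = ... }` never touches the new fields, so the flag reads as "not set" and the PMD can fall back to an async-only session.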


> 
> > 	        It would be upto the driver whether it support both SYNC and ASYNC.*/
> > }
> >
> > Now the new sync API
> >
> > pmd_process(...) {
> > 	If(dev_info->feature_flags & RTE_CRYPTODEV_FF_SYNC
> > 			 && sess_priv->process != NULL)
> > 		sess_priv->process(...);
> > 	else
> > 		ASSERT("sync mode not configured properly or not supported");
> > }
> >
> > In the data path, there is no extra processing happening.
> > Even in case of your suggestion, you should have these type of error checks,
> > You cannot blindly trust on the application that the pointers are correct.
> >
> > >
> > > > Internally the PMD will not store the process() API in the session priv data
> > > > And while calling the first packet, devops->process will give an assert that session
> > > > Is not configured for sync mode. The session validation would be done in any case
> > > > your suggestion or mine. So no extra overhead at runtime.
> > >
> > > I believe that after session_init() user should get either an error or
> > > valid  session handler that he can use at runtime.
> > > Pushing session validation to runtime doesn't seem like a good idea.
> > >
> > It may get a warning from the PMD, that FF_SYNC is set but params are not
> > Correct/available. See above.
> 
> I think warning is not enough.
> There should be a clear way (API) for developer to realize is the created session
> can be used by sync API data-path or not.

A warning is a clear notification to the user that SYNC mode can be supported by the device
but the user does not want to use it.
Moreover, when the first packet is sent, the sync API will return an error. So what is the issue?
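A self-contained C sketch of the feature-flag flow from the pseudo code above, under assumptions: `DEMO_FF_SYNC` stands in for the proposed `RTE_CRYPTODEV_FF_SYNC`, and `demo_session_configure()` / `demo_process()` are hypothetical PMD-side functions, not DPDK API. One deliberate difference from the pseudo code: the data path returns `-ENOTSUP` instead of asserting, because `assert()` is compiled out when `NDEBUG` is defined.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical feature flag standing in for the proposed FF_SYNC. */
#define DEMO_FF_SYNC (1ull << 0)

struct demo_sess_priv;
typedef int (*demo_sync_process_t)(struct demo_sess_priv *p, int nb_pkts);

struct demo_sess_priv {
	demo_sync_process_t process; /* NULL unless sync mode was configured */
};

/* Hypothetical PMD worker; a real one would run the crypto here. */
static int
demo_sync_process(struct demo_sess_priv *p, int nb_pkts)
{
	(void)p;
	return nb_pkts; /* pretend every packet was processed */
}

/* Control path: install the sync handler only when both conditions hold. */
static void
demo_session_configure(struct demo_sess_priv *p, uint64_t dev_features,
		       bool sync_params_set)
{
	p->process = NULL;
	if (dev_features & DEMO_FF_SYNC) {
		if (sync_params_set)
			p->process = demo_sync_process;
		else
			fprintf(stderr, "warning: sync params not set, session is async-only\n");
	}
}

/* Data path: report misconfiguration with an error code. */
static int
demo_process(uint64_t dev_features, struct demo_sess_priv *p, int nb_pkts)
{
	if (!(dev_features & DEMO_FF_SYNC) || p->process == NULL)
		return -ENOTSUP;
	return p->process(p, nb_pkts);
}
```

This makes the trade-off under discussion concrete: a session configured without sync params is only flagged by a warning at configure time and by an error on the first process() call.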

> 
> >
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > 	3) not possible to store device (not driver) specific data within the session, but I think it is not really needed right now.
> > > > > > 	    So probably minor compared to 2.b.2.
> > > > > >
> > > > > > [Akhil] So lets omit this for current discussion. And I hope we can find some way to deal with it.
> > > > >
> > > > > I don't think there is an easy way to fix that with existing API.
> > > > >
> > > > > >
> > > > > >
> > > > > > Actually #3 follows from #2, but decided to have them separated.
> > > > > >
> > > > > > 3. process() parameters/behavior
> > > > > >     a) Our suggestion: user stores ptr to session ops (or to (*process) itself) and just does:
> > > > > >         session_ops->process(sess, ...);
> > > > > > 	Advantages:
> > > > > > 	1) fastest possible execution path
> > > > > > 	2) no need to carry on dev_id for data-path
> > > > > >
> > > > > > [Akhil] I don't see any overhead of carrying dev id, at least it would be inline with the
> > > > > > current DPDK methodology.
> > > > >
> > > > > If we'll add process() into rte_cryptodev itself (same as we have
> > > > > enqueue_burst/dequeue_burst),
> > > > > then it will be an ABI breakage.
> > > > > Also there are discussions to get rid of that approach completely:
> > > > >
> > > > > http://mails.dpdk.org/archives/dev/2019-September/144674.html
> > > > > So I am not sure this is a recommended way these days.
> > > >
> > > > We can either have it in rte_cryptodev or in rte_cryptodev_ops whichever
> > > > is good for you.
> > > >
> > > > Whether it is ABI breakage or not, as per your requirements, this is the correct
> > > > approach. Do you agree with this or not?
> > >
> > > I think it is possible approach, but not the best one:
> > > it looks quite flakey to me (see all these uncertainty with sym_session_init
> > > above),
> > > plus introduces extra overhead at data-path.
> >
> > Uncertainties can be handled appropriately using a feature flag
> >
> > >
> > > >
> > > > Now handling the API/ABI breakage is a separate story. In 19.11 release we
> > > > Are not much concerned about the ABI breakages, this was discussed in
> > > > community. So adding a new dev_ops wouldn't have been an issue.
> > > > Now since we are so close to RC1 deadline, we should come up with some
> > > > other solution for next release. May be having a pmd API in 20.02 and
> > > > converting it into formal one in 20.11
> > > >
> > > >
> > > > >
> > > > > > What you are suggesting is a new way to get the things done without much benefit.
> > > > >
> > > > > Would help with ABI stability plus better performance, isn't it enough?
> > > > >
> > > > > > Also I don't see any performance difference as crypto workload is heavier than
> > > > > > Code cycles, so that wont matter.
> > > > >
> > > > > It depends.
> > > > > Suppose function call costs you ~30 cycles.
> > > > > If you have burst of big packets (let say crypto for each will take ~2K cycles) that belong
> > > > > to the same session, then yes you wouldn't notice these extra 30 cycles at all.
> > > > > If you have burst of small packets (let say crypto for each will take ~300 cycles) each
> > > > > belongs to different session, then it will cost you ~10% extra.
> > > >
> > > > Let us do some profiling on openssl with both the approaches and find out
> the
> > > > difference.
> > > >
> > > > >
> > > > > > So IMO, there is no advantage in your suggestion as well.
> > > > > >
> > > > > >
> > > > > > 	Disadvantages:
> > > > > > 	3) user has to carry on session_ops pointer explicitly
> > > > > >     b) Your suggestion: add  (*cpu_process) inside rte_cryptodev_ops and then:
> > > > > >         rte_crypto_cpu_sym_process(uint8_t dev_id, rte_cryptodev_sym_session *sess, /*data parameters*/) {...
> > > > > >                      rte_cryptodevs[dev_id].dev_ops->cpu_process(ses, ...);
> > > > > >                       /*and then inside PMD specific process: */
> > > > > >                      pmd_private_session = sess->sess_data[this_pmd_driver_id].data;
> > > > > >                      /* and then most likely either */
> > > > > >                      pmd_private_session->process(pmd_private_session, ...);
> > > > > >                      /* or jump based on session/input data */
> > > > > > 	Advantages:
> > > > > > 	1) don't see any...
> > > > > > 	Disadvantages:
> > > > > > 	2) User has to carry on dev_id inside data-path
> > > > > > 	3) Extra level of indirection (plus data dependency) - both for data and instructions.
> > > > > > 	    Possible slowdown compared to a) (not measured).
> > > > > >
> > > > > > Having said all this, if the disagreements cannot be resolved, you can go for a pmd API specific
> > > > > > to your PMDs,
> > > > >
> > > > > I don't think it is good idea.
> > > > > PMD specific API is sort of deprecated path, also there is no clean way to use it
> > > > > within the libraries.
> > > >
> > > > I know that this is a deprecated path, we can use it until we are not allowed
> > > > to break ABI/API
> > > >
> > > > >
> > > > > > because as per my understanding the solution doesn't look scalable to other PMDs.
> > > > > > Your approach is aligned only to Intel , will not benefit others like openssl which is used by all
> > > > > > vendors.
> > > > >
> > > > > I feel quite opposite, from my perspective majority of SW backed PMDs will
> > > > > benefit from it.
> > > > > And I don't see anything Intel specific in my proposals above.
> > > > > About openssl PMD: I am not an expert here, but looking at the code, I think it
> > > > > will fit really well.
> > > > > Look yourself at its internal functions:
> > > > > process_openssl_auth_op/process_openssl_crypto_op,
> > > > > I think they doing exactly the same - they use sync API underneath, and they are
> > > > > session based
> > > > > (AFAIK you don't need any device/queue data, everything that needed for
> > > > > crypto/auth is stored inside session).
> > > > >
> > > > By vendor specific, I mean,
> > > > - no PMD would like to have 2 different variants of session Init APIs for doing the same stuff.
> > > > - stacks will become vendor specific while using 2 separate session create APIs. No stack would
> > > > Like to support 2 variants of session create- one for HW PMDs and one for SW PMDs.
> > >
> > > I think what you refer on has nothing to do with 'vendor specific'.
> > > I would name it 'extra overhead for PMD and stack writers'.
> > > Yes, for sure there is extra overhead (as always with new API) -
> > > for both producer (PMD writer) and consumer (stack writer):
> > > New function(s) to support,  probably more tests to create/run, etc.
> > > Though this API is optional - if PMD/stack maintainer doesn't see
> > > value in it, they are free not to support it.
> > > From other side, re-using  rte_cryptodev_sym_session_init()
> > > wouldn't help anyway - both data-path and control-path would differ
> > > from async mode anyway.
> > > BTW, right now to support different HW flavors
> > > we do have 4 different control and data-paths for both
> > > ipsec-secgw and librte_ipsec:
> > > lkds-none/lksd-proto/inline-crypto/inline-proto.
> > > And that is considered to be ok.
> >
> > No that is not ok. We cannot add new paths for every other case.
> 
> What I am saying: if let-say lookaside-proto/inline-crypto/inline-proto
> deserves its own case in rte_security/rte_crypto API,
> I don't understand why cpu-crypto doesn't.

Because cpu-crypto is done by a crypto device, and for that we have lookaside-none.
SW PMDs are registered as crypto devices, and we should leverage the crypto framework.
I would suggest that in the future, maybe in 20.11, we could have a security wrapper over the cryptodev API
for the lookaside-none use case, so that we have a single API set for all cases.

> 
> > Those 4 are controlled using 2 set of APIs.
> 
> Yes there are 2 API sets (rte_cryptodev/rte_security),
> but in fact if you look at ipsec-secgw and librte_ipsec we have 4 different code paths.
> For both create_session() and ipsec_enqueue() we have a big switch() with 4 different cases.
> Nearly the same for librte_ipsec - we have different prepare/process
> function pointers for each security type.
> 
> > We should try our best to
> > Have minimum overhead to the application writer. This pain was also discussed
> > In the one of DPDK conference as well.
> > DPDK is not a standalone entity, there are stacks running over it always.
> > We should not add API for every other use case when we have an alternative
> > Approach with the existing API set.
> >
> > Now introducing another one would add to that pain and a lot of work for
> > Both producer and consumer.
> 
> If I saw a clean approach to implement the desired functionality
> without introducing a new API, I would definitely support it.
> The problem is that, from my perspective,
> what you are suggesting with the existing API will bring more drawbacks than positives.

From my perspective I see more benefits than negatives:
- fewer changes in driver/app
- no major performance gap
- easier migration of the stack from one SoC to another.

The main argument from my side is this:
you need synchronous processing for SW PMDs, which is a data-path matter.
Why do you need a special session control path for that? You should have some extra
params packed into the same control API.
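Akhil's alternative keeps a single session control API and expresses sync mode as an extra parameter. It might look roughly like the sketch below. The flag names, `struct sess_conf` and `sym_session_init()` are hypothetical; they are not taken from any DPDK release discussed in this thread. The point is only that one init routine can handle both modes and reject sync mode on devices that do not advertise it.

```c
#include <errno.h>
#include <stdint.h>

#define DEV_CAP_CPU_SYNC (1u << 0) /* hypothetical device capability bit */
#define SESS_F_CPU_SYNC  (1u << 0) /* hypothetical session config flag   */

/* Session parameters; real code would also carry algo, key, IV, etc. */
struct sess_conf {
	uint32_t flags;
};

struct sess {
	uint32_t flags;   /* copied from conf: tells the data path which mode */
	int initialized;
};

/* Same control API for async and sync sessions: the extra flag only
 * needs a capability check, not a separate session-create path. */
static int
sym_session_init(uint32_t dev_caps, const struct sess_conf *conf,
		struct sess *s)
{
	if ((conf->flags & SESS_F_CPU_SYNC) != 0 &&
			(dev_caps & DEV_CAP_CPU_SYNC) == 0)
		return -ENOTSUP;
	s->flags = conf->flags;
	s->initialized = 1;
	return 0;
}
```

With this shape the data path would still need a separate synchronous call. What the sketch avoids is a second session-create API, which is the core of the disagreement here.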

> BTW, our first approach (via rte_security) does reuse the existing API,
> so if adding a new API is the main concern, let's reconsider that path.
> 
That would only apply if we had a security wrapper over cryptodev session create
for the lookaside-none use case. But the issue would still remain the same:
there should be no special session create for supporting sync mode.


> > It would be interesting to see how much performance difference there is
> > between the two approaches. As per my understanding, it won't be much
> > compared to the extra work that you will be inducing.
> >
> > -Akhil
> >
> > > Honestly, I don't understand why SW-backed implementations
> > > can't have their own path that would suit them best.
> > > Konstantin
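The cost Konstantin and the cover letter refer to is SW PMDs going through ring enqueue/dequeue only to emulate asynchronous behaviour. A toy comparison illustrates it. Everything in the sketch is hypothetical: `xform()` is a placeholder XOR, not a cipher, and the ring is a minimal single-threaded array, not `rte_ring`. Both paths produce the same result; the sync path simply skips the ring round trip.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SZ 8u /* hypothetical ring size, must be a power of two */

struct op {
	uint8_t *data;
	size_t len;
	int status;  /* -1 until processed, 0 on success */
};

/* Placeholder transform standing in for the real cipher (NOT crypto). */
static void
xform(uint8_t *d, size_t len)
{
	for (size_t i = 0; i < len; i++)
		d[i] ^= 0x5a;
}

/* Minimal single-threaded ring used to emulate async behaviour. */
struct ring {
	struct op *slot[RING_SZ];
	uint32_t head, tail;
};

/* Async emulation, step 1: only queue the ops. */
static uint32_t
async_enqueue(struct ring *r, struct op *ops[], uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n && r->head - r->tail < RING_SZ; i++)
		r->slot[r->head++ & (RING_SZ - 1)] = ops[i];
	return i;
}

/* Async emulation, step 2: do the work at dequeue time. */
static uint32_t
async_dequeue(struct ring *r, struct op *ops[], uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n && r->tail != r->head; i++) {
		struct op *o = r->slot[r->tail++ & (RING_SZ - 1)];

		xform(o->data, o->len);
		o->status = 0;
		ops[i] = o;
	}
	return i;
}

/* Sync path: same work, done immediately on the caller's ops. */
static uint32_t
sync_process(struct op *ops[], uint32_t n)
{
	for (uint32_t i = 0; i < n; i++) {
		xform(ops[i]->data, ops[i]->len);
		ops[i]->status = 0;
	}
	return n;
}
```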



Thread overview: 87+ messages (end of thread; newest: 2019-11-01 13:53 UTC)
2019-09-03 15:40 [dpdk-dev] [RFC PATCH 0/9] security: add software synchronous crypto process Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 1/9] security: introduce CPU Crypto action type and API Fan Zhang
2019-09-04 10:32   ` Akhil Goyal
2019-09-04 13:06     ` Zhang, Roy Fan
2019-09-06  9:01       ` Akhil Goyal
2019-09-06 13:12         ` Zhang, Roy Fan
2019-09-10 11:25           ` Akhil Goyal
2019-09-11 13:01             ` Ananyev, Konstantin
2019-09-06 13:27         ` Ananyev, Konstantin
2019-09-10 10:44           ` Akhil Goyal
2019-09-11 12:29             ` Ananyev, Konstantin
2019-09-12 14:12               ` Akhil Goyal
2019-09-16 14:53                 ` Ananyev, Konstantin
2019-09-16 15:08                   ` Ananyev, Konstantin
2019-09-17  6:02                   ` Akhil Goyal
2019-09-18  7:44                     ` Ananyev, Konstantin
2019-09-25 18:24                       ` Ananyev, Konstantin
2019-09-27  9:26                         ` Akhil Goyal
2019-09-30 12:22                           ` Ananyev, Konstantin
2019-09-30 13:43                             ` Akhil Goyal
2019-10-01 14:49                               ` Ananyev, Konstantin
2019-10-03 13:24                                 ` Akhil Goyal
2019-10-07 12:53                                   ` Ananyev, Konstantin
2019-10-09  7:20                                     ` Akhil Goyal
2019-10-09 13:43                                       ` Ananyev, Konstantin
2019-10-11 13:23                                         ` Akhil Goyal
2019-10-13 23:07                                           ` Zhang, Roy Fan
2019-10-14 11:10                                             ` Ananyev, Konstantin
2019-10-15 15:02                                               ` Akhil Goyal
2019-10-16 13:04                                                 ` Ananyev, Konstantin
2019-10-15 15:00                                             ` Akhil Goyal
2019-10-16 22:07                                           ` Ananyev, Konstantin
2019-10-17 12:49                                             ` Ananyev, Konstantin
2019-10-18 13:17                                             ` Akhil Goyal
2019-10-21 13:47                                               ` Ananyev, Konstantin
2019-10-22 13:31                                                 ` Akhil Goyal
2019-10-22 17:44                                                   ` Ananyev, Konstantin
2019-10-22 22:21                                                     ` Ananyev, Konstantin
2019-10-23 10:05                                                     ` Akhil Goyal
2019-10-30 14:23                                                       ` Ananyev, Konstantin
2019-11-01 13:53                                                         ` Akhil Goyal
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 2/9] crypto/aesni_gcm: add rte_security handler Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 3/9] app/test: add security cpu crypto autotest Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 4/9] app/test: add security cpu crypto perftest Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 5/9] crypto/aesni_mb: add rte_security handler Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 6/9] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 7/9] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 8/9] ipsec: add rte_security cpu_crypto action support Fan Zhang
2019-09-03 15:40 ` [dpdk-dev] [RFC PATCH 9/9] examples/ipsec-secgw: add security " Fan Zhang
2019-09-06 13:13 ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 01/10] security: introduce CPU Crypto action type and API Fan Zhang
2019-09-18 12:45     ` Ananyev, Konstantin
2019-09-29  6:00     ` Hemant Agrawal
2019-09-29 16:59       ` Ananyev, Konstantin
2019-09-30  9:43         ` Hemant Agrawal
2019-10-01 15:27           ` Ananyev, Konstantin
2019-10-02  2:47             ` Hemant Agrawal
2019-09-06 13:13   ` [dpdk-dev] [PATCH 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
2019-09-18 10:24     ` Ananyev, Konstantin
2019-09-06 13:13   ` [dpdk-dev] [PATCH 03/10] app/test: add security cpu crypto autotest Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 04/10] app/test: add security cpu crypto perftest Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
2019-09-18 15:20     ` Ananyev, Konstantin
2019-09-06 13:13   ` [dpdk-dev] [PATCH 06/10] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 07/10] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
2019-09-26 23:20     ` Ananyev, Konstantin
2019-09-27 10:38     ` Ananyev, Konstantin
2019-09-06 13:13   ` [dpdk-dev] [PATCH 09/10] examples/ipsec-secgw: add security " Fan Zhang
2019-09-06 13:13   ` [dpdk-dev] [PATCH 10/10] doc: update security cpu process description Fan Zhang
2019-09-09 12:43   ` [dpdk-dev] [PATCH 00/10] security: add software synchronous crypto process Aaron Conole
2019-10-07 16:28   ` [dpdk-dev] [PATCH v2 " Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 01/10] security: introduce CPU Crypto action type and API Fan Zhang
2019-10-08 13:42       ` Ananyev, Konstantin
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 02/10] crypto/aesni_gcm: add rte_security handler Fan Zhang
2019-10-08 13:44       ` Ananyev, Konstantin
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 03/10] app/test: add security cpu crypto autotest Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 04/10] app/test: add security cpu crypto perftest Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 05/10] crypto/aesni_mb: add rte_security handler Fan Zhang
2019-10-08 16:23       ` Ananyev, Konstantin
2019-10-09  8:29       ` Ananyev, Konstantin
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 06/10] app/test: add aesni_mb security cpu crypto autotest Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 07/10] app/test: add aesni_mb security cpu crypto perftest Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 08/10] ipsec: add rte_security cpu_crypto action support Fan Zhang
2019-10-08 23:28       ` Ananyev, Konstantin
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 09/10] examples/ipsec-secgw: add security " Fan Zhang
2019-10-07 16:28     ` [dpdk-dev] [PATCH v2 10/10] doc: update security cpu process description Fan Zhang
