DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [RFC 0/4] cpu-crypto API choices
@ 2019-11-05 18:41 Konstantin Ananyev
  2019-11-05 18:41 ` [dpdk-dev] [RFC 1/4] cpu-crypto: Introduce basic data structures Konstantin Ananyev
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Konstantin Ananyev @ 2019-11-05 18:41 UTC (permalink / raw)
  To: dev, techboard
  Cc: roy.fan.zhang, declan.doherty, akhil.goyal, Konstantin Ananyev

Originally both SW and HW crypto PMDs use rte_crypot_op based API to
process the crypto workload asynchronously. This way provides uniformity
to both PMD types, but also introduce unnecessary performance penalty to
SW PMDs that have to "simulate" HW async behavior
(crypto-ops enqueue/dequeue, HW addresses computations,
storing/dereferencing user provided data (mbuf) for each crypto-op,
etc).

The aim is to introduce a new optional API for SW crypto-devices
to perform crypto processing in a synchronous manner.
As summarized by Akhil, we need a synchronous API to perform crypto
operations on raw data using SW PMDs, that provides:
 - no crypto-ops.
 - avoid using mbufs inside this API, use raw data buffers instead.
 - no separate enqueue-dequeue, only single process() API for data path.
 - input data buffers should be grouped by session,
   i.e. each process() call takes one session and group of input buffers
   that  belong to that session. 
 - All parameters that are constant accross session, should be stored
   inside the session itself and reused by all incoming data buffers.

While there seems no controversy about need of such functionality,
there seems to be no agreement on what would be the best API for that.
So I am requesting for TB input on that matter.

Series structure:
- patch #1 - intorduce basic data structures to be used by sync API
  (no controversy here, I hope ..)
  [RFC 1/4] cpu-crypto: Introduce basic data structures
- patch #2 - Intel initial approach for new API (via rte_security)
  [RFC 2/4] security: introduce cpu-crypto API
- patch #3 - approach that reuses existing rte_cryptodev API as much as
  possible
  [RFC 3/4] cryptodev: introduce cpu-crypto API
- patch #4 - approach via introducing new session data structure and API
  [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API

Patches 2,3,4 are mutually exclusive,
and we probably have to choose which one to go forward with.
I put some explanations in each of the patches, hopefully that will help
to  understand pros and cons of each one.

Akhil strongly supports #3, AFAIK mainly because it allows PMDs to
reuse existing API and minimize API level changes.  
My favorite is #4, #2 is less preferable but ok too. 
#3 seems problematic to me by the reasons I outlined in #4 patch
description.

Please provide your opinion.

Konstantin Ananyev (4):
  cpu-crypto: Introduce basic data structures
  security: introduce cpu-crypto API
  cryptodev: introduce cpu-crypto API
  cryptodev: introduce rte_crypto_cpu_sym_session API

 lib/librte_cryptodev/rte_crypto_sym.h     | 63 +++++++++++++++++++++--
 lib/librte_cryptodev/rte_cryptodev.c      | 14 +++++
 lib/librte_cryptodev/rte_cryptodev.h      | 24 +++++++++
 lib/librte_cryptodev/rte_cryptodev_pmd.h  | 22 ++++++++
 lib/librte_security/rte_security.c        | 11 ++++
 lib/librte_security/rte_security.h        | 28 +++++++++-
 lib/librte_security/rte_security_driver.h | 20 +++++++
 7 files changed, 177 insertions(+), 5 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [dpdk-dev] [RFC 1/4] cpu-crypto: Introduce basic data structures
  2019-11-05 18:41 [dpdk-dev] [RFC 0/4] cpu-crypto API choices Konstantin Ananyev
@ 2019-11-05 18:41 ` Konstantin Ananyev
  2019-11-05 18:41 ` [dpdk-dev] [RFC 2/4] security: introduce cpu-crypto API Konstantin Ananyev
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Konstantin Ananyev @ 2019-11-05 18:41 UTC (permalink / raw)
  To: dev, techboard
  Cc: roy.fan.zhang, declan.doherty, akhil.goyal, Konstantin Ananyev

Introduce basic data strucure to be used with cpu-crypto data-path.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_cryptodev/rte_crypto_sym.h | 52 +++++++++++++++++++++++++--
 lib/librte_security/rte_security.h    |  6 +++-
 2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
index ffa038dc4..d8d9e9514 100644
--- a/lib/librte_cryptodev/rte_crypto_sym.h
+++ b/lib/librte_cryptodev/rte_crypto_sym.h
@@ -25,6 +25,30 @@ extern "C" {
 #include <rte_mempool.h>
 #include <rte_common.h>
 
+/**
+  * Crypto IO Vector (in analogy with struct iovec)
+  * Supposed be used to pass input/output data buffers for crypto data-path
+  * functions.
+  */
+struct rte_crypto_vec {
+	/** virtual address of the data buffer */
+	void *base;
+	/** IOVA of the data buffer */
+	rte_iova_t *iova;
+	/** length of the data buffer */
+	uint32_t len;
+};
+
+struct rte_crypto_sym_vec {
+	/** array of SGL vectors */
+	struct rte_crypto_vec *vec;
+	/** array of pointers to IV */
+	void **iv;
+	/** array of pointers to AAD */
+	void **aad;
+	/** array of pointers to digest */
+	void **digest;
+};
 
 /** Symmetric Cipher Algorithms */
 enum rte_crypto_cipher_algorithm {
@@ -116,7 +140,8 @@ struct rte_crypto_cipher_xform {
 	struct {
 		const uint8_t *data;	/**< pointer to key data */
 		uint16_t length;	/**< key length in bytes */
-	} key;
+	} __attribute__((__packed__)) key;
+
 	/**< Cipher key
 	 *
 	 * For the RTE_CRYPTO_CIPHER_AES_F8 mode of operation, key.data will
@@ -140,6 +165,16 @@ struct rte_crypto_cipher_xform {
 	 *  - Each key can be either 128 bits (16 bytes) or 256 bits (32 bytes).
 	 *  - Both keys must have the same size.
 	 **/
+
+	struct {
+		/**
+		 * offset for cipher to start within user provided data buffer.
+		 */
+		uint16_t offset;
+	} cpu_crypto;
+
+	uint8_t reserved[4];
+
 	struct {
 		uint16_t offset;
 		/**< Starting point for Initialisation Vector or Counter,
@@ -284,7 +319,7 @@ struct rte_crypto_auth_xform {
 	struct {
 		const uint8_t *data;	/**< pointer to key data */
 		uint16_t length;	/**< key length in bytes */
-	} key;
+	} __attribute__((__packed__)) key;
 	/**< Authentication key data.
 	 * The authentication key length MUST be less than or equal to the
 	 * block size of the algorithm. It is the callers responsibility to
@@ -292,6 +327,8 @@ struct rte_crypto_auth_xform {
 	 * (for example RFC 2104, FIPS 198a).
 	 */
 
+	uint8_t reserved[6];
+
 	struct {
 		uint16_t offset;
 		/**< Starting point for Initialisation Vector or Counter,
@@ -376,7 +413,16 @@ struct rte_crypto_aead_xform {
 	struct {
 		const uint8_t *data;	/**< pointer to key data */
 		uint16_t length;	/**< key length in bytes */
-	} key;
+	} __attribute__((__packed__)) key;
+
+	struct {
+		/**
+		 * offset for cipher to start within user provided data buffer.
+		 */
+		uint16_t offset;
+	} cpu_crypto;
+
+	uint8_t reserved[4];
 
 	struct {
 		uint16_t offset;
diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
index aaafdfcd7..fed67ab39 100644
--- a/lib/librte_security/rte_security.h
+++ b/lib/librte_security/rte_security.h
@@ -303,10 +303,14 @@ enum rte_security_session_action_type {
 	/**< All security protocol processing is performed inline during
 	 * transmission
 	 */
-	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL
+	RTE_SECURITY_ACTION_TYPE_LOOKASIDE_PROTOCOL,
 	/**< All security protocol processing including crypto is performed
 	 * on a lookaside accelerator
 	 */
+	RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO,
+	/**< Crypto processing for security protocol is processed by CPU
+	 * synchronously
+	 */
 };
 
 /** Security session protocol definition */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [dpdk-dev] [RFC 2/4] security: introduce cpu-crypto API
  2019-11-05 18:41 [dpdk-dev] [RFC 0/4] cpu-crypto API choices Konstantin Ananyev
  2019-11-05 18:41 ` [dpdk-dev] [RFC 1/4] cpu-crypto: Introduce basic data structures Konstantin Ananyev
@ 2019-11-05 18:41 ` Konstantin Ananyev
  2019-11-05 18:41 ` [dpdk-dev] [RFC 3/4] cryptodev: " Konstantin Ananyev
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Konstantin Ananyev @ 2019-11-05 18:41 UTC (permalink / raw)
  To: dev, techboard
  Cc: roy.fan.zhang, declan.doherty, akhil.goyal, Konstantin Ananyev

This patch extends rte_security API with CPU-CRYPTO mode.

Crypto PMD that wants to support that functionality would need to:
1. claim RTE_CRYPTODEV_FF_SECURITY capability supported.
2. at device .probe() allocate and initialize security context (dev->
security_ctx).
3. implement at least the following functions inside rte_security_ops:
	.session_create,
	.session_get_size,
	.session_destroy, 
	.process_cpu_crypto_sym

For data-path processing consumer of that API would have to maintain:
	struct rte_security_ctx *ctx,
	struct rte_security_session *sess

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_security/rte_security.c        | 11 +++++++++++
 lib/librte_security/rte_security.h        | 22 ++++++++++++++++++++++
 lib/librte_security/rte_security_driver.h | 20 ++++++++++++++++++++
 3 files changed, 53 insertions(+)

diff --git a/lib/librte_security/rte_security.c b/lib/librte_security/rte_security.c
index bc81ce15d..243f59105 100644
--- a/lib/librte_security/rte_security.c
+++ b/lib/librte_security/rte_security.c
@@ -141,3 +141,14 @@ rte_security_capability_get(struct rte_security_ctx *instance,
 
 	return NULL;
 }
+
+__rte_experimental
+int
+rte_security__cpu_crypto_sym_process(struct rte_security_ctx *instance,
+	struct rte_security_session *sess, struct rte_crypto_sym_vec *vec,
+	int32_t status[], uint32_t num)
+{
+	RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->process_cpu_crypto_sym,
+		-ENOENT);
+	return instance->ops->process_cpu_crypto_sym(sess, vec, status, num);
+}
diff --git a/lib/librte_security/rte_security.h b/lib/librte_security/rte_security.h
index fed67ab39..0dc8fec09 100644
--- a/lib/librte_security/rte_security.h
+++ b/lib/librte_security/rte_security.h
@@ -669,6 +669,28 @@ const struct rte_security_capability *
 rte_security_capability_get(struct rte_security_ctx *instance,
 			    struct rte_security_capability_idx *idx);
 
+ /**
+ * Perform actual crypto processing (encrypt/digest or auth/decrypt)
+ * on user provided data.
+ *
+ * @param	instance	security instance.
+ * @param	sess		Security session structure
+ * @param	vec		Array of vectors for input data
+ * @param	status		Array of status values (one per vec)
+ *				(RTE_CRYPTO_OP_STATUS_* values)
+ * @param	num		Number of elems in vec and status arrays.
+ *
+ * @return
+ *  - Returns negative errno value on error, or non-negative number
+ *    of successfully processed input vectors.
+ *
+*/
+__rte_experimental
+int
+rte_security__cpu_crypto_sym_process(struct rte_security_ctx *instance,
+	struct rte_security_session *sess, struct rte_crypto_sym_vec *vec,
+	int32_t status[], uint32_t num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_security/rte_security_driver.h b/lib/librte_security/rte_security_driver.h
index 1b561f852..b348c5817 100644
--- a/lib/librte_security/rte_security_driver.h
+++ b/lib/librte_security/rte_security_driver.h
@@ -132,6 +132,25 @@ typedef int (*security_get_userdata_t)(void *device,
 typedef const struct rte_security_capability *(*security_capabilities_get_t)(
 		void *device);
 
+/**
+ * Perform actual crypto processing (encrypt/digest or auth/decrypt)
+ * on user provided data.
+ *
+ * @param	sess	Security session structure
+ * @param	vec	Array of vectors for input data
+ * @param	status	Array of status values (one per vec)
+ *			(RTE_CRYPTO_OP_STATUS_* values)
+ * @param	num	Number of elems in vec and status arrays.
+ *
+ * @return
+ *  - Returns negative errno value on error, or non-negative number
+ *    of successfully processed input vectors.
+ *
+*/
+typedef int (*security_process_cpu_crypto_sym_t)(
+	struct rte_security_session *sess, struct rte_crypto_sym_vec *vec,
+	int32_t status[], uint32_t num);
+
 /** Security operations function pointer table */
 struct rte_security_ops {
 	security_session_create_t session_create;
@@ -150,6 +169,7 @@ struct rte_security_ops {
 	/**< Get userdata associated with session which processed the packet. */
 	security_capabilities_get_t capabilities_get;
 	/**< Get security capabilities. */
+	security_process_cpu_crypto_sym_t process_cpu_crypto_sym;
 };
 
 #ifdef __cplusplus
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [dpdk-dev] [RFC 3/4] cryptodev: introduce cpu-crypto API
  2019-11-05 18:41 [dpdk-dev] [RFC 0/4] cpu-crypto API choices Konstantin Ananyev
  2019-11-05 18:41 ` [dpdk-dev] [RFC 1/4] cpu-crypto: Introduce basic data structures Konstantin Ananyev
  2019-11-05 18:41 ` [dpdk-dev] [RFC 2/4] security: introduce cpu-crypto API Konstantin Ananyev
@ 2019-11-05 18:41 ` Konstantin Ananyev
  2019-11-05 21:41   ` Akhil Goyal
  2019-11-05 18:41 ` [dpdk-dev] [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API Konstantin Ananyev
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Konstantin Ananyev @ 2019-11-05 18:41 UTC (permalink / raw)
  To: dev, techboard
  Cc: roy.fan.zhang, declan.doherty, akhil.goyal, Konstantin Ananyev

This patch extends rte_cryptodev API with CPU-CRYPTO mode.
This is done by reusing existing rte_crypto_sym_session structure itself
and related control-path cryptodev API (init/clear/get_size/etc.) 
 For data-path new sym_cpu_ process() function is added into
rte_cryptodev dev_ops.   

Crypto PMD that wants to support that functionality would need to:
1. claim RTE_CRYPTODEV_FF_SYM_CPU_CRYPTO capability supported.
2. change at least the following functions inside rte_cryptodev_ops:
	. sym_session_get_size,
	. sym_session_configure,
	. sym_session_clear
to accommodate support for both sync and async modes,
3. implement new function inside rte_cryptodev_ops:
	sym_cpu_process

For data-path processing consumer of that API would have to maintain:
	struct rte_cryptodev_sym_session *sess,
	list of dev ids for which this session was properly initialized 

As an advantage of this approach - reuse of existing API
and minimal visible changes for crypto PMDs. 

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_cryptodev/rte_crypto_sym.h    | 11 ++++++++++-
 lib/librte_cryptodev/rte_cryptodev.c     | 14 ++++++++++++++
 lib/librte_cryptodev/rte_cryptodev.h     | 24 ++++++++++++++++++++++++
 lib/librte_cryptodev/rte_cryptodev_pmd.h | 22 ++++++++++++++++++++++
 4 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
index d8d9e9514..790c77524 100644
--- a/lib/librte_cryptodev/rte_crypto_sym.h
+++ b/lib/librte_cryptodev/rte_crypto_sym.h
@@ -166,6 +166,10 @@ struct rte_crypto_cipher_xform {
 	 *  - Both keys must have the same size.
 	 **/
 
+	/**
+         * CPU-CRYPTO specific data, should be set properly when
+	 * (xform->type & RTE_CRYPTO_SYM_CPU_CRYPTO) != 0, otherwise ignored.
+	 */
 	struct {
 		/**
 		 * offset for cipher to start within user provided data buffer.
@@ -415,6 +419,10 @@ struct rte_crypto_aead_xform {
 		uint16_t length;	/**< key length in bytes */
 	} __attribute__((__packed__)) key;
 
+	/**
+         * CPU-CRYPTO specific data, should be set properly when
+	 * (xform->type & RTE_CRYPTO_SYM_CPU_CRYPTO) != 0, otherwise ignored.
+	 */
 	struct {
 		/**
 		 * offset for cipher to start within user provided data buffer.
@@ -471,7 +479,8 @@ enum rte_crypto_sym_xform_type {
 	RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0,	/**< No xform specified */
 	RTE_CRYPTO_SYM_XFORM_AUTH,		/**< Authentication xform */
 	RTE_CRYPTO_SYM_XFORM_CIPHER,		/**< Cipher xform  */
-	RTE_CRYPTO_SYM_XFORM_AEAD		/**< AEAD xform  */
+	RTE_CRYPTO_SYM_XFORM_AEAD,		/**< AEAD xform  */
+	RTE_CRYPTO_SYM_CPU_CRYPTO = INT32_MIN,  /**< xform for cpu-crypto */
 };
 
 /**
diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 89aa2ed3e..b1dbaf4c1 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -1616,6 +1616,20 @@ rte_cryptodev_sym_session_get_user_data(
 	return (void *)(sess->sess_data + sess->nb_drivers);
 }
 
+__rte_experimental
+int
+rte_cryptodev_sym_cpu_crypto_process(uint8_t dev_id,
+	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec *vec,
+	int32_t status[], uint32_t num)
+{
+	struct rte_cryptodev *dev;
+
+	dev = rte_cryptodev_pmd_get_dev(dev_id);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_cpu_process,-ENOTSUP);
+
+	return dev->dev_ops->sym_cpu_process(dev, sess, vec, status, num);
+}
+
 /** Initialise rte_crypto_op mempool element */
 static void
 rte_crypto_op_init(struct rte_mempool *mempool,
diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index c6ffa3b35..24877006c 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -450,6 +450,8 @@ rte_cryptodev_asym_get_xform_enum(enum rte_crypto_asym_xform_type *xform_enum,
 /**< Support encrypted-digest operations where digest is appended to data */
 #define RTE_CRYPTODEV_FF_ASYM_SESSIONLESS		(1ULL << 20)
 /**< Support asymmetric session-less operations */
+#define	RTE_CRYPTODEV_FF_SYM_CPU_CRYPTO			(1ULL << 21)
+/**< Support symmeteric cpu-crypto processing */
 
 
 /**
@@ -1274,6 +1276,28 @@ void *
 rte_cryptodev_sym_session_get_user_data(
 					struct rte_cryptodev_sym_session *sess);
 
+/**
+ * Perform actual crypto processing (encrypt/digest or auth/decrypt)
+ * on user provided data.
+ *
+ * @param	dev_id	The device identifier.
+ * @param	sess	Cryptodev session structure
+ * @param	vec	Array of vectors for input data
+ * @param	status	Array of status values (one per vec)
+ *			(RTE_CRYPTO_OP_STATUS_* values)
+ * @param	num	Number of elems in vec and status arrays.
+ *
+ * @return
+ *  - Returns negative errno value on error, or non-negative number
+ *    of successfully processed input vectors.
+ *
+*/
+__rte_experimental
+int
+rte_cryptodev_sym_cpu_crypto_process(uint8_t dev_id,
+	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec *vec,
+	int32_t status[], uint32_t num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index fba14f2fa..02e7a19ae 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -308,6 +308,26 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
  */
 typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
 		struct rte_cryptodev_asym_session *sess);
+/**
+ * Perform actual crypto processing (encrypt/digest or auth/decrypt)
+ * on user provided data.
+ *
+ * @param	dev		Crypto device pointer
+ * @param	sess	Cryptodev session structure
+ * @param	vec	Array of vectors for input data
+ * @param	status	Array of status values (one per vec)
+ *			(RTE_CRYPTO_OP_STATUS_* values)
+ * @param	num	Number of elems in vec and status arrays.
+ *
+ * @return
+ *  - Returns negative errno value on error, or non-negative number
+ *    of successfully processed input vectors.
+ *
+*/
+typedef int (*cryptodev_sym_cpu_crypto_process_t)(struct rte_cryptodev *dev,
+	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec *vec,
+	int32_t status[], uint32_t num);
+
 
 /** Crypto device operations function pointer table */
 struct rte_cryptodev_ops {
@@ -342,6 +362,8 @@ struct rte_cryptodev_ops {
 	/**< Clear a Crypto sessions private data. */
 	cryptodev_asym_free_session_t asym_session_clear;
 	/**< Clear a Crypto sessions private data. */
+	cryptodev_sym_cpu_crypto_process_t sym_cpu_process;
+	/**< process input data synchronously (cpu-crypto). */
 };
 
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [dpdk-dev] [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
  2019-11-05 18:41 [dpdk-dev] [RFC 0/4] cpu-crypto API choices Konstantin Ananyev
                   ` (2 preceding siblings ...)
  2019-11-05 18:41 ` [dpdk-dev] [RFC 3/4] cryptodev: " Konstantin Ananyev
@ 2019-11-05 18:41 ` Konstantin Ananyev
  2019-11-06  4:54 ` [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices Honnappa Nagarahalli
  2019-11-14  5:46 ` [dpdk-dev] " Jerin Jacob
  5 siblings, 0 replies; 18+ messages in thread
From: Konstantin Ananyev @ 2019-11-05 18:41 UTC (permalink / raw)
  To: dev, techboard
  Cc: roy.fan.zhang, declan.doherty, akhil.goyal, Konstantin Ananyev

This patch extends rte_cryptodev API with CPU-CRYPTO mode.
This is done by reusing of existing rte_crypto_sym_xform data structures
and introducing new opaque rte_crypto_cpu_sym_session data structure
and related control and data path API.

Crypto PMD that wants to support that functionality would need to:
1. claim RTE_CRYPTODEV_FF_SYM_CPU_CRYPTO capability supported.
2. implement new functions for rte_cryptodev_ops:
	. cpu_sym_session_size,
	. cpu_sym_session_init,
3. implement new functions for rte_crypto_cpu_sym_session:
	.clear,
	.process
	
For data-path processing consumer would have to maintain:
	struct rte_crypto_cpu_sym_session *sess

Below is a summary of reasons why I think adding new API is a preferable
over reusing existing  rte_crypto_sym_session API:

1. Inside rte_crypto_sym_session there is an array of pointers
to PMD private session structures (indexed by driver id).
Some of them might be initialized, others not.
So even now, user has to maintain a list of dev_ids that
can be used with this session.
With patch #3 in-place it becomes even more complex -
user will have to maintain two lists of dev_ids:
  - one for async crypto-devices that can be used with that session
  - another one for sync crypto-devices that can be used with that session
In majority of examples within dpdk.org tree, we usually don't bother
and use just one device per session.
But for intermediate libraries (librte_ipsec, etc.) that
have to be as generic as possible and can't make such assumptions
such design choice is not very convenient.

From other hand, cpu-crypto is SW based synchronous operation
and it doesn't need any device/queue data during crypto-processing,
all necessary information is stored inside session itself.
The only stage where we really need dev_id - during session init().
So it seems natural to exclude dev_id from data and majority of control
path completely.

2. Current  rte_cryptodev_sym_session_init() expects PMD session data
to be allocated via provided mempool. That seems not very flexible.
Preferred way would be to allow user can allocate memory for cpu-crypto
session whenever he likes.
    
3. Would allow PMD writer to expose a separate process() functions for
different session types and hopefully save extra function call and
reduce data-dependencies for fast-path comparing to previous patch.  

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_cryptodev/rte_crypto_sym.h    |  8 +++
 lib/librte_cryptodev/rte_cryptodev.c     | 46 +++++++++++++++
 lib/librte_cryptodev/rte_cryptodev.h     | 71 ++++++++++++++++++++++++
 lib/librte_cryptodev/rte_cryptodev_pmd.h | 49 ++++++++++++++++
 4 files changed, 174 insertions(+)

diff --git a/lib/librte_cryptodev/rte_crypto_sym.h b/lib/librte_cryptodev/rte_crypto_sym.h
index d8d9e9514..45f3840bb 100644
--- a/lib/librte_cryptodev/rte_crypto_sym.h
+++ b/lib/librte_cryptodev/rte_crypto_sym.h
@@ -166,6 +166,10 @@ struct rte_crypto_cipher_xform {
 	 *  - Both keys must have the same size.
 	 **/
 
+	/**
+	 * CPU-CRYPTO specific data, used only by rte_crypto_cpu_sym_session
+	 * initialisation pass, otherwise ignored.
+	 */
 	struct {
 		/**
 		 * offset for cipher to start within user provided data buffer.
@@ -415,6 +419,10 @@ struct rte_crypto_aead_xform {
 		uint16_t length;	/**< key length in bytes */
 	} __attribute__((__packed__)) key;
 
+	/**
+	 * CPU-CRYPTO specific data, used only by rte_crypto_cpu_sym_session
+	 * initialisation pass, otherwise ignored.
+	 */
 	struct {
 		/**
 		 * offset for cipher to start within user provided data buffer.
diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 89aa2ed3e..8109665c2 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -1616,6 +1616,52 @@ rte_cryptodev_sym_session_get_user_data(
 	return (void *)(sess->sess_data + sess->nb_drivers);
 }
 
+__rte_experimental
+int
+rte_crypto_cpu_sym_session_size(uint8_t dev_id,
+	const struct rte_crypto_sym_xform *xforms)
+{
+	struct rte_cryptodev *dev;
+
+	dev = rte_cryptodev_pmd_get_dev(dev_id);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->cpu_sym_session_size, -ENOTSUP);
+
+	return dev->dev_ops->cpu_sym_session_size(dev, xforms);
+}
+
+__rte_experimental
+int
+rte_crypto_cpu_sym_session_init(uint8_t dev_id,
+	struct rte_crypto_cpu_sym_session *sess,
+	const struct rte_crypto_sym_xform *xforms)
+{
+	struct rte_cryptodev *dev;
+
+	dev = rte_cryptodev_pmd_get_dev(dev_id);
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->cpu_sym_session_init, -ENOTSUP);
+
+	return dev->dev_ops->cpu_sym_session_init(dev, sess, xforms);
+}
+
+__rte_experimental
+void
+rte_crypto_cpu_sym_session_clear(struct rte_crypto_cpu_sym_session *sess)
+{
+	if (sess != NULL && sess->ops.clear != NULL)
+		return sess->ops.clear(sess);
+}
+
+__rte_experimental
+int
+rte_crypto_cpu_sym_session_process(struct rte_crypto_cpu_sym_session *sess,
+	struct rte_crypto_sym_vec *vec, int32_t status[], uint32_t num)
+{
+	if (sess == NULL || sess->ops.process == NULL)
+		return -EINVAL;
+
+	return sess->ops.process(sess, vec, status, num);
+}
+
 /** Initialise rte_crypto_op mempool element */
 static void
 rte_crypto_op_init(struct rte_mempool *mempool,
diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index c6ffa3b35..3333bbe09 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -450,6 +450,8 @@ rte_cryptodev_asym_get_xform_enum(enum rte_crypto_asym_xform_type *xform_enum,
 /**< Support encrypted-digest operations where digest is appended to data */
 #define RTE_CRYPTODEV_FF_ASYM_SESSIONLESS		(1ULL << 20)
 /**< Support asymmetric session-less operations */
+#define RTE_CRYPTODEV_FF_SYM_CPU_CRYPTO		(1ULL << 21)
+/**< Support symmeteric cpu-crypto processing */
 
 
 /**
@@ -1274,6 +1276,75 @@ void *
 rte_cryptodev_sym_session_get_user_data(
 					struct rte_cryptodev_sym_session *sess);
 
+/**
+ * opaque session structure for symmetric CPU-CRYPTO operations.
+ */
+struct rte_crypto_cpu_sym_session;
+
+/**
+ * Calculate required session size in bytes for given set of xforms.
+ * if xforms == NULL, then return the max possible session size,
+ * that would fit session for any supported by the device algorithm.
+ * if CPU-CRYPTO is not supported at all, or requeted in xform
+ * algorithm is not supported, then return -ENOTSUP.
+ * @param   dev_id   ID of device that we want the session to be used on
+ * @param   xforms   Symmetric crypto transform operations to apply on flow
+ *                   processed with this session
+ * @return
+ *  - On success, positive value for required session size in bytes.
+ *  - negative errno value otherwise.
+ */
+__rte_experimental
+int
+rte_crypto_cpu_sym_session_size(uint8_t dev_id,
+		const struct rte_crypto_sym_xform *xforms);
+
+/**
+ * Initialize symmetric CPU-CRYPTO session.
+ * It is caller responsibility to allocate enough space for it.
+ * See rte_crypto_cpu_sym_session_size above.
+ * @param   dev_id   ID of device that we want the session to be used on
+ * @param   sess     Pointer to the raw session buffer
+ * @param   xforms   Symmetric crypto transform operations to apply on flow
+ *                   processed with this session
+ * @return
+ *  - On success, zero.
+ *  - negative errno value otherwise.
+ */
+__rte_experimental
+int
+rte_crypto_cpu_sym_session_init(uint8_t dev_id,
+		struct rte_crypto_cpu_sym_session *sess,
+		const struct rte_crypto_sym_xform *xforms);
+
+/**
+ * De-initialize symmetric CPU-CRYPTO session.
+ * It is caller responsibility to free  the session pointer afterwards.
+ */
+__rte_experimental
+void
+rte_crypto_cpu_sym_session_clear(struct rte_crypto_cpu_sym_session *sess);
+
+/**
+ * Perform actual crypto processing (encrypt/digest or auth/decrypt)
+ * on user provided data.
+ *
+ * @param	sess	cpu-crypto session structure
+ * @param	vec	Array of vectors for input data
+ * @param	status	Array of status values (one per vec)
+ * 			(RTE_CRYPTO_OP_STATUS_* values)
+ * @param	num	Number of elems in vec and status arrays.
+ *
+ * @return
+ *  - Returns negative errno value on error, or non-negative number
+ *    of successfully processed input vectors.
+ *
++*/
+__rte_experimental
+int
+rte_crypto_cpu_sym_session_process(struct rte_crypto_cpu_sym_session *sess,
+	struct rte_crypto_sym_vec *vec, int32_t status[], uint32_t num);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index fba14f2fa..629461315 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -309,6 +309,38 @@ typedef void (*cryptodev_sym_free_session_t)(struct rte_cryptodev *dev,
 typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
 		struct rte_cryptodev_asym_session *sess);
 
+/**
+ * Calculate required session size in bytes for given set of xforms.
+ * if xforms == NULL, then return the max possible session size,
+ * that would fit session for any supported by the device algorithm.
+ * if CPU-CRYPTO is not supported at all, or requeted in xform
+ * algorithm is not supported, then return -ENOTSUP.
+ * @param	dev	Crypto device pointer
+ * @param	xforms	Symmetric crypto transform operations to apply on flow
+ *                      processed with this session
+ * @return
+ *  - On success, positive value for required session size in bytes.
+ *  - negative errno value otherwise.
+ */
+typedef int (*cryptodev_cpu_sym_session_size_t)(struct rte_cryptodev *dev,
+		const struct rte_crypto_sym_xform *xforms);
+
+/**
+ * Initialize symmetric CPU-CRYPTO session.
+ * It is caller responsibility to allocate enough space for it.
+ * See rte_crypto_cpu_sym_session_size above.
+ * @param	dev	Crypto device pointer
+ * @param	sess	Pointer to the raw session buffer
+ * @param	xforms	Symmetric crypto transform operations to apply on flow
+ *                       processed with this session
+ * @return
+ *  - On success, zero.
+ *  - negative errno value otherwise.
+ */
+typedef int (*cryptodev_cpu_sym_session_init_t)(struct rte_cryptodev *dev,
+		struct rte_crypto_cpu_sym_session *sess,
+		const struct rte_crypto_sym_xform *xforms);
+
 /** Crypto device operations function pointer table */
 struct rte_cryptodev_ops {
 	cryptodev_configure_t dev_configure;	/**< Configure device. */
@@ -342,6 +374,10 @@ struct rte_cryptodev_ops {
 	/**< Clear a Crypto sessions private data. */
 	cryptodev_asym_free_session_t asym_session_clear;
 	/**< Clear a Crypto sessions private data. */
+	cryptodev_cpu_sym_session_size_t cpu_sym_session_size;
+	/**< Calculate required cpu-crypto session size. */
+	cryptodev_cpu_sym_session_init_t cpu_sym_session_init;
+	/**< Initialise cpu-crypto session. */
 };
 
 
@@ -506,6 +542,19 @@ set_asym_session_private_data(struct rte_cryptodev_asym_session *sess,
 	sess->sess_private_data[driver_id] = private_data;
 }
 
+
+struct rte_crypto_cpu_sym_session_ops {
+	void (*clear)(struct rte_crypto_cpu_sym_session *);
+	int (*process)(struct rte_crypto_cpu_sym_session *,
+		struct rte_crypto_sym_vec *, int32_t [], uint32_t);
+};
+
+struct rte_crypto_cpu_sym_session {
+	struct rte_crypto_cpu_sym_session_ops ops;
+	/** session private data starts here. */
+	void *sess_data[0] __rte_cache_min_aligned;
+};
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [RFC 3/4] cryptodev: introduce cpu-crypto API
  2019-11-05 18:41 ` [dpdk-dev] [RFC 3/4] cryptodev: " Konstantin Ananyev
@ 2019-11-05 21:41   ` Akhil Goyal
  2019-11-06 14:49     ` Ananyev, Konstantin
  0 siblings, 1 reply; 18+ messages in thread
From: Akhil Goyal @ 2019-11-05 21:41 UTC (permalink / raw)
  To: Konstantin Ananyev, dev, techboard; +Cc: roy.fan.zhang, declan.doherty

Hi Konstantin,

> 
> This patch extends rte_cryptodev API with CPU-CRYPTO mode.
> This is done by reusing existing rte_crypto_sym_session structure itself
> and related control-path cryptodev API (init/clear/get_size/etc.)
>  For data-path new sym_cpu_ process() function is added into
> rte_cryptodev dev_ops.
> 
> Crypto PMD that wants to support that functionality would need to:
> 1. claim RTE_CRYPTODEV_FF_SYM_CPU_CRYPTO capability supported.
> 2. change at least the following functions inside rte_cryptodev_ops:
> 	. sym_session_get_size,
> 	. sym_session_configure,
> 	. sym_session_clear
> to accommodate support for both sync and async modes,
> 3. implement new function inside rte_cryptodev_ops:
> 	sym_cpu_process
> 
> For data-path processing consumer of that API would have to maintain:
> 	struct rte_cryptodev_sym_session *sess,
> 	list of dev ids for which this session was properly initialized
> 
> As an advantage of this approach - reuse of existing API
> and minimal visible changes for crypto PMDs.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_cryptodev/rte_crypto_sym.h    | 11 ++++++++++-
>  lib/librte_cryptodev/rte_cryptodev.c     | 14 ++++++++++++++
>  lib/librte_cryptodev/rte_cryptodev.h     | 24 ++++++++++++++++++++++++
>  lib/librte_cryptodev/rte_cryptodev_pmd.h | 22 ++++++++++++++++++++++
>  4 files changed, 70 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_cryptodev/rte_crypto_sym.h
> b/lib/librte_cryptodev/rte_crypto_sym.h
> index d8d9e9514..790c77524 100644
> --- a/lib/librte_cryptodev/rte_crypto_sym.h
> +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> @@ -166,6 +166,10 @@ struct rte_crypto_cipher_xform {
>  	 *  - Both keys must have the same size.
>  	 **/
> 
> +	/**
> +         * CPU-CRYPTO specific data, should be set properly when
> +	 * (xform->type & RTE_CRYPTO_SYM_CPU_CRYPTO) != 0, otherwise
> ignored.
> +	 */
>  	struct {
>  		/**
>  		 * offset for cipher to start within user provided data buffer.

Earlier I was ok to have this offset but on another thought, why do you need this?
User can give the exact pointer in the process() API from where he wants to do ciphering.
You will be adding this offset in the driver if you keep this in xform/session. So I think there is
no difference whether you add in the driver or the application. Am I missing something?

> @@ -415,6 +419,10 @@ struct rte_crypto_aead_xform {
>  		uint16_t length;	/**< key length in bytes */
>  	} __attribute__((__packed__)) key;
> 
> +	/**
> +         * CPU-CRYPTO specific data, should be set properly when
> +	 * (xform->type & RTE_CRYPTO_SYM_CPU_CRYPTO) != 0, otherwise
> ignored.
> +	 */
>  	struct {
>  		/**
>  		 * offset for cipher to start within user provided data buffer.
> @@ -471,7 +479,8 @@ enum rte_crypto_sym_xform_type {
>  	RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0,	/**< No xform
> specified */
>  	RTE_CRYPTO_SYM_XFORM_AUTH,		/**< Authentication
> xform */
>  	RTE_CRYPTO_SYM_XFORM_CIPHER,		/**< Cipher xform  */
> -	RTE_CRYPTO_SYM_XFORM_AEAD		/**< AEAD xform  */
> +	RTE_CRYPTO_SYM_XFORM_AEAD,		/**< AEAD xform  */
> +	RTE_CRYPTO_SYM_CPU_CRYPTO = INT32_MIN,  /**< xform for cpu-
> crypto */

This is not a correct place to have this. All types of xforms CIPHER/AUTH/AEAD 
can be used in SYNC mode

>  };
> 
>  /**
> diff --git a/lib/librte_cryptodev/rte_cryptodev.c
> b/lib/librte_cryptodev/rte_cryptodev.c
> index 89aa2ed3e..b1dbaf4c1 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.c
> +++ b/lib/librte_cryptodev/rte_cryptodev.c
> @@ -1616,6 +1616,20 @@ rte_cryptodev_sym_session_get_user_data(
>  	return (void *)(sess->sess_data + sess->nb_drivers);
>  }
> 
> +__rte_experimental
> +int
> +rte_cryptodev_sym_cpu_crypto_process(uint8_t dev_id,
> +	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec
> *vec,
> +	int32_t status[], uint32_t num)
> +{
> +	struct rte_cryptodev *dev;
> +
> +	dev = rte_cryptodev_pmd_get_dev(dev_id);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_cpu_process,-
> ENOTSUP);
> +
> +	return dev->dev_ops->sym_cpu_process(dev, sess, vec, status, num);
> +}
> +
>  /** Initialise rte_crypto_op mempool element */
>  static void
>  rte_crypto_op_init(struct rte_mempool *mempool,
> diff --git a/lib/librte_cryptodev/rte_cryptodev.h
> b/lib/librte_cryptodev/rte_cryptodev.h
> index c6ffa3b35..24877006c 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.h
> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> @@ -450,6 +450,8 @@ rte_cryptodev_asym_get_xform_enum(enum
> rte_crypto_asym_xform_type *xform_enum,
>  /**< Support encrypted-digest operations where digest is appended to data */
>  #define RTE_CRYPTODEV_FF_ASYM_SESSIONLESS		(1ULL << 20)
>  /**< Support asymmetric session-less operations */
> +#define	RTE_CRYPTODEV_FF_SYM_CPU_CRYPTO
> 	(1ULL << 21)
> +/**< Support symmeteric cpu-crypto processing */
> 
> 
>  /**
> @@ -1274,6 +1276,28 @@ void *
>  rte_cryptodev_sym_session_get_user_data(
>  					struct rte_cryptodev_sym_session
> *sess);
> 
> +/**
> + * Perform actual crypto processing (encrypt/digest or auth/decrypt)
> + * on user provided data.
> + *
> + * @param	dev_id	The device identifier.
> + * @param	sess	Cryptodev session structure
> + * @param	vec	Array of vectors for input data
> + * @param	status	Array of status values (one per vec)
> + *			(RTE_CRYPTO_OP_STATUS_* values)
> + * @param	num	Number of elems in vec and status arrays.
> + *
> + * @return
> + *  - Returns negative errno value on error, or non-negative number
> + *    of successfully processed input vectors.
> + *
> +*/
> +__rte_experimental
> +int
> +rte_cryptodev_sym_cpu_crypto_process(uint8_t dev_id,
> +	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec
> *vec,
> +	int32_t status[], uint32_t num);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> index fba14f2fa..02e7a19ae 100644
> --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> @@ -308,6 +308,26 @@ typedef void (*cryptodev_sym_free_session_t)(struct
> rte_cryptodev *dev,
>   */
>  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
>  		struct rte_cryptodev_asym_session *sess);
> +/**
> + * Perform actual crypto processing (encrypt/digest or auth/decrypt)
> + * on user provided data.
> + *
> + * @param	dev		Crypto device pointer
> + * @param	sess	Cryptodev session structure
> + * @param	vec	Array of vectors for input data
> + * @param	status	Array of status values (one per vec)
> + *			(RTE_CRYPTO_OP_STATUS_* values)
> + * @param	num	Number of elems in vec and status arrays.
> + *
> + * @return
> + *  - Returns negative errno value on error, or non-negative number
> + *    of successfully processed input vectors.
> + *
> +*/
> +typedef int (*cryptodev_sym_cpu_crypto_process_t)(struct rte_cryptodev *dev,
> +	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec
> *vec,
> +	int32_t status[], uint32_t num);
> +
> 
>  /** Crypto device operations function pointer table */
>  struct rte_cryptodev_ops {
> @@ -342,6 +362,8 @@ struct rte_cryptodev_ops {
>  	/**< Clear a Crypto sessions private data. */
>  	cryptodev_asym_free_session_t asym_session_clear;
>  	/**< Clear a Crypto sessions private data. */
> +	cryptodev_sym_cpu_crypto_process_t sym_cpu_process;
> +	/**< process input data synchronously (cpu-crypto). */
>  };
> 
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
  2019-11-05 18:41 [dpdk-dev] [RFC 0/4] cpu-crypto API choices Konstantin Ananyev
                   ` (3 preceding siblings ...)
  2019-11-05 18:41 ` [dpdk-dev] [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API Konstantin Ananyev
@ 2019-11-06  4:54 ` Honnappa Nagarahalli
  2019-11-06  9:35   ` Thomas Monjalon
  2019-11-14  5:46 ` [dpdk-dev] " Jerin Jacob
  5 siblings, 1 reply; 18+ messages in thread
From: Honnappa Nagarahalli @ 2019-11-06  4:54 UTC (permalink / raw)
  To: Konstantin Ananyev, dev, techboard
  Cc: roy.fan.zhang, declan.doherty, Akhil.goyal@nxp.com, nd

<snip>

> 
> Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> process the crypto workload asynchronously. This way provides uniformity to
> both PMD types, but also introduce unnecessary performance penalty to SW
> PMDs that have to "simulate" HW async behavior (crypto-ops
> enqueue/dequeue, HW addresses computations, storing/dereferencing user
> provided data (mbuf) for each crypto-op, etc).
> 
> The aim is to introduce a new optional API for SW crypto-devices to perform
> crypto processing in a synchronous manner.
> As summarized by Akhil, we need a synchronous API to perform crypto
> operations on raw data using SW PMDs, that provides:
>  - no crypto-ops.
>  - avoid using mbufs inside this API, use raw data buffers instead.
>  - no separate enqueue-dequeue, only single process() API for data path.
>  - input data buffers should be grouped by session,
>    i.e. each process() call takes one session and group of input buffers
>    that  belong to that session.
>  - All parameters that are constant accross session, should be stored
>    inside the session itself and reused by all incoming data buffers.
> 
> While there seems no controversy about need of such functionality, there
> seems to be no agreement on what would be the best API for that.
> So I am requesting for TB input on that matter.
> 
> Series structure:
> - patch #1 - intorduce basic data structures to be used by sync API
>   (no controversy here, I hope ..)
>   [RFC 1/4] cpu-crypto: Introduce basic data structures
> - patch #2 - Intel initial approach for new API (via rte_security)
>   [RFC 2/4] security: introduce cpu-crypto API
> - patch #3 - approach that reuses existing rte_cryptodev API as much as
>   possible
>   [RFC 3/4] cryptodev: introduce cpu-crypto API
> - patch #4 - approach via introducing new session data structure and API
>   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> 
> Patches 2,3,4 are mutually exclusive,
> and we probably have to choose which one to go forward with.
> I put some explanations in each of the patches, hopefully that will help to
> understand pros and cons of each one.
> 
> Akhil strongly supports #3, AFAIK mainly because it allows PMDs to reuse
> existing API and minimize API level changes.
IMO, from application perspective, it should not matter who (CPU or an accelerator) does the crypto functionality. It just needs to know if the result will be returned synchronously or asynchronously.

> My favorite is #4, #2 is less preferable but ok too.
> #3 seems problematic to me by the reasons I outlined in #4 patch description.
> 
> Please provide your opinion.
> 
> Konstantin Ananyev (4):
>   cpu-crypto: Introduce basic data structures
>   security: introduce cpu-crypto API
>   cryptodev: introduce cpu-crypto API
>   cryptodev: introduce rte_crypto_cpu_sym_session API
> 
>  lib/librte_cryptodev/rte_crypto_sym.h     | 63 +++++++++++++++++++++--
>  lib/librte_cryptodev/rte_cryptodev.c      | 14 +++++
>  lib/librte_cryptodev/rte_cryptodev.h      | 24 +++++++++
>  lib/librte_cryptodev/rte_cryptodev_pmd.h  | 22 ++++++++
>  lib/librte_security/rte_security.c        | 11 ++++
>  lib/librte_security/rte_security.h        | 28 +++++++++-
>  lib/librte_security/rte_security_driver.h | 20 +++++++
>  7 files changed, 177 insertions(+), 5 deletions(-)
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
  2019-11-06  4:54 ` [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices Honnappa Nagarahalli
@ 2019-11-06  9:35   ` Thomas Monjalon
  2019-11-06  9:48     ` Thomas Monjalon
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Monjalon @ 2019-11-06  9:35 UTC (permalink / raw)
  To: techboard
  Cc: Honnappa Nagarahalli, Konstantin Ananyev, dev, roy.fan.zhang,
	declan.doherty, Akhil.goyal@nxp.com, nd

06/11/2019 05:54, Honnappa Nagarahalli:
> <snip>
> 
> > Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> > process the crypto workload asynchronously. This way provides uniformity to
> > both PMD types, but also introduce unnecessary performance penalty to SW
> > PMDs that have to "simulate" HW async behavior (crypto-ops
> > enqueue/dequeue, HW addresses computations, storing/dereferencing user
> > provided data (mbuf) for each crypto-op, etc).
> > 
> > The aim is to introduce a new optional API for SW crypto-devices to perform
> > crypto processing in a synchronous manner.
> > As summarized by Akhil, we need a synchronous API to perform crypto
> > operations on raw data using SW PMDs, that provides:
> >  - no crypto-ops.
> >  - avoid using mbufs inside this API, use raw data buffers instead.
> >  - no separate enqueue-dequeue, only single process() API for data path.
> >  - input data buffers should be grouped by session,
> >    i.e. each process() call takes one session and group of input buffers
> >    that  belong to that session.
> >  - All parameters that are constant accross session, should be stored
> >    inside the session itself and reused by all incoming data buffers.
> > 
> > While there seems no controversy about need of such functionality, there
> > seems to be no agreement on what would be the best API for that.
> > So I am requesting for TB input on that matter.
> > 
> > Series structure:
> > - patch #1 - intorduce basic data structures to be used by sync API
> >   (no controversy here, I hope ..)
> >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > - patch #2 - Intel initial approach for new API (via rte_security)
> >   [RFC 2/4] security: introduce cpu-crypto API
> > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> >   possible
> >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > - patch #4 - approach via introducing new session data structure and API
> >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > 
> > Patches 2,3,4 are mutually exclusive,
> > and we probably have to choose which one to go forward with.
> > I put some explanations in each of the patches, hopefully that will help to
> > understand pros and cons of each one.
> > 
> > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to reuse
> > existing API and minimize API level changes.
> 
> IMO, from application perspective, it should not matter who (CPU or an accelerator) does the crypto functionality. It just needs to know if the result will be returned synchronously or asynchronously.

We already have asymmetric and symmetric APIs.
Here you are proposing a third method: symmetric without mbuf for CPU PMDs

> > My favorite is #4, #2 is less preferable but ok too.
> > #3 seems problematic to me by the reasons I outlined in #4 patch description.
> > 
> > Please provide your opinion.

It means the API is not PMD agnostic, right?




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
  2019-11-06  9:35   ` Thomas Monjalon
@ 2019-11-06  9:48     ` Thomas Monjalon
  2019-11-06 10:14       ` Ananyev, Konstantin
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Monjalon @ 2019-11-06  9:48 UTC (permalink / raw)
  To: techboard
  Cc: Honnappa Nagarahalli, Konstantin Ananyev, dev, roy.fan.zhang,
	declan.doherty, Akhil.goyal@nxp.com, nd

06/11/2019 10:35, Thomas Monjalon:
> 06/11/2019 05:54, Honnappa Nagarahalli:
> > <snip>
> > 
> > > Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> > > process the crypto workload asynchronously. This way provides uniformity to
> > > both PMD types, but also introduce unnecessary performance penalty to SW
> > > PMDs that have to "simulate" HW async behavior (crypto-ops
> > > enqueue/dequeue, HW addresses computations, storing/dereferencing user
> > > provided data (mbuf) for each crypto-op, etc).
> > > 
> > > The aim is to introduce a new optional API for SW crypto-devices to perform
> > > crypto processing in a synchronous manner.
> > > As summarized by Akhil, we need a synchronous API to perform crypto
> > > operations on raw data using SW PMDs, that provides:
> > >  - no crypto-ops.
> > >  - avoid using mbufs inside this API, use raw data buffers instead.
> > >  - no separate enqueue-dequeue, only single process() API for data path.
> > >  - input data buffers should be grouped by session,
> > >    i.e. each process() call takes one session and group of input buffers
> > >    that  belong to that session.
> > >  - All parameters that are constant accross session, should be stored
> > >    inside the session itself and reused by all incoming data buffers.
> > > 
> > > While there seems no controversy about need of such functionality, there
> > > seems to be no agreement on what would be the best API for that.
> > > So I am requesting for TB input on that matter.
> > > 
> > > Series structure:
> > > - patch #1 - intorduce basic data structures to be used by sync API
> > >   (no controversy here, I hope ..)
> > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > - patch #2 - Intel initial approach for new API (via rte_security)
> > >   [RFC 2/4] security: introduce cpu-crypto API
> > > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> > >   possible
> > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > - patch #4 - approach via introducing new session data structure and API
> > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > > 
> > > Patches 2,3,4 are mutually exclusive,
> > > and we probably have to choose which one to go forward with.
> > > I put some explanations in each of the patches, hopefully that will help to
> > > understand pros and cons of each one.
> > > 
> > > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to reuse
> > > existing API and minimize API level changes.
> > 
> > IMO, from application perspective, it should not matter who (CPU or an accelerator) does the crypto functionality. It just needs to know if the result will be returned synchronously or asynchronously.
> 
> We already have asymmetric and symmetric APIs.
> Here you are proposing a third method: symmetric without mbuf for CPU PMDs

Sorry, for this garbage, I am mixing synchronous/asynchronous and symmetric/asymmetric.

> > > My favorite is #4, #2 is less preferable but ok too.
> > > #3 seems problematic to me by the reasons I outlined in #4 patch description.
> > > 
> > > Please provide your opinion.
> 
> It means the API is not PMD agnostic, right?

So the question is to know if a synchronous API will be implemented only for CPU virtual PMDs?



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
  2019-11-06  9:48     ` Thomas Monjalon
@ 2019-11-06 10:14       ` Ananyev, Konstantin
  2019-11-06 11:33         ` Ananyev, Konstantin
  0 siblings, 1 reply; 18+ messages in thread
From: Ananyev, Konstantin @ 2019-11-06 10:14 UTC (permalink / raw)
  To: Thomas Monjalon, techboard
  Cc: Honnappa Nagarahalli, dev, Zhang, Roy Fan, Doherty, Declan,
	Akhil.goyal@nxp.com, nd


> > > > Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> > > > process the crypto workload asynchronously. This way provides uniformity to
> > > > both PMD types, but also introduce unnecessary performance penalty to SW
> > > > PMDs that have to "simulate" HW async behavior (crypto-ops
> > > > enqueue/dequeue, HW addresses computations, storing/dereferencing user
> > > > provided data (mbuf) for each crypto-op, etc).
> > > >
> > > > The aim is to introduce a new optional API for SW crypto-devices to perform
> > > > crypto processing in a synchronous manner.
> > > > As summarized by Akhil, we need a synchronous API to perform crypto
> > > > operations on raw data using SW PMDs, that provides:
> > > >  - no crypto-ops.
> > > >  - avoid using mbufs inside this API, use raw data buffers instead.
> > > >  - no separate enqueue-dequeue, only single process() API for data path.
> > > >  - input data buffers should be grouped by session,
> > > >    i.e. each process() call takes one session and group of input buffers
> > > >    that  belong to that session.
> > > >  - All parameters that are constant accross session, should be stored
> > > >    inside the session itself and reused by all incoming data buffers.
> > > >
> > > > While there seems no controversy about need of such functionality, there
> > > > seems to be no agreement on what would be the best API for that.
> > > > So I am requesting for TB input on that matter.
> > > >
> > > > Series structure:
> > > > - patch #1 - intorduce basic data structures to be used by sync API
> > > >   (no controversy here, I hope ..)
> > > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > > - patch #2 - Intel initial approach for new API (via rte_security)
> > > >   [RFC 2/4] security: introduce cpu-crypto API
> > > > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> > > >   possible
> > > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > > - patch #4 - approach via introducing new session data structure and API
> > > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > > >
> > > > Patches 2,3,4 are mutually exclusive,
> > > > and we probably have to choose which one to go forward with.
> > > > I put some explanations in each of the patches, hopefully that will help to
> > > > understand pros and cons of each one.
> > > >
> > > > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to reuse
> > > > existing API and minimize API level changes.
> > >
> > > IMO, from application perspective, it should not matter who (CPU or an accelerator) does the crypto functionality. It just needs to know
> if the result will be returned synchronously or asynchronously.
> >
> > We already have asymmetric and symmetric APIs.
> > Here you are proposing a third method: symmetric without mbuf for CPU PMDs
> 
> Sorry, for this garbage, I am mixing synchronous/asynchronous and symmetric/asymmetric.
> 
> > > > My favorite is #4, #2 is less preferable but ok too.
> > > > #3 seems problematic to me by the reasons I outlined in #4 patch description.
> > > >
> > > > Please provide your opinion.
> >
> > It means the API is not PMD agnostic, right?

Probably not...
Because inside DPDK we don't have any other abstraction for SW crypto-libs
except vdev, we do need dev_id to get session initialization point.
After that I believe all operations can be session based.
 
> So the question is to know if a synchronous API will be implemented only for CPU virtual PMDs?

I don't expect lookaside devices to benefit from sync mode.
I think performance penalty would be too high.
Konstantin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
  2019-11-06 10:14       ` Ananyev, Konstantin
@ 2019-11-06 11:33         ` Ananyev, Konstantin
  2019-11-06 12:18           ` Thomas Monjalon
  0 siblings, 1 reply; 18+ messages in thread
From: Ananyev, Konstantin @ 2019-11-06 11:33 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, techboard
  Cc: Honnappa Nagarahalli, dev, Zhang, Roy Fan, Doherty, Declan,
	Akhil.goyal@nxp.com, nd


> > > > > Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> > > > > process the crypto workload asynchronously. This way provides uniformity to
> > > > > both PMD types, but also introduce unnecessary performance penalty to SW
> > > > > PMDs that have to "simulate" HW async behavior (crypto-ops
> > > > > enqueue/dequeue, HW addresses computations, storing/dereferencing user
> > > > > provided data (mbuf) for each crypto-op, etc).
> > > > >
> > > > > The aim is to introduce a new optional API for SW crypto-devices to perform
> > > > > crypto processing in a synchronous manner.
> > > > > As summarized by Akhil, we need a synchronous API to perform crypto
> > > > > operations on raw data using SW PMDs, that provides:
> > > > >  - no crypto-ops.
> > > > >  - avoid using mbufs inside this API, use raw data buffers instead.
> > > > >  - no separate enqueue-dequeue, only single process() API for data path.
> > > > >  - input data buffers should be grouped by session,
> > > > >    i.e. each process() call takes one session and group of input buffers
> > > > >    that  belong to that session.
> > > > >  - All parameters that are constant accross session, should be stored
> > > > >    inside the session itself and reused by all incoming data buffers.
> > > > >
> > > > > While there seems no controversy about need of such functionality, there
> > > > > seems to be no agreement on what would be the best API for that.
> > > > > So I am requesting for TB input on that matter.
> > > > >
> > > > > Series structure:
> > > > > - patch #1 - intorduce basic data structures to be used by sync API
> > > > >   (no controversy here, I hope ..)
> > > > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > > > - patch #2 - Intel initial approach for new API (via rte_security)
> > > > >   [RFC 2/4] security: introduce cpu-crypto API
> > > > > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> > > > >   possible
> > > > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > > > - patch #4 - approach via introducing new session data structure and API
> > > > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > > > >
> > > > > Patches 2,3,4 are mutually exclusive,
> > > > > and we probably have to choose which one to go forward with.
> > > > > I put some explanations in each of the patches, hopefully that will help to
> > > > > understand pros and cons of each one.
> > > > >
> > > > > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to reuse
> > > > > existing API and minimize API level changes.
> > > >
> > > > IMO, from application perspective, it should not matter who (CPU or an accelerator) does the crypto functionality. It just needs to
> know
> > if the result will be returned synchronously or asynchronously.
> > >
> > > We already have asymmetric and symmetric APIs.
> > > Here you are proposing a third method: symmetric without mbuf for CPU PMDs
> >
> > Sorry, for this garbage, I am mixing synchronous/asynchronous and symmetric/asymmetric.
> >
> > > > > My favorite is #4, #2 is less preferable but ok too.
> > > > > #3 seems problematic to me by the reasons I outlined in #4 patch description.
> > > > >
> > > > > Please provide your opinion.
> > >
> > > It means the API is not PMD agnostic, right?
> 
> Probably not...
> Because inside DPDK we don't have any other abstraction for SW crypto-libs
> except vdev, we do need dev_id to get session initialization point.
> After that I believe all operations can be session based.
> 
> > So the question is to know if a synchronous API will be implemented only for CPU virtual PMDs?
> 
> I don't expect lookaside devices to benefit from sync mode.
> I think performance penalty would be too high.

After another thought, if some lookaside PMD would like to support such API -
I think it is still possible: dev_id (or just pointer to internal dev/queue structure)
can be stored inside the session itself.
Though I really doubt any lookaside PMD would be interested in such mode.
Konstantin 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
  2019-11-06 11:33         ` Ananyev, Konstantin
@ 2019-11-06 12:18           ` Thomas Monjalon
  2019-11-06 12:22             ` Hemant Agrawal
  2019-11-06 15:19             ` Ananyev, Konstantin
  0 siblings, 2 replies; 18+ messages in thread
From: Thomas Monjalon @ 2019-11-06 12:18 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: techboard, Honnappa Nagarahalli, dev, Zhang, Roy Fan, Doherty,
	Declan, Akhil.goyal@nxp.com, nd

06/11/2019 12:33, Ananyev, Konstantin:
> 
> > > > > > Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> > > > > > process the crypto workload asynchronously. This way provides uniformity to
> > > > > > both PMD types, but also introduce unnecessary performance penalty to SW
> > > > > > PMDs that have to "simulate" HW async behavior (crypto-ops
> > > > > > enqueue/dequeue, HW addresses computations, storing/dereferencing user
> > > > > > provided data (mbuf) for each crypto-op, etc).
> > > > > >
> > > > > > The aim is to introduce a new optional API for SW crypto-devices to perform
> > > > > > crypto processing in a synchronous manner.
> > > > > > As summarized by Akhil, we need a synchronous API to perform crypto
> > > > > > operations on raw data using SW PMDs, that provides:
> > > > > >  - no crypto-ops.
> > > > > >  - avoid using mbufs inside this API, use raw data buffers instead.
> > > > > >  - no separate enqueue-dequeue, only single process() API for data path.
> > > > > >  - input data buffers should be grouped by session,
> > > > > >    i.e. each process() call takes one session and group of input buffers
> > > > > >    that  belong to that session.
> > > > > >  - All parameters that are constant accross session, should be stored
> > > > > >    inside the session itself and reused by all incoming data buffers.
> > > > > >
> > > > > > While there seems no controversy about need of such functionality, there
> > > > > > seems to be no agreement on what would be the best API for that.
> > > > > > So I am requesting for TB input on that matter.
> > > > > >
> > > > > > Series structure:
> > > > > > - patch #1 - intorduce basic data structures to be used by sync API
> > > > > >   (no controversy here, I hope ..)
> > > > > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > > > > - patch #2 - Intel initial approach for new API (via rte_security)
> > > > > >   [RFC 2/4] security: introduce cpu-crypto API
> > > > > > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> > > > > >   possible
> > > > > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > > > > - patch #4 - approach via introducing new session data structure and API
> > > > > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > > > > >
> > > > > > Patches 2,3,4 are mutually exclusive,
> > > > > > and we probably have to choose which one to go forward with.
> > > > > > I put some explanations in each of the patches, hopefully that will help to
> > > > > > understand pros and cons of each one.
> > > > > >
> > > > > > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to reuse
> > > > > > existing API and minimize API level changes.
> > > > >
> > > > > IMO, from application perspective, it should not matter who (CPU or an accelerator) does the crypto functionality. It just needs to
> > know
> > > if the result will be returned synchronously or asynchronously.
> > > >
> > > > We already have asymmetric and symmetric APIs.
> > > > Here you are proposing a third method: symmetric without mbuf for CPU PMDs
> > >
> > > Sorry, for this garbage, I am mixing synchronous/asynchronous and symmetric/asymmetric.
> > >
> > > > > > My favorite is #4, #2 is less preferable but ok too.
> > > > > > #3 seems problematic to me by the reasons I outlined in #4 patch description.
> > > > > >
> > > > > > Please provide your opinion.
> > > >
> > > > It means the API is not PMD agnostic, right?
> > 
> > Probably not...
> > Because inside DPDK we don't have any other abstraction for SW crypto-libs
> > except vdev, we do need dev_id to get session initialization point.
> > After that I believe all operations can be session based.
> > 
> > > So the question is to know if a synchronous API will be implemented only for CPU virtual PMDs?
> > 
> > I don't expect lookaside devices to benefit from sync mode.
> > I think performance penalty would be too high.
> 
> After another thought, if some lookaside PMD would like to support such API -
> I think it is still possible: dev_id (or just pointer to internal dev/queue structure)
> can be stored inside the session itself.
> Though I really doubt any lookaside PMD would be interested in such mode.

So what should be the logic in the application?
How the combo PMD/API is chosen?
How does it work with the crypto scheduler?




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
  2019-11-06 12:18           ` Thomas Monjalon
@ 2019-11-06 12:22             ` Hemant Agrawal
  2019-11-06 15:19             ` Ananyev, Konstantin
  1 sibling, 0 replies; 18+ messages in thread
From: Hemant Agrawal @ 2019-11-06 12:22 UTC (permalink / raw)
  To: Thomas Monjalon, Ananyev, Konstantin
  Cc: techboard, Honnappa Nagarahalli, dev, Zhang, Roy Fan, Doherty,
	Declan, Akhil Goyal, nd


> 06/11/2019 12:33, Ananyev, Konstantin:
> >
> > > > > > > Originally both SW and HW crypto PMDs use rte_crypot_op
> > > > > > > based API to process the crypto workload asynchronously.
> > > > > > > This way provides uniformity to both PMD types, but also
> > > > > > > introduce unnecessary performance penalty to SW PMDs that
> > > > > > > have to "simulate" HW async behavior (crypto-ops
> > > > > > > enqueue/dequeue, HW addresses computations,
> storing/dereferencing user provided data (mbuf) for each crypto-op, etc).
> > > > > > >
> > > > > > > The aim is to introduce a new optional API for SW
> > > > > > > crypto-devices to perform crypto processing in a synchronous
> manner.
> > > > > > > As summarized by Akhil, we need a synchronous API to perform
> > > > > > > crypto operations on raw data using SW PMDs, that provides:
> > > > > > >  - no crypto-ops.
> > > > > > >  - avoid using mbufs inside this API, use raw data buffers instead.
> > > > > > >  - no separate enqueue-dequeue, only single process() API for data
> path.
> > > > > > >  - input data buffers should be grouped by session,
> > > > > > >    i.e. each process() call takes one session and group of input
> buffers
> > > > > > >    that  belong to that session.
> > > > > > >  - All parameters that are constant accross session, should be
> stored
> > > > > > >    inside the session itself and reused by all incoming data buffers.
> > > > > > >
> > > > > > > While there seems no controversy about need of such
> > > > > > > functionality, there seems to be no agreement on what would be
> the best API for that.
> > > > > > > So I am requesting for TB input on that matter.
> > > > > > >
> > > > > > > Series structure:
> > > > > > > - patch #1 - intorduce basic data structures to be used by sync API
> > > > > > >   (no controversy here, I hope ..)
> > > > > > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > > > > > - patch #2 - Intel initial approach for new API (via rte_security)
> > > > > > >   [RFC 2/4] security: introduce cpu-crypto API
> > > > > > > - patch #3 - approach that reuses existing rte_cryptodev API as
> much as
> > > > > > >   possible
> > > > > > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > > > > > - patch #4 - approach via introducing new session data structure
> and API
> > > > > > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session
> > > > > > > API
> > > > > > >
> > > > > > > Patches 2,3,4 are mutually exclusive, and we probably have
> > > > > > > to choose which one to go forward with.
> > > > > > > I put some explanations in each of the patches, hopefully
> > > > > > > that will help to understand pros and cons of each one.
> > > > > > >
> > > > > > > Akhil strongly supports #3, AFAIK mainly because it allows
> > > > > > > PMDs to reuse existing API and minimize API level changes.
> > > > > >
> > > > > > IMO, from application perspective, it should not matter who
> > > > > > (CPU or an accelerator) does the crypto functionality. It just
> > > > > > needs to
> > > know
> > > > if the result will be returned synchronously or asynchronously.
> > > > >
> > > > > We already have asymmetric and symmetric APIs.
> > > > > Here you are proposing a third method: symmetric without mbuf
> > > > > for CPU PMDs
> > > >
> > > > Sorry, for this garbage, I am mixing synchronous/asynchronous and
> symmetric/asymmetric.
> > > >
> > > > > > > My favorite is #4, #2 is less preferable but ok too.
> > > > > > > #3 seems problematic to me by the reasons I outlined in #4 patch
> description.
> > > > > > >
> > > > > > > Please provide your opinion.
> > > > >
> > > > > It means the API is not PMD agnostic, right?
> > >
> > > Probably not...
> > > Because inside DPDK we don't have any other abstraction for SW
> > > crypto-libs except vdev, we do need dev_id to get session initialization
> point.
> > > After that I believe all operations can be session based.
> > >
> > > > So the question is to know if a synchronous API will be implemented
> only for CPU virtual PMDs?
> > >
> > > I don't expect lookaside devices to benefit from sync mode.
> > > I think performance penalty would be too high.
> >
> > After another thought, if some lookaside PMD would like to support
> > such API - I think it is still possible: dev_id (or just pointer to
> > internal dev/queue structure) can be stored inside the session itself.
> > Though I really doubt any lookaside PMD would be interested in such
> mode.

[Hemant] Lookaside PMDs may be interested but may not be in synchronous nature, but for raw buffers processing.

e.g. I see a use-case to support crypto without forcing to use crypto_ops or mbufs i.e. use plain buffers.

So, I want to take advantage of similar APIs, just extend an option to show that it is async in process.
And then overload existing or add an API to get new such raw crypto process API.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [RFC 3/4] cryptodev: introduce cpu-crypto API
  2019-11-05 21:41   ` Akhil Goyal
@ 2019-11-06 14:49     ` Ananyev, Konstantin
  0 siblings, 0 replies; 18+ messages in thread
From: Ananyev, Konstantin @ 2019-11-06 14:49 UTC (permalink / raw)
  To: Akhil Goyal, dev, techboard; +Cc: Zhang, Roy Fan, Doherty, Declan

Hi Akhil,


> > This patch extends rte_cryptodev API with CPU-CRYPTO mode.
> > This is done by reusing existing rte_crypto_sym_session structure itself
> > and related control-path cryptodev API (init/clear/get_size/etc.)
> >  For data-path new sym_cpu_ process() function is added into
> > rte_cryptodev dev_ops.
> >
> > Crypto PMD that wants to support that functionality would need to:
> > 1. claim RTE_CRYPTODEV_FF_SYM_CPU_CRYPTO capability supported.
> > 2. change at least the following functions inside rte_cryptodev_ops:
> > 	. sym_session_get_size,
> > 	. sym_session_configure,
> > 	. sym_session_clear
> > to accommodate support for both sync and async modes,
> > 3. implement new function inside rte_cryptodev_ops:
> > 	sym_cpu_process
> >
> > For data-path processing consumer of that API would have to maintain:
> > 	struct rte_cryptodev_sym_session *sess,
> > 	list of dev ids for which this session was properly initialized
> >
> > As an advantage of this approach - reuse of existing API
> > and minimal visible changes for crypto PMDs.
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  lib/librte_cryptodev/rte_crypto_sym.h    | 11 ++++++++++-
> >  lib/librte_cryptodev/rte_cryptodev.c     | 14 ++++++++++++++
> >  lib/librte_cryptodev/rte_cryptodev.h     | 24 ++++++++++++++++++++++++
> >  lib/librte_cryptodev/rte_cryptodev_pmd.h | 22 ++++++++++++++++++++++
> >  4 files changed, 70 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_cryptodev/rte_crypto_sym.h
> > b/lib/librte_cryptodev/rte_crypto_sym.h
> > index d8d9e9514..790c77524 100644
> > --- a/lib/librte_cryptodev/rte_crypto_sym.h
> > +++ b/lib/librte_cryptodev/rte_crypto_sym.h
> > @@ -166,6 +166,10 @@ struct rte_crypto_cipher_xform {
> >  	 *  - Both keys must have the same size.
> >  	 **/
> >
> > +	/**
> > +         * CPU-CRYPTO specific data, should be set properly when
> > +	 * (xform->type & RTE_CRYPTO_SYM_CPU_CRYPTO) != 0, otherwise
> > ignored.
> > +	 */
> >  	struct {
> >  		/**
> >  		 * offset for cipher to start within user provided data buffer.
> 
> Earlier I was ok to have this offset but on another thought, why do you need this?
> User can give the exact pointer in the process() API from where he wants to do ciphering.
> You will be adding this offset in the driver if you keep this in xform/session. So I think there is
> no difference whether you add in the driver or the application. Am I missing something?

At least for ipsec this value is always constant for session (usually ESP header + IV). 
So seems better to set it once inside session, instead of passing an
array of same constant values for each process() call.

> 
> > @@ -415,6 +419,10 @@ struct rte_crypto_aead_xform {
> >  		uint16_t length;	/**< key length in bytes */
> >  	} __attribute__((__packed__)) key;
> >
> > +	/**
> > +         * CPU-CRYPTO specific data, should be set properly when
> > +	 * (xform->type & RTE_CRYPTO_SYM_CPU_CRYPTO) != 0, otherwise
> > ignored.
> > +	 */
> >  	struct {
> >  		/**
> >  		 * offset for cipher to start within user provided data buffer.
> > @@ -471,7 +479,8 @@ enum rte_crypto_sym_xform_type {
> >  	RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED = 0,	/**< No xform
> > specified */
> >  	RTE_CRYPTO_SYM_XFORM_AUTH,		/**< Authentication
> > xform */
> >  	RTE_CRYPTO_SYM_XFORM_CIPHER,		/**< Cipher xform  */
> > -	RTE_CRYPTO_SYM_XFORM_AEAD		/**< AEAD xform  */
> > +	RTE_CRYPTO_SYM_XFORM_AEAD,		/**< AEAD xform  */
> > +	RTE_CRYPTO_SYM_CPU_CRYPTO = INT32_MIN,  /**< xform for cpu-
> > crypto */
> 
> This is not a correct place to have this. All types of xforms CIPHER/AUTH/AEAD
> can be used in SYNC mode

The intention is to use it as a flag.
For async mode only we just do let say
.type = RTE_CRYPTO_SYM_XFORM_AEAD;
(as we do now)
For async+sync modes:
.type = RTE_CRYPTO_SYM_XFORM_AEAD | RTE_CRYPTO_SYM_CPU_CRYPTO;

> 
> >  };
> >
> >  /**
> > diff --git a/lib/librte_cryptodev/rte_cryptodev.c
> > b/lib/librte_cryptodev/rte_cryptodev.c
> > index 89aa2ed3e..b1dbaf4c1 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev.c
> > +++ b/lib/librte_cryptodev/rte_cryptodev.c
> > @@ -1616,6 +1616,20 @@ rte_cryptodev_sym_session_get_user_data(
> >  	return (void *)(sess->sess_data + sess->nb_drivers);
> >  }
> >
> > +__rte_experimental
> > +int
> > +rte_cryptodev_sym_cpu_crypto_process(uint8_t dev_id,
> > +	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec
> > *vec,
> > +	int32_t status[], uint32_t num)
> > +{
> > +	struct rte_cryptodev *dev;
> > +
> > +	dev = rte_cryptodev_pmd_get_dev(dev_id);
> > +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_cpu_process,-
> > ENOTSUP);
> > +
> > +	return dev->dev_ops->sym_cpu_process(dev, sess, vec, status, num);
> > +}
> > +
> >  /** Initialise rte_crypto_op mempool element */
> >  static void
> >  rte_crypto_op_init(struct rte_mempool *mempool,
> > diff --git a/lib/librte_cryptodev/rte_cryptodev.h
> > b/lib/librte_cryptodev/rte_cryptodev.h
> > index c6ffa3b35..24877006c 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > @@ -450,6 +450,8 @@ rte_cryptodev_asym_get_xform_enum(enum
> > rte_crypto_asym_xform_type *xform_enum,
> >  /**< Support encrypted-digest operations where digest is appended to data */
> >  #define RTE_CRYPTODEV_FF_ASYM_SESSIONLESS		(1ULL << 20)
> >  /**< Support asymmetric session-less operations */
> > +#define	RTE_CRYPTODEV_FF_SYM_CPU_CRYPTO
> > 	(1ULL << 21)
> > +/**< Support symmeteric cpu-crypto processing */
> >
> >
> >  /**
> > @@ -1274,6 +1276,28 @@ void *
> >  rte_cryptodev_sym_session_get_user_data(
> >  					struct rte_cryptodev_sym_session
> > *sess);
> >
> > +/**
> > + * Perform actual crypto processing (encrypt/digest or auth/decrypt)
> > + * on user provided data.
> > + *
> > + * @param	dev_id	The device identifier.
> > + * @param	sess	Cryptodev session structure
> > + * @param	vec	Array of vectors for input data
> > + * @param	status	Array of status values (one per vec)
> > + *			(RTE_CRYPTO_OP_STATUS_* values)
> > + * @param	num	Number of elems in vec and status arrays.
> > + *
> > + * @return
> > + *  - Returns negative errno value on error, or non-negative number
> > + *    of successfully processed input vectors.
> > + *
> > +*/
> > +__rte_experimental
> > +int
> > +rte_cryptodev_sym_cpu_crypto_process(uint8_t dev_id,
> > +	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec
> > *vec,
> > +	int32_t status[], uint32_t num);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > index fba14f2fa..02e7a19ae 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> > @@ -308,6 +308,26 @@ typedef void (*cryptodev_sym_free_session_t)(struct
> > rte_cryptodev *dev,
> >   */
> >  typedef void (*cryptodev_asym_free_session_t)(struct rte_cryptodev *dev,
> >  		struct rte_cryptodev_asym_session *sess);
> > +/**
> > + * Perform actual crypto processing (encrypt/digest or auth/decrypt)
> > + * on user provided data.
> > + *
> > + * @param	dev		Crypto device pointer
> > + * @param	sess	Cryptodev session structure
> > + * @param	vec	Array of vectors for input data
> > + * @param	status	Array of status values (one per vec)
> > + *			(RTE_CRYPTO_OP_STATUS_* values)
> > + * @param	num	Number of elems in vec and status arrays.
> > + *
> > + * @return
> > + *  - Returns negative errno value on error, or non-negative number
> > + *    of successfully processed input vectors.
> > + *
> > +*/
> > +typedef int (*cryptodev_sym_cpu_crypto_process_t)(struct rte_cryptodev *dev,
> > +	struct rte_cryptodev_sym_session *sess, struct rte_crypto_sym_vec
> > *vec,
> > +	int32_t status[], uint32_t num);
> > +
> >
> >  /** Crypto device operations function pointer table */
> >  struct rte_cryptodev_ops {
> > @@ -342,6 +362,8 @@ struct rte_cryptodev_ops {
> >  	/**< Clear a Crypto sessions private data. */
> >  	cryptodev_asym_free_session_t asym_session_clear;
> >  	/**< Clear a Crypto sessions private data. */
> > +	cryptodev_sym_cpu_crypto_process_t sym_cpu_process;
> > +	/**< process input data synchronously (cpu-crypto). */
> >  };
> >
> >
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
  2019-11-06 12:18           ` Thomas Monjalon
  2019-11-06 12:22             ` Hemant Agrawal
@ 2019-11-06 15:19             ` Ananyev, Konstantin
  1 sibling, 0 replies; 18+ messages in thread
From: Ananyev, Konstantin @ 2019-11-06 15:19 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: techboard, Honnappa Nagarahalli, dev, Zhang, Roy Fan, Doherty,
	Declan, Akhil.goyal@nxp.com, nd



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, November 6, 2019 12:19 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: techboard@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; dev@dpdk.org; Zhang, Roy Fan
> <roy.fan.zhang@intel.com>; Doherty, Declan <declan.doherty@intel.com>; Akhil.goyal@nxp.com; nd <nd@arm.com>
> Subject: Re: [dpdk-techboard] [RFC 0/4] cpu-crypto API choices
> 
> 06/11/2019 12:33, Ananyev, Konstantin:
> >
> > > > > > > Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> > > > > > > process the crypto workload asynchronously. This way provides uniformity to
> > > > > > > both PMD types, but also introduce unnecessary performance penalty to SW
> > > > > > > PMDs that have to "simulate" HW async behavior (crypto-ops
> > > > > > > enqueue/dequeue, HW addresses computations, storing/dereferencing user
> > > > > > > provided data (mbuf) for each crypto-op, etc).
> > > > > > >
> > > > > > > The aim is to introduce a new optional API for SW crypto-devices to perform
> > > > > > > crypto processing in a synchronous manner.
> > > > > > > As summarized by Akhil, we need a synchronous API to perform crypto
> > > > > > > operations on raw data using SW PMDs, that provides:
> > > > > > >  - no crypto-ops.
> > > > > > >  - avoid using mbufs inside this API, use raw data buffers instead.
> > > > > > >  - no separate enqueue-dequeue, only single process() API for data path.
> > > > > > >  - input data buffers should be grouped by session,
> > > > > > >    i.e. each process() call takes one session and group of input buffers
> > > > > > >    that  belong to that session.
> > > > > > >  - All parameters that are constant accross session, should be stored
> > > > > > >    inside the session itself and reused by all incoming data buffers.
> > > > > > >
> > > > > > > While there seems no controversy about need of such functionality, there
> > > > > > > seems to be no agreement on what would be the best API for that.
> > > > > > > So I am requesting for TB input on that matter.
> > > > > > >
> > > > > > > Series structure:
> > > > > > > - patch #1 - intorduce basic data structures to be used by sync API
> > > > > > >   (no controversy here, I hope ..)
> > > > > > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > > > > > - patch #2 - Intel initial approach for new API (via rte_security)
> > > > > > >   [RFC 2/4] security: introduce cpu-crypto API
> > > > > > > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> > > > > > >   possible
> > > > > > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > > > > > - patch #4 - approach via introducing new session data structure and API
> > > > > > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > > > > > >
> > > > > > > Patches 2,3,4 are mutually exclusive,
> > > > > > > and we probably have to choose which one to go forward with.
> > > > > > > I put some explanations in each of the patches, hopefully that will help to
> > > > > > > understand pros and cons of each one.
> > > > > > >
> > > > > > > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to reuse
> > > > > > > existing API and minimize API level changes.
> > > > > >
> > > > > > IMO, from application perspective, it should not matter who (CPU or an accelerator) does the crypto functionality. It just needs to
> > > know
> > > > if the result will be returned synchronously or asynchronously.
> > > > >
> > > > > We already have asymmetric and symmetric APIs.
> > > > > Here you are proposing a third method: symmetric without mbuf for CPU PMDs
> > > >
> > > > Sorry, for this garbage, I am mixing synchronous/asynchronous and symmetric/asymmetric.
> > > >
> > > > > > > My favorite is #4, #2 is less preferable but ok too.
> > > > > > > #3 seems problematic to me by the reasons I outlined in #4 patch description.
> > > > > > >
> > > > > > > Please provide your opinion.
> > > > >
> > > > > It means the API is not PMD agnostic, right?
> > >
> > > Probably not...
> > > Because inside DPDK we don't have any other abstraction for SW crypto-libs
> > > except vdev, we do need dev_id to get session initialization point.
> > > After that I believe all operations can be session based.
> > >
> > > > So the question is to know if a synchronous API will be implemented only for CPU virtual PMDs?
> > >
> > > I don't expect lookaside devices to benefit from sync mode.
> > > I think performance penalty would be too high.
> >
> > After another thought, if some lookaside PMD would like to support such API -
> > I think it is still possible: dev_id (or just pointer to internal dev/queue structure)
> > can be stored inside the session itself.
> > Though I really doubt any lookaside PMD would be interested in such mode.
> 
> So what should be the logic in the application?
> How the combo PMD/API is chosen?

Up to the user.
At session creation time user has to choose what session he wants to use.
Then at data-path he can either call async API (enqueue/dequeue)
or sync API (process). 
I expect users who do care about extra perf will choose cpu-crypto mode
when it is available.
Existing apps and apps who'd like to have just one code-path
would stay with async mode and will be unaffected.  

> How does it work with the crypto scheduler?

If we want to add cpu-crypto support to crypto-scheduler PMD,
then changes would be needed anyway, not matter will we choose #3 or #4. 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [RFC 0/4] cpu-crypto API choices
  2019-11-05 18:41 [dpdk-dev] [RFC 0/4] cpu-crypto API choices Konstantin Ananyev
                   ` (4 preceding siblings ...)
  2019-11-06  4:54 ` [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices Honnappa Nagarahalli
@ 2019-11-14  5:46 ` Jerin Jacob
  2019-11-18 11:57   ` Ananyev, Konstantin
  5 siblings, 1 reply; 18+ messages in thread
From: Jerin Jacob @ 2019-11-14  5:46 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: dpdk-dev, techboard, roy.fan.zhang, declan.doherty, Akhil Goyal

On Wed, Nov 6, 2019 at 12:11 AM Konstantin Ananyev
<konstantin.ananyev@intel.com> wrote:
>
> Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> process the crypto workload asynchronously. This way provides uniformity
> to both PMD types, but also introduce unnecessary performance penalty to
> SW PMDs that have to "simulate" HW async behavior
> (crypto-ops enqueue/dequeue, HW addresses computations,
> storing/dereferencing user provided data (mbuf) for each crypto-op,
> etc).
>
> The aim is to introduce a new optional API for SW crypto-devices
> to perform crypto processing in a synchronous manner.
> As summarized by Akhil, we need a synchronous API to perform crypto
> operations on raw data using SW PMDs, that provides:
>  - no crypto-ops.
>  - avoid using mbufs inside this API, use raw data buffers instead.
>  - no separate enqueue-dequeue, only single process() API for data path.
>  - input data buffers should be grouped by session,
>    i.e. each process() call takes one session and group of input buffers
>    that  belong to that session.
>  - All parameters that are constant accross session, should be stored
>    inside the session itself and reused by all incoming data buffers.
>
> While there seems no controversy about need of such functionality,
> there seems to be no agreement on what would be the best API for that.
> So I am requesting for TB input on that matter.
>
> Series structure:
> - patch #1 - intorduce basic data structures to be used by sync API
>   (no controversy here, I hope ..)
>   [RFC 1/4] cpu-crypto: Introduce basic data structures
> - patch #2 - Intel initial approach for new API (via rte_security)
>   [RFC 2/4] security: introduce cpu-crypto API
> - patch #3 - approach that reuses existing rte_cryptodev API as much as
>   possible
>   [RFC 3/4] cryptodev: introduce cpu-crypto API
> - patch #4 - approach via introducing new session data structure and API
>   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
>
> Patches 2,3,4 are mutually exclusive,
> and we probably have to choose which one to go forward with.
> I put some explanations in each of the patches, hopefully that will help
> to  understand pros and cons of each one.
>
> Akhil strongly supports #3, AFAIK mainly because it allows PMDs to
> reuse existing API and minimize API level changes.
> My favorite is #4, #2 is less preferable but ok too.
> #3 seems problematic to me by the reasons I outlined in #4 patch
> description.
>
> Please provide your opinion.

I spend some time on the proposal and I agree that sync API is needed
and it makes sense to remove queue emulation and allocating/freeing
the crypto_ops
in case of sync API.

# I would prefer to not duplicate the session. If the newly added
fields are for optimization
then those can be applicable for HW too. For example, if we consider,
offset to be
constant for one session HW PMD will be able to leverage this. ref:
rte_crypto_aead_xfrom::cpu_crypto:offset

# I would prefer to not duplicate ops parameters, instead of the
existing rte_crypto_ops  can be updated.
I see that most members introduced in rte_crypto_sym_vec &
rte_crypto_vec are already existing in rte_crypto_op.

Also, since we are agreeing that the ops for SYNC API can be from
stack/one time allocated, the size shouldn't matter.

I understand that this would cause ABI breakage, but for this release,
we can work together and add some reserved fields
that we can implement later. I believe that's the reason why you want
to introduce new structures. I think that will bloat
the existing crypto lib.

If I understand it correctly, this will be used in conjunction with
IXGBE to handle fragmented IPsec traffic. If that's the fundamental
reasoning, then there is an alternate path possible. Currently, the
issue is, rte_security doesn't define the treatment for fragmented
packets. Maybe let's define it and then a similar CPU crypto
processing can be done inside the PMD. By creating an internal
function in S/W PMDs and calling it from the inline crypto enabled eth
PMDs, fragmentation support for inline crypto devices can
be achieved. This way the application would look clean. All the
fragmentation related configuration (no of fragmentation contexts
needed,
reassembly timeout etc) need to be added in rte_security library and
the result for that operation will come as dynamic fields in the mbuf.

Just my 2c.







>
> Konstantin Ananyev (4):
>   cpu-crypto: Introduce basic data structures
>   security: introduce cpu-crypto API
>   cryptodev: introduce cpu-crypto API
>   cryptodev: introduce rte_crypto_cpu_sym_session API
>
>  lib/librte_cryptodev/rte_crypto_sym.h     | 63 +++++++++++++++++++++--
>  lib/librte_cryptodev/rte_cryptodev.c      | 14 +++++
>  lib/librte_cryptodev/rte_cryptodev.h      | 24 +++++++++
>  lib/librte_cryptodev/rte_cryptodev_pmd.h  | 22 ++++++++
>  lib/librte_security/rte_security.c        | 11 ++++
>  lib/librte_security/rte_security.h        | 28 +++++++++-
>  lib/librte_security/rte_security_driver.h | 20 +++++++
>  7 files changed, 177 insertions(+), 5 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [RFC 0/4] cpu-crypto API choices
  2019-11-14  5:46 ` [dpdk-dev] " Jerin Jacob
@ 2019-11-18 11:57   ` Ananyev, Konstantin
  2019-11-20 14:27     ` Jerin Jacob
  0 siblings, 1 reply; 18+ messages in thread
From: Ananyev, Konstantin @ 2019-11-18 11:57 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dpdk-dev, techboard, Zhang, Roy Fan, Doherty, Declan, Akhil Goyal

Hi Jerin,

Thanks for input, my answers inline.
Other guys - please provide your input.
Thanks
Konstantin

> > Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> > process the crypto workload asynchronously. This way provides uniformity
> > to both PMD types, but also introduce unnecessary performance penalty to
> > SW PMDs that have to "simulate" HW async behavior
> > (crypto-ops enqueue/dequeue, HW addresses computations,
> > storing/dereferencing user provided data (mbuf) for each crypto-op,
> > etc).
> >
> > The aim is to introduce a new optional API for SW crypto-devices
> > to perform crypto processing in a synchronous manner.
> > As summarized by Akhil, we need a synchronous API to perform crypto
> > operations on raw data using SW PMDs, that provides:
> >  - no crypto-ops.
> >  - avoid using mbufs inside this API, use raw data buffers instead.
> >  - no separate enqueue-dequeue, only single process() API for data path.
> >  - input data buffers should be grouped by session,
> >    i.e. each process() call takes one session and group of input buffers
> >    that  belong to that session.
> >  - All parameters that are constant accross session, should be stored
> >    inside the session itself and reused by all incoming data buffers.
> >
> > While there seems no controversy about need of such functionality,
> > there seems to be no agreement on what would be the best API for that.
> > So I am requesting for TB input on that matter.
> >
> > Series structure:
> > - patch #1 - intorduce basic data structures to be used by sync API
> >   (no controversy here, I hope ..)
> >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > - patch #2 - Intel initial approach for new API (via rte_security)
> >   [RFC 2/4] security: introduce cpu-crypto API
> > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> >   possible
> >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > - patch #4 - approach via introducing new session data structure and API
> >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> >
> > Patches 2,3,4 are mutually exclusive,
> > and we probably have to choose which one to go forward with.
> > I put some explanations in each of the patches, hopefully that will help
> > to  understand pros and cons of each one.
> >
> > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to
> > reuse existing API and minimize API level changes.
> > My favorite is #4, #2 is less preferable but ok too.
> > #3 seems problematic to me by the reasons I outlined in #4 patch
> > description.
> >
> > Please provide your opinion.
> 
> I spend some time on the proposal and I agree that sync API is needed
> and it makes sense to remove queue emulation and allocating/freeing
> the crypto_ops
> in case of sync API.
> 
> # I would prefer to not duplicate the session. If the newly added
> fields are for optimization
> then those can be applicable for HW too. For example, if we consider,
> offset to be
> constant for one session HW PMD will be able to leverage this. ref:
> rte_crypto_aead_xfrom::cpu_crypto:offset

It might, but right for async API we pass this info in crypto_op instead.
So if I get you right your preference is sort of #3 approach
that reuses existing rte_cryptodev API as much as possible:
reuse existing rte_cryptodev_sym structure with new sync process() API?
 
> # I would prefer to not duplicate ops parameters, instead of the
> existing rte_crypto_ops  can be updated.
> I see that most members introduced in rte_crypto_sym_vec &
> rte_crypto_vec are already existing in rte_crypto_op.

rte_crypto_ops is way too generic/excessive.
Filling/reading it seems one of the main slowdowns that  we trying to
avoid in new API. 

> 
> Also, since we are agreeing that the ops for SYNC API can be from
> stack/one time allocated, the size shouldn't matter.

I can be on stack, but it means user will still have to fill them
and PMD will have to read/process/overwrite them. 
 
> I understand that this would cause ABI breakage, but for this release,
> we can work together and add some reserved fields
> that we can implement later. I believe that's the reason why you want
> to introduce new structures. I think that will bloat
> the existing crypto lib.

It will increase the lib code, but I don't think it will be significant.
Honestly, I think messing with crypto_op and other existing structures
might have much more negative effect. 
 
> If I understand it correctly, this will be used in conjunction with
> IXGBE to handle fragmented IPsec traffic. If that's the fundamental
> reasoning, then there is an alternate path possible.

No, it's just one of the use-case.
Pretty important, but not the only one.
The main reason - current cryptodev API (crypto_op based) is suboptimal for SW based PMDs.
We wasting too many cycles to pretend that it is a lookaside device underneath.
I think makes more sense to admit that it is SW based and exploit it nature,
instead of trying to hide it.

> Currently, the  issue is, rte_security doesn't define the treatment for fragmented
> packets. Maybe let's define it and then a similar CPU crypto
> processing can be done inside the PMD. By creating an internal
> function in S/W PMDs and calling it from the inline crypto enabled eth
> PMDs, fragmentation support for inline crypto devices can
> be achieved. This way the application would look clean. All the
> fragmentation related configuration (no of fragmentation contexts
> needed,
> reassembly timeout etc) need to be added in rte_security library and
> the result for that operation will come as dynamic fields in the mbuf.
> 
> Just my 2c.
> 
 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [dpdk-dev] [RFC 0/4] cpu-crypto API choices
  2019-11-18 11:57   ` Ananyev, Konstantin
@ 2019-11-20 14:27     ` Jerin Jacob
  0 siblings, 0 replies; 18+ messages in thread
From: Jerin Jacob @ 2019-11-20 14:27 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dpdk-dev, techboard, Zhang, Roy Fan, Doherty, Declan, Akhil Goyal

On Mon, Nov 18, 2019 at 5:27 PM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
>
> Hi Jerin,

Hi Konstantin,

>
> Thanks for input, my answers inline.
> Other guys - please provide your input.
> Thanks
> Konstantin
>
> > > Originally both SW and HW crypto PMDs use rte_crypot_op based API to
> > > process the crypto workload asynchronously. This way provides uniformity
> > > to both PMD types, but also introduce unnecessary performance penalty to
> > > SW PMDs that have to "simulate" HW async behavior
> > > (crypto-ops enqueue/dequeue, HW addresses computations,
> > > storing/dereferencing user provided data (mbuf) for each crypto-op,
> > > etc).
> > >
> > > The aim is to introduce a new optional API for SW crypto-devices
> > > to perform crypto processing in a synchronous manner.
> > > As summarized by Akhil, we need a synchronous API to perform crypto
> > > operations on raw data using SW PMDs, that provides:
> > >  - no crypto-ops.
> > >  - avoid using mbufs inside this API, use raw data buffers instead.
> > >  - no separate enqueue-dequeue, only single process() API for data path.
> > >  - input data buffers should be grouped by session,
> > >    i.e. each process() call takes one session and group of input buffers
> > >    that  belong to that session.
> > >  - All parameters that are constant accross session, should be stored
> > >    inside the session itself and reused by all incoming data buffers.
> > >
> > > While there seems no controversy about need of such functionality,
> > > there seems to be no agreement on what would be the best API for that.
> > > So I am requesting for TB input on that matter.
> > >
> > > Series structure:
> > > - patch #1 - intorduce basic data structures to be used by sync API
> > >   (no controversy here, I hope ..)
> > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > - patch #2 - Intel initial approach for new API (via rte_security)
> > >   [RFC 2/4] security: introduce cpu-crypto API
> > > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> > >   possible
> > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > - patch #4 - approach via introducing new session data structure and API
> > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > >
> > > Patches 2,3,4 are mutually exclusive,
> > > and we probably have to choose which one to go forward with.
> > > I put some explanations in each of the patches, hopefully that will help
> > > to  understand pros and cons of each one.
> > >
> > > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to
> > > reuse existing API and minimize API level changes.
> > > My favorite is #4, #2 is less preferable but ok too.
> > > #3 seems problematic to me by the reasons I outlined in #4 patch
> > > description.
> > >
> > > Please provide your opinion.
> >
> > I spend some time on the proposal and I agree that sync API is needed
> > and it makes sense to remove queue emulation and allocating/freeing
> > the crypto_ops
> > in case of sync API.
> >
> > # I would prefer to not duplicate the session. If the newly added
> > fields are for optimization
> > then those can be applicable for HW too. For example, if we consider,
> > offset to be
> > constant for one session HW PMD will be able to leverage this. ref:
> > rte_crypto_aead_xfrom::cpu_crypto:offset
>
> It might, but right for async API we pass this info in crypto_op instead.
> So if I get you right your preference is sort of #3 approach
> that reuses existing rte_cryptodev API as much as possible:
> reuse existing rte_cryptodev_sym structure with new sync process() API?

Yes.

> > # I would prefer to not duplicate ops parameters, instead of the
> > existing rte_crypto_ops  can be updated.
> > I see that most members introduced in rte_crypto_sym_vec &
> > rte_crypto_vec are already existing in rte_crypto_op.
>
> rte_crypto_ops is way too generic/excessive.
> Filling/reading it seems one of the main slowdowns that  we trying to
> avoid in new API.

It does not look like it is going over 1 CL. Regarding the filling
case, I think,
We need to form the rte_crypto_ops in the slow path and change only in
mutable fields need to update per packet.

> >
> > Also, since we are agreeing that the ops for SYNC API can be from
> > stack/one time allocated, the size shouldn't matter.
>
> I can be on stack, but it means user will still have to fill them
> and PMD will have to read/process/overwrite them.
>
> > I understand that this would cause ABI breakage, but for this release,
> > we can work together and add some reserved fields
> > that we can implement later. I believe that's the reason why you want
> > to introduce new structures. I think that will bloat
> > the existing crypto lib.
>
> It will increase the lib code, but I don't think it will be significant.
> Honestly, I think messing with crypto_op and other existing structures
> might have much more negative effect.

Yes. We need to change it carefully.

>
> > If I understand it correctly, this will be used in conjunction with
> > IXGBE to handle fragmented IPsec traffic. If that's the fundamental
> > reasoning, then there is an alternate path possible.
>
> No, it's just one of the use-case.
> Pretty important, but not the only one.
> The main reason - current cryptodev API (crypto_op based) is suboptimal for SW based PMDs.
> We wasting too many cycles to pretend that it is a lookaside device underneath.

That I agree. I think, it should be fixed by the process() API.

> I think makes more sense to admit that it is SW based and exploit it nature,
> instead of trying to hide it.

Yes. I thought the separate process() device op will solve the major problems.

This is just my _personal_ opinion.  I leave crypto code contributors
to define specifics of API.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-11-20 14:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-05 18:41 [dpdk-dev] [RFC 0/4] cpu-crypto API choices Konstantin Ananyev
2019-11-05 18:41 ` [dpdk-dev] [RFC 1/4] cpu-crypto: Introduce basic data structures Konstantin Ananyev
2019-11-05 18:41 ` [dpdk-dev] [RFC 2/4] security: introduce cpu-crypto API Konstantin Ananyev
2019-11-05 18:41 ` [dpdk-dev] [RFC 3/4] cryptodev: " Konstantin Ananyev
2019-11-05 21:41   ` Akhil Goyal
2019-11-06 14:49     ` Ananyev, Konstantin
2019-11-05 18:41 ` [dpdk-dev] [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API Konstantin Ananyev
2019-11-06  4:54 ` [dpdk-dev] [dpdk-techboard] [RFC 0/4] cpu-crypto API choices Honnappa Nagarahalli
2019-11-06  9:35   ` Thomas Monjalon
2019-11-06  9:48     ` Thomas Monjalon
2019-11-06 10:14       ` Ananyev, Konstantin
2019-11-06 11:33         ` Ananyev, Konstantin
2019-11-06 12:18           ` Thomas Monjalon
2019-11-06 12:22             ` Hemant Agrawal
2019-11-06 15:19             ` Ananyev, Konstantin
2019-11-14  5:46 ` [dpdk-dev] " Jerin Jacob
2019-11-18 11:57   ` Ananyev, Konstantin
2019-11-20 14:27     ` Jerin Jacob

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).