DPDK patches and discussions
* [PATCH 00/12] fixes and improvements to CNXK crypto PMD
@ 2024-06-20 14:58 Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
                   ` (12 more replies)
  0 siblings, 13 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

This series improves the CNXK crypto PMD and fixes a failure on
zero-length AES-GCM input.

Aakash Sasidharan (1):
  crypto/cnxk: fix aes-gcm zero len input cases

Anoob Joseph (11):
  common/cnxk: add comments to denote skipped entries
  crypto/cnxk: update version map file with PMD APIs
  common/cnxk: make inline dev PF func get as idev API
  crypto/cnxk: add flow control in Rx inject path
  crypto/cnxk: use SSO PF func of inline device in inst
  crypto/cnxk: use NEON for Rx inject inst preparation
  crypto/cnxk: remove init of CPT result field in packet
  crypto/cnxk: add dual submission in Rx inject
  crypto/cnxk: update sess pointer for next iteration
  crypto/cnxk: make pack IV variable as const
  crypto/cnxk: enable dual submission to CPT

 drivers/common/cnxk/roc_ae.c              |   6 +-
 drivers/common/cnxk/roc_ae_fpm_tables.c   |   6 +-
 drivers/common/cnxk/roc_cpt.c             |  17 +-
 drivers/common/cnxk/roc_cpt.h             |  51 +++--
 drivers/common/cnxk/roc_idev.c            |   6 +
 drivers/common/cnxk/roc_idev.h            |   2 +
 drivers/common/cnxk/roc_nix_inl.h         |   1 -
 drivers/common/cnxk/roc_nix_inl_dev.c     |   6 -
 drivers/common/cnxk/version.map           |   2 +-
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 231 +++++++++-------------
 drivers/crypto/cnxk/cn10k_cryptodev_ops.h |  60 +++++-
 drivers/crypto/cnxk/cnxk_cryptodev.h      |   2 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  40 ++--
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |   2 +
 drivers/crypto/cnxk/cnxk_se.h             |  55 +++---
 drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h |   2 +
 drivers/crypto/cnxk/version.map           |   8 +
 drivers/event/cnxk/cnxk_eventdev_adptr.c  |   4 +-
 drivers/net/cnxk/cn10k_ethdev_sec.c       |   2 +-
 drivers/net/cnxk/cnxk_ethdev_telemetry.c  |   3 +-
 20 files changed, 272 insertions(+), 234 deletions(-)

-- 
2.25.1



* [PATCH 01/12] common/cnxk: add comments to denote skipped entries
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add comments to denote unused table entries.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/common/cnxk/roc_ae.c            | 6 +++---
 drivers/common/cnxk/roc_ae_fpm_tables.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/common/cnxk/roc_ae.c b/drivers/common/cnxk/roc_ae.c
index e6a013d7c4..7ef0efe2b3 100644
--- a/drivers/common/cnxk/roc_ae.c
+++ b/drivers/common/cnxk/roc_ae.c
@@ -151,9 +151,9 @@ const struct roc_ae_ec_group ae_ec_grp[ROC_AE_EC_ID_PMAX] = {
 			     0x3F, 0x00},
 		    .length = 66},
 	},
-	{},
-	{},
-	{},
+	{ /* ROC_AE_EC_ID_P160 */ },
+	{ /* ROC_AE_EC_ID_P320 */ },
+	{ /* ROC_AE_EC_ID_P512 */ },
 	{
 		.prime = {.data = {0xFF, 0xFF, 0xFF, 0xFE, 0xFF, 0xFF, 0xFF,
 				   0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
diff --git a/drivers/common/cnxk/roc_ae_fpm_tables.c b/drivers/common/cnxk/roc_ae_fpm_tables.c
index ead3128e7f..942657b56a 100644
--- a/drivers/common/cnxk/roc_ae_fpm_tables.c
+++ b/drivers/common/cnxk/roc_ae_fpm_tables.c
@@ -1261,9 +1261,9 @@ const struct ae_fpm_entry ae_fpm_tbl_scalar[ROC_AE_EC_ID_PMAX] = {
 		.data = ae_fpm_tbl_p521,
 		.len = sizeof(ae_fpm_tbl_p521)
 	},
-	{},
-	{},
-	{},
+	{ /* ROC_AE_EC_ID_P160 */ },
+	{ /* ROC_AE_EC_ID_P320 */ },
+	{ /* ROC_AE_EC_ID_P512 */ },
 	{
 		.data = ae_fpm_tbl_p256_sm2,
 		.len = sizeof(ae_fpm_tbl_p256_sm2)
-- 
2.25.1



* [PATCH 02/12] crypto/cnxk: update version map file with PMD APIs
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Update the version map with the PMD APIs that were added, and mark them
as experimental in the header.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h | 2 ++
 drivers/crypto/cnxk/version.map           | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h b/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
index 8b0a5ba0f2..eab1243065 100644
--- a/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
+++ b/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
@@ -23,6 +23,7 @@
  * @return
  *   Pointer to queue pair structure that would be the input to submit APIs.
  */
+__rte_experimental
 void *rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id);
 
 /**
@@ -41,6 +42,7 @@ void *rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id);
  * @param nb_inst
  *   Number of instructions.
  */
+__rte_experimental
 void rte_pmd_cnxk_crypto_submit(void *qptr, void *inst, uint16_t nb_inst);
 
 #endif /* _PMD_CNXK_CRYPTO_H_ */
diff --git a/drivers/crypto/cnxk/version.map b/drivers/crypto/cnxk/version.map
index 5789a6bfc9..7a77607774 100644
--- a/drivers/crypto/cnxk/version.map
+++ b/drivers/crypto/cnxk/version.map
@@ -1,3 +1,11 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 24.03
+	rte_pmd_cnxk_crypto_submit;
+	rte_pmd_cnxk_crypto_qptr_get;
+};
+
 INTERNAL {
 	global:
 
-- 
2.25.1



* [PATCH 03/12] common/cnxk: make inline dev PF func get as idev API
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

The inline device PF FUNC is required to set SSO_PF_FUNC in the
instruction for cryptodev Rx inject. Move the getter API to idev so
that it can be used from there.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/common/cnxk/roc_idev.c           | 6 ++++++
 drivers/common/cnxk/roc_idev.h           | 2 ++
 drivers/common/cnxk/roc_nix_inl.h        | 1 -
 drivers/common/cnxk/roc_nix_inl_dev.c    | 6 ------
 drivers/common/cnxk/version.map          | 2 +-
 drivers/net/cnxk/cn10k_ethdev_sec.c      | 2 +-
 drivers/net/cnxk/cnxk_ethdev_telemetry.c | 3 +--
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/common/cnxk/roc_idev.c b/drivers/common/cnxk/roc_idev.c
index d0307c666c..0778d51d1e 100644
--- a/drivers/common/cnxk/roc_idev.c
+++ b/drivers/common/cnxk/roc_idev.c
@@ -374,3 +374,9 @@ roc_idev_nix_rx_chan_set(uint16_t port, uint16_t chan)
 	if (idev != NULL && port < PLT_MAX_ETHPORTS)
 		__atomic_store_n(&idev->inl_rx_inj_cfg.chan[port], chan, __ATOMIC_RELEASE);
 }
+
+uint16_t
+roc_idev_nix_inl_dev_pffunc_get(void)
+{
+	return nix_inl_dev_pffunc_get();
+}
diff --git a/drivers/common/cnxk/roc_idev.h b/drivers/common/cnxk/roc_idev.h
index 00664eaed6..fc0f7db54e 100644
--- a/drivers/common/cnxk/roc_idev.h
+++ b/drivers/common/cnxk/roc_idev.h
@@ -27,4 +27,6 @@ uint8_t __roc_api roc_idev_nix_rx_inject_get(uint16_t port);
 void __roc_api roc_idev_nix_rx_inject_set(uint16_t port, uint8_t enable);
 uint16_t *__roc_api roc_idev_nix_rx_chan_base_get(void);
 void __roc_api roc_idev_nix_rx_chan_set(uint16_t port, uint16_t chan);
+
+uint16_t __roc_api roc_idev_nix_inl_dev_pffunc_get(void);
 #endif /* _ROC_IDEV_H_ */
diff --git a/drivers/common/cnxk/roc_nix_inl.h b/drivers/common/cnxk/roc_nix_inl.h
index ab0965e512..1a4bf8808c 100644
--- a/drivers/common/cnxk/roc_nix_inl.h
+++ b/drivers/common/cnxk/roc_nix_inl.h
@@ -112,7 +112,6 @@ void __roc_api roc_nix_inl_dev_lock(void);
 void __roc_api roc_nix_inl_dev_unlock(void);
 int __roc_api roc_nix_inl_dev_xaq_realloc(uint64_t aura_handle);
 int __roc_api roc_nix_inl_dev_stats_get(struct roc_nix_stats *stats);
-uint16_t __roc_api roc_nix_inl_dev_pffunc_get(void);
 int __roc_api roc_nix_inl_dev_cpt_setup(bool use_inl_dev_sso);
 int __roc_api roc_nix_inl_dev_cpt_release(void);
 bool __roc_api roc_nix_inl_dev_is_multi_channel(void);
diff --git a/drivers/common/cnxk/roc_nix_inl_dev.c b/drivers/common/cnxk/roc_nix_inl_dev.c
index 60e6a43033..e2bbe3a67b 100644
--- a/drivers/common/cnxk/roc_nix_inl_dev.c
+++ b/drivers/common/cnxk/roc_nix_inl_dev.c
@@ -34,12 +34,6 @@ nix_inl_dev_pffunc_get(void)
 	return 0;
 }
 
-uint16_t
-roc_nix_inl_dev_pffunc_get(void)
-{
-	return nix_inl_dev_pffunc_get();
-}
-
 static void
 nix_inl_selftest_work_cb(uint64_t *gw, void *args, uint32_t soft_exp_event)
 {
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index eac2ea9ff8..f98738d07e 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -112,6 +112,7 @@ INTERNAL {
 	roc_idev_npa_nix_get;
 	roc_idev_num_lmtlines_get;
 	roc_idev_nix_inl_meta_aura_get;
+	roc_idev_nix_inl_dev_pffunc_get;
 	roc_idev_nix_list_get;
 	roc_idev_nix_rx_chan_base_get;
 	roc_idev_nix_rx_chan_set;
@@ -244,7 +245,6 @@ INTERNAL {
 	roc_nix_inl_dev_is_probed;
 	roc_nix_inl_dev_stats_get;
 	roc_nix_inl_dev_lock;
-	roc_nix_inl_dev_pffunc_get;
 	roc_nix_inl_dev_rq;
 	roc_nix_inl_dev_rq_get;
 	roc_nix_inl_dev_rq_put;
diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c
index b8b0da5ea9..5e509e97d4 100644
--- a/drivers/net/cnxk/cn10k_ethdev_sec.c
+++ b/drivers/net/cnxk/cn10k_ethdev_sec.c
@@ -1360,7 +1360,7 @@ cn10k_eth_sec_rx_inject_config(void *device, uint16_t port_id, bool enable)
 	inj_cfg->io_addr = inl_lf->io_addr;
 	inj_cfg->lmt_base = nix->lmt_base;
 	channel = roc_nix_get_base_chan(nix);
-	pf_func = roc_nix_inl_dev_pffunc_get();
+	pf_func = roc_idev_nix_inl_dev_pffunc_get();
 	inj_cfg->cmd_w0 = pf_func << 48 | inj_match_id << 32 | channel << 4;
 
 	return 0;
diff --git a/drivers/net/cnxk/cnxk_ethdev_telemetry.c b/drivers/net/cnxk/cnxk_ethdev_telemetry.c
index 3027ca4735..a1958185f2 100644
--- a/drivers/net/cnxk/cnxk_ethdev_telemetry.c
+++ b/drivers/net/cnxk/cnxk_ethdev_telemetry.c
@@ -65,8 +65,7 @@ ethdev_tel_handle_info(const char *cmd __rte_unused,
 			info = &eth_info.info;
 			dev = cnxk_eth_pmd_priv(eth_dev);
 			if (dev) {
-				info->inl_dev_pf_func =
-					roc_nix_inl_dev_pffunc_get();
+				info->inl_dev_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 				info->pf_func = roc_nix_get_pf_func(&dev->nix);
 				info->max_mac_entries = dev->max_mac_entries;
 				info->dmac_filter_ena = dev->dmac_filter_enable;
-- 
2.25.1



* [PATCH 04/12] crypto/cnxk: add flow control in Rx inject path
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (2 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add flow control in the Rx inject path to avoid over-submission to CPT.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
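
Note (illustrative only, not part of the patch): the check added here can
be sketched in portable C as below. The names `fc_counter` and
`fc_can_submit` are hypothetical, and the counter is modelled as a plain
64-bit word; the driver itself reads `union cpt_fc_write_s` and compares
its `qsize` field against `fc_thresh`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the queue-occupancy word read via fc_addr. */
typedef _Atomic uint64_t fc_counter;

/* Return true when the current occupancy still allows another burst. */
static bool
fc_can_submit(fc_counter *fc_addr, uint32_t fc_thresh)
{
	/* Relaxed load: only the value is needed, no ordering with other data. */
	uint64_t qsize = atomic_load_explicit(fc_addr, memory_order_relaxed);

	/* Mirrors the patch: bail out when qsize > fc_thresh. */
	return qsize <= fc_thresh;
}
```

In the patch the equivalent of a false return is the `goto exit`, which
reports only the packets already submitted in earlier iterations.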

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 720b756001..9f1c074925 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1400,8 +1400,10 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	struct rte_cryptodev *cdev = dev;
 	union cpt_res_s *hw_res = NULL;
 	struct cpt_inst_s *inst;
+	union cpt_fc_write_s fc;
 	struct cnxk_cpt_vf *vf;
 	struct rte_mbuf *m;
+	uint64_t *fc_addr;
 	uint64_t dptr;
 	int i;
 
@@ -1413,13 +1415,24 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
+	fc_addr = vf->rx_inj_lmtline.fc_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
 	pf_func = vf->rx_inj_pf_func;
 
+	const uint32_t fc_thresh = vf->rx_inj_lmtline.fc_thresh;
+
 again:
+	fc.u64[0] =
+		rte_atomic_load_explicit((RTE_ATOMIC(uint64_t) *)fc_addr, rte_memory_order_relaxed);
 	inst = (struct cpt_inst_s *)lmt_base;
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+
+	i = 0;
+
+	if (unlikely(fc.s.qsize > fc_thresh))
+		goto exit;
+
+	for (; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1487,6 +1500,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		goto again;
 	}
 
+exit:
 	return count + i;
 }
 
-- 
2.25.1



* [PATCH 05/12] crypto/cnxk: use SSO PF func of inline device in inst
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (3 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

The RVU PF FUNC of the CPT LF need not be set, as the hardware determines
it. Instead, the SSO PF FUNC needs to be set to that of the inline device
so that critical errors reach the inline device.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 2 +-
 drivers/crypto/cnxk/cnxk_cryptodev.h      | 2 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 9f1c074925..f2980399c5 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1418,7 +1418,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
-	pf_func = vf->rx_inj_pf_func;
+	pf_func = vf->rx_inj_sso_pf_func;
 
 	const uint32_t fc_thresh = vf->rx_inj_lmtline.fc_thresh;
 
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev.h b/drivers/crypto/cnxk/cnxk_cryptodev.h
index fffc4a47b4..4000e84a7e 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev.h
+++ b/drivers/crypto/cnxk/cnxk_cryptodev.h
@@ -22,7 +22,7 @@
  */
 struct cnxk_cpt_vf {
 	struct roc_cpt_lmtline rx_inj_lmtline;
-	uint16_t rx_inj_pf_func;
+	uint16_t rx_inj_sso_pf_func;
 	uint16_t *rx_chan_base;
 	struct roc_cpt cpt;
 	struct rte_cryptodev_capabilities crypto_caps[CNXK_CPT_MAX_CAPS];
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index d7f5780637..51369309c5 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -483,7 +483,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 			goto exit;
 		}
 
-		vf->rx_inj_pf_func = qp->lf.pf_func;
+		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
-- 
2.25.1



* [PATCH 06/12] crypto/cnxk: use NEON for Rx inject inst preparation
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (4 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Use NEON instructions for Rx inject instruction preparation.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 57 +++++++++++++++++------
 1 file changed, 42 insertions(+), 15 deletions(-)
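
Note (illustrative only, not part of the patch): word 0 of the
instruction is now composed as a single 64-bit value before being placed
in a NEON lane. A minimal scalar sketch of that composition follows. The
helper name `pack_inst_w0` is hypothetical; the shift amounts are taken
from the `u64_0` expression in the diff below.

```c
#include <stdint.h>

/* Compose instruction word 0 the same way the patch builds u64_0. */
static uint64_t
pack_inst_w0(uint64_t pf_func, uint64_t chan, uint64_t l2_len)
{
	/* (l2_len - 2) replaces the former et_offset bit-field assignment. */
	return pf_func << 48 | chan << 4 | (l2_len - 2) << 24 | l2_len << 16;
}
```

For example, `pack_inst_w0(0x1, 0x800, 14)` yields 0x000100000C0E8000.
The patch then writes this value together with the result address as one
128-bit store instead of several bit-field updates.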

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index f2980399c5..d36516735a 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -7,6 +7,7 @@
 #include <rte_event_crypto_adapter.h>
 #include <rte_hexdump.h>
 #include <rte_ip.h>
+#include <rte_vect.h>
 
 #include <ethdev_driver.h>
 
@@ -1390,15 +1391,17 @@ cn10k_cpt_dequeue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops)
 	return i;
 }
 
+#if defined(RTE_ARCH_ARM64)
 uint16_t __rte_hot
 cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 				  struct rte_security_session **sess, uint16_t nb_pkts)
 {
-	uint16_t l2_len, pf_func, lmt_id, count = 0;
-	uint64_t lmt_base, lmt_arg, io_addr;
+	uint64_t lmt_base, lmt_arg, io_addr, u64_0, u64_1, l2_len, pf_func;
+	uint64x2_t inst_01, inst_23, inst_45, inst_67;
 	struct cn10k_sec_session *sec_sess;
 	struct rte_cryptodev *cdev = dev;
 	union cpt_res_s *hw_res = NULL;
+	uint16_t lmt_id, count = 0;
 	struct cpt_inst_s *inst;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_vf *vf;
@@ -1456,26 +1459,38 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		hw_res = RTE_PTR_ALIGN_CEIL(hw_res, 16);
 
 		/* Prepare CPT instruction */
-		inst->w0.u64 = 0;
-		inst->w2.u64 = 0;
-		inst->w2.s.rvu_pf_func = pf_func;
-		inst->w3.u64 = (((uint64_t)m + sizeof(struct rte_mbuf)) >> 3) << 3 | 1;
 
-		inst->w4.u64 = sec_sess->inst.w4 | (rte_pktmbuf_pkt_len(m));
+		/* Word 0 and 1 */
+		u64_0 = pf_func << 48 | *(vf->rx_chan_base + m->port) << 4 | (l2_len - 2) << 24 |
+			l2_len << 16;
+		inst_01 = vsetq_lane_u64(u64_0, inst_01, 0);
+		inst_01 = vsetq_lane_u64((uint64_t)hw_res, inst_01, 1);
+		vst1q_u64(&inst->w0.u64, inst_01);
+
+		/* Word 2 and 3 */
+		inst_23 = vdupq_n_u64(0);
+		u64_1 = (((uint64_t)m + sizeof(struct rte_mbuf)) >> 3) << 3 | 1;
+		inst_23 = vsetq_lane_u64(u64_1, inst_23, 1);
+		vst1q_u64(&inst->w2.u64, inst_23);
+
+		/* Word 4 and 5 */
+		u64_0 = sec_sess->inst.w4 | (rte_pktmbuf_pkt_len(m));
+		inst_45 = vsetq_lane_u64(u64_0, inst_45, 0);
 		dptr = (uint64_t)rte_pktmbuf_iova(m);
-		inst->dptr = dptr;
-		inst->rptr = dptr;
+		u64_1 = dptr;
+		inst_45 = vsetq_lane_u64(u64_1, inst_45, 1);
+		vst1q_u64(&inst->w4.u64, inst_45);
 
-		inst->w0.hw_s.chan = *(vf->rx_chan_base + m->port);
-		inst->w0.hw_s.l2_len = l2_len;
-		inst->w0.hw_s.et_offset = l2_len - 2;
+		/* Word 6 and 7 */
+		u64_0 = dptr;
+		u64_1 = sec_sess->inst.w7;
+		inst_67 = vsetq_lane_u64(u64_0, inst_67, 0);
+		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
+		vst1q_u64(&inst->w6.u64, inst_67);
 
-		inst->res_addr = (uint64_t)hw_res;
 		rte_atomic_store_explicit((unsigned long __rte_atomic *)&hw_res->u64[0], res.u64[0],
 					  rte_memory_order_relaxed);
 
-		inst->w7.u64 = sec_sess->inst.w7;
-
 		inst += 2;
 	}
 
@@ -1503,6 +1518,18 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 exit:
 	return count + i;
 }
+#else
+uint16_t __rte_hot
+cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
+				  struct rte_security_session **sess, uint16_t nb_pkts)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(sess);
+	RTE_SET_USED(nb_pkts);
+	return 0;
+}
+#endif
 
 void
 cn10k_cpt_set_enqdeq_fns(struct rte_cryptodev *dev, struct cnxk_cpt_vf *vf)
-- 
2.25.1



* [PATCH 07/12] crypto/cnxk: remove init of CPT result field in packet
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (5 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

The packet would be posted to CPT only when there is a valid result.
Skip initializing the CPT result field in the packet.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index d36516735a..1108a8a1da 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1410,10 +1410,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	uint64_t dptr;
 	int i;
 
-	const union cpt_res_s res = {
-		.cn10k.compcode = CPT_COMP_NOT_DONE,
-	};
-
 	vf = cdev->data->dev_private;
 
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
@@ -1488,9 +1484,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
 		vst1q_u64(&inst->w6.u64, inst_67);
 
-		rte_atomic_store_explicit((unsigned long __rte_atomic *)&hw_res->u64[0], res.u64[0],
-					  rte_memory_order_relaxed);
-
 		inst += 2;
 	}
 
-- 
2.25.1



* [PATCH 08/12] crypto/cnxk: add dual submission in Rx inject
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (6 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add dual submission to CPT in the Rx inject path.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
---
 drivers/common/cnxk/roc_cpt.h             | 43 +++++++++-----
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 70 +++++++++++++++++------
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  9 +++
 3 files changed, 90 insertions(+), 32 deletions(-)
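
Note (illustrative only, not part of the patch): the helper added below
submits instructions in pairs and, when the count is odd, sends the last
one on its own after temporarily rewriting the size field of the I/O
address. A rough sketch of those two steps follows. `struct dual_split`,
`split_burst` and `io_addr_set_size` are hypothetical names; the mask and
shift are the ones used in the diff.

```c
#include <stdint.h>

/* Hypothetical summary of how a burst is split; not a driver structure. */
struct dual_split {
	int pairs;  /* LMT lines carrying two instructions each */
	int single; /* 1 when an odd instruction is submitted on its own */
};

static struct dual_split
split_burst(int n)
{
	struct dual_split s;

	s.single = n & 0x1;           /* odd count leaves one instruction over */
	s.pairs = (n - s.single) / 2; /* remaining instructions go out in pairs */
	return s;
}

/* Replace the 3-bit field at bits 6:4 of the I/O address. */
static uint64_t
io_addr_set_size(uint64_t io_addr, uint64_t size_m1)
{
	return (io_addr & ~(uint64_t)(0x7 << 4)) | (size_m1 & 0x7) << 4;
}
```

With 5 instructions this gives two paired lines plus one single
submission, which is why the helper restores the dual size in `io_addr`
after handling the odd instruction.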

diff --git a/drivers/common/cnxk/roc_cpt.h b/drivers/common/cnxk/roc_cpt.h
index 3721fa08c0..8ef9062ae0 100644
--- a/drivers/common/cnxk/roc_cpt.h
+++ b/drivers/common/cnxk/roc_cpt.h
@@ -30,23 +30,36 @@
 /* Vector of sizes in the burst of 16 CPT inst except first in 63:19 of
  * APT_LMT_ARG_S
  */
-#define ROC_CN10K_CPT_LMT_ARG                                                  \
-	(ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 0) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 1) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 2) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 3) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 4) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 5) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 6) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 7) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 8) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 9) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 10) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 11) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 12) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 13) |                           \
+#define ROC_CN10K_CPT_LMT_ARG                                                                      \
+	(ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 0) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 1) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 2) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 3) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 4) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 5) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 6) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 7) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 8) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 9) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 10) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 11) |   \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 12) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 13) |   \
 	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 14))
 
+/* Vector of sizes in the burst of 2 * 16 CPT inst except first in 63:19 of
+ * APT_LMT_ARG_S
+ */
+#define ROC_CN10K_DUAL_CPT_LMT_ARG                                                                 \
+	(ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 0) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 1) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 2) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 3) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 4) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 5) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 6) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 7) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 8) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 9) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 10) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 11) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 12) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 13) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 14))
+
 /* CPT helper macros */
 #define ROC_CPT_AH_HDR_LEN	12
 #define ROC_CPT_AES_GCM_IV_LEN	8
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 1108a8a1da..3fd002d549 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -55,6 +55,54 @@ struct vec_request {
 	uint64_t w2;
 };
 
+static __rte_always_inline void __rte_hot
+cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
+{
+	uint64_t lmt_arg;
+
+	/* Check if the total number of instructions is odd or even. */
+	const int flag_odd = *i & 0x1;
+
+	/* Reduce i by 1 when odd number of instructions.*/
+	*i -= flag_odd;
+
+	if (*i > 2 * CN10K_PKTS_PER_STEORL) {
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
+			  (uint64_t)lmt_id;
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - CN10K_PKTS_PER_STEORL - 1) << 12 |
+			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	} else {
+		if (*i != 0) {
+			lmt_arg =
+				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		}
+
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	}
+
+	rte_io_wmb();
+}
+
 static inline struct cnxk_se_sess *
 cn10k_cpt_sym_temp_sess_create(struct cnxk_cpt_qp *qp, struct rte_crypto_op *op)
 {
@@ -1396,7 +1444,7 @@ uint16_t __rte_hot
 cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 				  struct rte_security_session **sess, uint16_t nb_pkts)
 {
-	uint64_t lmt_base, lmt_arg, io_addr, u64_0, u64_1, l2_len, pf_func;
+	uint64_t lmt_base, io_addr, u64_0, u64_1, l2_len, pf_func;
 	uint64x2_t inst_01, inst_23, inst_45, inst_67;
 	struct cn10k_sec_session *sec_sess;
 	struct rte_cryptodev *cdev = dev;
@@ -1431,7 +1479,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+	for (; i < RTE_MIN(2 * CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1484,24 +1532,12 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
 		vst1q_u64(&inst->w6.u64, inst_67);
 
-		inst += 2;
-	}
-
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
+		inst++;
 	}
 
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == CN10K_PKTS_PER_LOOP) {
+	if (nb_pkts - i > 0 && i == 2 * CN10K_PKTS_PER_LOOP) {
 		nb_pkts -= i;
 		pkts += i;
 		count += i;
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index 51369309c5..6acaa4413b 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -431,6 +431,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	struct rte_pci_device *pci_dev;
 	struct cnxk_cpt_qp *qp;
 	uint32_t nb_desc;
+	uint64_t io_addr;
 	int ret;
 
 	if (dev->data->queue_pairs[qp_id] != NULL)
@@ -485,6 +486,14 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
+		/* Update IO addr to enable dual submission */
+		io_addr = vf->rx_inj_lmtline.io_addr;
+		io_addr = (io_addr & ~(uint64_t)(0x7 << 4)) | ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
+		vf->rx_inj_lmtline.io_addr = io_addr;
+
+		/* Update FC threshold to reflect dual submission */
+		vf->rx_inj_lmtline.fc_thresh -= 32;
+
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 09/12] crypto/cnxk: update sess pointer for next iteration
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (7 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Advance the sess pointer along with pkts when looping back to process
the next set of packets.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
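
Reviewer note (not part of the commit): a minimal sketch of why the
session array has to advance together with the packet array when a burst
is split into batches. All names here (BATCH, pair_in_batches, the int
stand-ins for packets and sessions) are hypothetical and only illustrate
the indexing; this is not the driver code.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 4 /* hypothetical per-iteration limit */

/*
 * Pair each packet with its session, BATCH entries at a time.
 * Both input arrays are indexed from 0 inside every batch, so both
 * base pointers must move forward by the batch size between batches.
 */
static size_t
pair_in_batches(const int *pkts, const int *sess, size_t nb, int *out_pkt, int *out_sess)
{
	size_t count = 0;

	while (nb > 0) {
		size_t n = nb < BATCH ? nb : BATCH;
		size_t i;

		for (i = 0; i < n; i++) {
			out_pkt[count + i] = pkts[i];
			out_sess[count + i] = sess[i];
		}

		nb -= n;
		pkts += n;
		sess += n; /* without this step, batch 2 reuses batch 1 sessions */
		count += n;
	}

	return count;
}
```

With six packets and BATCH set to 4, the fifth packet is paired with the
fifth session only because sess is advanced; dropping that line would
pair it with the first session again.
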
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 3fd002d549..0afd623990 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1460,6 +1460,8 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	vf = cdev->data->dev_private;
 
+	const int nb_pkts_per_loop = 2 * CN10K_PKTS_PER_LOOP;
+
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
@@ -1479,7 +1481,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(2 * CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+	for (; i < RTE_MIN(nb_pkts_per_loop, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1537,10 +1539,11 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == 2 * CN10K_PKTS_PER_LOOP) {
-		nb_pkts -= i;
-		pkts += i;
-		count += i;
+	if (nb_pkts - i > 0 && i == nb_pkts_per_loop) {
+		nb_pkts -= nb_pkts_per_loop;
+		pkts += nb_pkts_per_loop;
+		count += nb_pkts_per_loop;
+		sess += nb_pkts_per_loop;
 		goto again;
 	}
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 10/12] crypto/cnxk: fix aes-gcm zero len input cases
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (8 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj, Akhil Goyal
  Cc: jerinj, vvelumuri, asasidharan, dev

For AES-GCM (AEAD) zero-length input, the SG code path is taken, unlike
the digest-only cases, because AAD is treated as a separate input
component. Fix the zero-length case in the SG path by skipping the gather
component only for non-AEAD algorithms. Also add an SG version check, as
the fix applies only to a specific model.

Fixes: 4d8166d64988 ("crypto/cnxk: enable digest for zero length input")

Signed-off-by: Aakash Sasidharan <asasidharan@marvell.com>
---
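
Reviewer note (not part of the commit): the new early-return condition in
prepare_iov_from_pkt() can be read as a small predicate. The sketch below
restates it with plain booleans; skip_iov and have_pkt are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the iovec should be left empty (buf_cnt = 0).
 * A missing packet always skips. A zero-length packet skips only on
 * the SG version 2 path and only for non-AEAD algorithms, since AEAD
 * still carries AAD as a separate input component.
 */
static bool
skip_iov(bool have_pkt, uint32_t data_len, bool is_aead, bool is_sg_ver2)
{
	return !have_pkt || (is_sg_ver2 && data_len == 0 && !is_aead);
}
```

Under this reading, a zero-length AES-GCM packet still gets its gather
entry, while a zero-length digest-only packet on the SG version 2 path
does not.
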
 drivers/crypto/cnxk/cnxk_se.h | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/cnxk/cnxk_se.h b/drivers/crypto/cnxk/cnxk_se.h
index 6374718a82..63dbef4411 100644
--- a/drivers/crypto/cnxk/cnxk_se.h
+++ b/drivers/crypto/cnxk/cnxk_se.h
@@ -2468,13 +2468,14 @@ fill_sess_gmac(struct rte_crypto_sym_xform *xform, struct cnxk_se_sess *sess)
 }
 
 static __rte_always_inline uint32_t
-prepare_iov_from_pkt(struct rte_mbuf *pkt, struct roc_se_iov_ptr *iovec, uint32_t start_offset)
+prepare_iov_from_pkt(struct rte_mbuf *pkt, struct roc_se_iov_ptr *iovec, uint32_t start_offset,
+		     const bool is_aead, const bool is_sg_ver2)
 {
 	uint16_t index = 0;
 	void *seg_data = NULL;
 	int32_t seg_size = 0;
 
-	if (!pkt || pkt->data_len == 0) {
+	if (!pkt || (is_sg_ver2 && (pkt->data_len == 0) && !is_aead)) {
 		iovec->buf_cnt = 0;
 		return 0;
 	}
@@ -2619,13 +2620,13 @@ fill_sm_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0)) {
+		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2)) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
-		if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0)) {
+		if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false, is_sg_ver2)) {
 			plt_dp_err("Prepare dst iov failed for m_dst %p", m_dst);
 			ret = -EINVAL;
 			goto err_exit;
@@ -2816,14 +2817,15 @@ fill_fc_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0)) {
+		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, is_aead, is_sg_ver2)) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
 		if (unlikely(m_dst != NULL)) {
-			if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0)) {
+			if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, is_aead,
+						 is_sg_ver2)) {
 				plt_dp_err("Prepare dst iov failed for "
 					   "m_dst %p",
 					   m_dst);
@@ -2957,13 +2959,15 @@ fill_pdcp_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (unlikely(prepare_iov_from_pkt(m_src, fc_params.src_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
-		if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Prepare dst iov failed for m_dst %p", m_dst);
 			ret = -EINVAL;
 			goto err_exit;
@@ -3080,14 +3084,16 @@ fill_pdcp_chain_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (unlikely(prepare_iov_from_pkt(m_src, fc_params.src_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Could not prepare src iov");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
 		if (unlikely(m_dst != NULL)) {
-			if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0))) {
+			if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false,
+							  is_sg_ver2))) {
 				plt_dp_err("Could not prepare m_dst iov %p", m_dst);
 				ret = -EINVAL;
 				goto err_exit;
@@ -3306,7 +3312,7 @@ fill_digest_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 	params.src_iov = (void *)src;
 
 	/*Store SG I/O in the api for reuse */
-	if (prepare_iov_from_pkt(m_src, params.src_iov, auth_range_off)) {
+	if (prepare_iov_from_pkt(m_src, params.src_iov, auth_range_off, false, is_sg_ver2)) {
 		plt_dp_err("Prepare src iov failed");
 		ret = -EINVAL;
 		goto free_mdata_and_exit;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 11/12] crypto/cnxk: make pack IV variable as const
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (9 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-20 14:58 ` [PATCH 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Make the 'pack_iv' parameter a const bool, and pass a literal false where
it is never set, to avoid multiple checks.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
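
Reviewer note (not part of the commit): a rough sketch of the idea,
assuming the benefit comes from inlining. When an inline helper takes a
const bool and the caller passes a literal, an optimizing compiler can
usually drop the untaken branch, although that depends on the compiler
and flags. The toy_* names and the packing transform are hypothetical
and do not reproduce cpt_pack_iv().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_IV_LEN 16

/* Copy an IV, optionally applying a toy packing (two source bytes per output byte). */
static inline void
toy_iv_copy(uint8_t *dst, const uint8_t *src, const bool pack)
{
	int i;

	if (pack) {
		for (i = 0; i < TOY_IV_LEN / 2; i++)
			dst[i] = (uint8_t)((src[2 * i] << 4) | (src[2 * i + 1] & 0x0f));
	} else {
		memcpy(dst, src, TOY_IV_LEN);
	}
}

/* Caller that never packs: the flag is a compile-time constant here. */
static inline void
toy_chain_prep(uint8_t *dst, const uint8_t *src)
{
	toy_iv_copy(dst, src, false);
}
```
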
 drivers/crypto/cnxk/cnxk_se.h | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/crypto/cnxk/cnxk_se.h b/drivers/crypto/cnxk/cnxk_se.h
index 63dbef4411..dbd36a8a54 100644
--- a/drivers/crypto/cnxk/cnxk_se.h
+++ b/drivers/crypto/cnxk/cnxk_se.h
@@ -105,7 +105,7 @@ cpt_pack_iv(uint8_t *iv_src, uint8_t *iv_dst)
 }
 
 static inline void
-pdcp_iv_copy(uint8_t *iv_d, const uint8_t *iv_s, const uint8_t pdcp_alg_type, uint8_t pack_iv)
+pdcp_iv_copy(uint8_t *iv_d, const uint8_t *iv_s, const uint8_t pdcp_alg_type, const bool pack_iv)
 {
 	const uint32_t *iv_s_temp;
 	uint32_t iv_temp[4];
@@ -261,7 +261,7 @@ cpt_mac_len_verify(struct rte_crypto_auth_xform *auth)
 
 static __rte_always_inline int
 sg_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t offset_ctrl,
-	     const uint8_t *iv_s, int iv_len, uint8_t pack_iv, uint8_t pdcp_alg_type,
+	     const uint8_t *iv_s, int iv_len, const bool pack_iv, uint8_t pdcp_alg_type,
 	     int32_t inputlen, int32_t outputlen, uint32_t passthrough_len, uint32_t req_flags,
 	     int pdcp_flag, int decrypt)
 {
@@ -457,7 +457,7 @@ sg_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t
 
 static __rte_always_inline int
 sg2_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t offset_ctrl,
-	      const uint8_t *iv_s, int iv_len, uint8_t pack_iv, uint8_t pdcp_alg_type,
+	      const uint8_t *iv_s, int iv_len, const bool pack_iv, uint8_t pdcp_alg_type,
 	      int32_t inputlen, int32_t outputlen, uint32_t passthrough_len, uint32_t req_flags,
 	      int pdcp_flag, int decrypt)
 {
@@ -882,7 +882,7 @@ static inline int
 pdcp_chain_sg1_prep(struct roc_se_fc_params *params, struct roc_se_ctx *cpt_ctx,
 		    struct cpt_inst_s *inst, union cpt_inst_w4 w4, int32_t inputlen,
 		    uint8_t hdr_len, uint64_t offset_ctrl, uint32_t req_flags,
-		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const int pack_iv,
+		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const bool pack_iv,
 		    const uint8_t pdcp_ci_alg, const uint8_t pdcp_auth_alg)
 {
 	struct roc_sglist_comp *scatter_comp, *gather_comp;
@@ -991,7 +991,7 @@ static inline int
 pdcp_chain_sg2_prep(struct roc_se_fc_params *params, struct roc_se_ctx *cpt_ctx,
 		    struct cpt_inst_s *inst, union cpt_inst_w4 w4, int32_t inputlen,
 		    uint8_t hdr_len, uint64_t offset_ctrl, uint32_t req_flags,
-		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const int pack_iv,
+		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const bool pack_iv,
 		    const uint8_t pdcp_ci_alg, const uint8_t pdcp_auth_alg)
 {
 	struct roc_sg2list_comp *gather_comp, *scatter_comp;
@@ -1528,7 +1528,6 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 	struct roc_se_ctx *se_ctx;
 	uint64_t *offset_vaddr;
 	uint64_t offset_ctrl;
-	uint8_t pack_iv = 0;
 	int32_t inputlen;
 	void *dm_vaddr;
 	uint8_t *iv_d;
@@ -1606,10 +1605,10 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		cpt_inst_w4.s.dlen = inputlen + ROC_SE_OFF_CTRL_LEN;
 
 		iv_d = ((uint8_t *)offset_vaddr + ROC_SE_OFF_CTRL_LEN);
-		pdcp_iv_copy(iv_d, cipher_iv, pdcp_ci_alg, pack_iv);
+		pdcp_iv_copy(iv_d, cipher_iv, pdcp_ci_alg, false);
 
 		iv_d = ((uint8_t *)offset_vaddr + ROC_SE_OFF_CTRL_LEN + pdcp_iv_off);
-		pdcp_iv_copy(iv_d, auth_iv, pdcp_auth_alg, pack_iv);
+		pdcp_iv_copy(iv_d, auth_iv, pdcp_auth_alg, false);
 
 		inst->w4.u64 = cpt_inst_w4.u64;
 		return 0;
@@ -1618,11 +1617,11 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		if (is_sg_ver2)
 			return pdcp_chain_sg2_prep(params, se_ctx, inst, cpt_inst_w4, inputlen,
 						   hdr_len, offset_ctrl, req_flags, cipher_iv,
-						   auth_iv, pack_iv, pdcp_ci_alg, pdcp_auth_alg);
+						   auth_iv, false, pdcp_ci_alg, pdcp_auth_alg);
 		else
 			return pdcp_chain_sg1_prep(params, se_ctx, inst, cpt_inst_w4, inputlen,
 						   hdr_len, offset_ctrl, req_flags, cipher_iv,
-						   auth_iv, pack_iv, pdcp_ci_alg, pdcp_auth_alg);
+						   auth_iv, false, pdcp_ci_alg, pdcp_auth_alg);
 	}
 }
 
@@ -1647,9 +1646,9 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 	uint64_t *offset_vaddr;
 	uint8_t pdcp_alg_type;
 	uint32_t mac_len = 0;
-	const uint8_t *iv_s;
-	uint8_t pack_iv = 0;
 	uint64_t offset_ctrl;
+	bool pack_iv = false;
+	const uint8_t *iv_s;
 	int ret;
 
 	mac_len = se_ctx->mac_len;
@@ -1671,7 +1670,7 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		if (pdcp_alg_type != ROC_SE_PDCP_ALG_TYPE_AES_CMAC) {
 
 			if (params->auth_iv_len == 25)
-				pack_iv = 1;
+				pack_iv = true;
 
 			auth_offset = auth_offset / 8;
 			auth_data_len = RTE_ALIGN(auth_data_len, 8) / 8;
@@ -1694,7 +1693,7 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		pdcp_alg_type = se_ctx->pdcp_ci_alg;
 
 		if (params->cipher_iv_len == 25)
-			pack_iv = 1;
+			pack_iv = true;
 
 		/*
 		 * Microcode expects offsets in bytes
-- 
2.25.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 12/12] crypto/cnxk: enable dual submission to CPT
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (10 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
@ 2024-06-20 14:58 ` Aakash Sasidharan
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-20 14:58 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Pavan Nikhilesh, Shijith Thotton
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Submit two instructions in one LMTLINE.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
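
Reviewer note (not part of the commit): a minimal sketch of the
bookkeeping behind dual submission, as I read
cn10k_cpt_lmtst_dual_submit(). An even prefix of the burst goes out two
instructions per LMTLINE, and an odd trailing instruction goes out alone
after the size field in io_addr is temporarily switched back. The toy_*
names are hypothetical, and the values 3 and 7 for the size field are
assumptions standing in for ROC_CN10K_CPT_INST_DW_M1 and
ROC_CN10K_TWO_CPT_INST_DW_M1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size-field encodings for one and two instructions per LMTLINE. */
#define TOY_ONE_INST_DW_M1 UINT64_C(3)
#define TOY_TWO_INST_DW_M1 UINT64_C(7)
#define TOY_SIZE_SHIFT	   4
#define TOY_SIZE_MASK	   (UINT64_C(0x7) << TOY_SIZE_SHIFT)

struct toy_plan {
	int dual_lines;	  /* LMTLINEs carrying two instructions */
	int single_lines; /* 0 or 1 LMTLINE carrying the odd instruction */
};

/* Split a burst of nb_inst instructions into dual and single LMTLINEs. */
static struct toy_plan
toy_split(int nb_inst)
{
	struct toy_plan p;

	p.single_lines = nb_inst & 0x1;
	p.dual_lines = (nb_inst - p.single_lines) / 2;
	return p;
}

/* Replace the 3-bit size field at bits [6:4] of io_addr. */
static uint64_t
toy_set_size(uint64_t io_addr, uint64_t dw_m1)
{
	return (io_addr & ~TOY_SIZE_MASK) | (dw_m1 << TOY_SIZE_SHIFT);
}
```

For a burst of five instructions this gives two dual LMTLINEs and one
single LMTLINE, with the size field set to the single encoding for the
last submit and restored to the dual encoding afterwards.
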
 drivers/common/cnxk/roc_cpt.c             |  17 +-
 drivers/common/cnxk/roc_cpt.h             |   8 +-
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 182 +++++-----------------
 drivers/crypto/cnxk/cn10k_cryptodev_ops.h |  60 ++++++-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  47 ++----
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |   2 +
 drivers/event/cnxk/cnxk_eventdev_adptr.c  |   4 +-
 7 files changed, 124 insertions(+), 196 deletions(-)

diff --git a/drivers/common/cnxk/roc_cpt.c b/drivers/common/cnxk/roc_cpt.c
index 9f283ceb2e..aba2a49d19 100644
--- a/drivers/common/cnxk/roc_cpt.c
+++ b/drivers/common/cnxk/roc_cpt.c
@@ -1135,8 +1135,8 @@ roc_cpt_iq_enable(struct roc_cpt_lf *lf)
 }
 
 int
-roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
-		     int lf_id)
+roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline, int lf_id,
+		     bool is_dual)
 {
 	struct roc_cpt_lf *lf;
 
@@ -1145,12 +1145,19 @@ roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
 		return -ENOTSUP;
 
 	lmtline->io_addr = lf->io_addr;
-	if (roc_model_is_cn10k())
-		lmtline->io_addr |= ROC_CN10K_CPT_INST_DW_M1 << 4;
+	lmtline->fc_thresh = lf->nb_desc - CPT_LF_FC_MIN_THRESHOLD;
+
+	if (roc_model_is_cn10k()) {
+		if (is_dual) {
+			lmtline->io_addr |= ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
+			lmtline->fc_thresh = lf->nb_desc -  2 * CPT_LF_FC_MIN_THRESHOLD;
+		} else {
+			lmtline->io_addr |= ROC_CN10K_CPT_INST_DW_M1 << 4;
+		}
+	}
 
 	lmtline->fc_addr = lf->fc_addr;
 	lmtline->lmt_base = lf->lmt_base;
-	lmtline->fc_thresh = lf->nb_desc - CPT_LF_FC_MIN_THRESHOLD;
 
 	return 0;
 }
diff --git a/drivers/common/cnxk/roc_cpt.h b/drivers/common/cnxk/roc_cpt.h
index 8ef9062ae0..e2e919f80f 100644
--- a/drivers/common/cnxk/roc_cpt.h
+++ b/drivers/common/cnxk/roc_cpt.h
@@ -200,12 +200,12 @@ int __roc_api roc_cpt_afs_print(struct roc_cpt *roc_cpt);
 int __roc_api roc_cpt_lfs_print(struct roc_cpt *roc_cpt);
 void __roc_api roc_cpt_iq_disable(struct roc_cpt_lf *lf);
 void __roc_api roc_cpt_iq_enable(struct roc_cpt_lf *lf);
-int __roc_api roc_cpt_lmtline_init(struct roc_cpt *roc_cpt,
-				   struct roc_cpt_lmtline *lmtline, int lf_id);
+int __roc_api roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
+				   int lf_id, bool is_dual);
 
 void __roc_api roc_cpt_parse_hdr_dump(FILE *file, const struct cpt_parse_hdr_s *cpth);
-int __roc_api roc_cpt_ctx_write(struct roc_cpt_lf *lf, void *sa_dptr,
-				void *sa_cptr, uint16_t sa_len);
+int __roc_api roc_cpt_ctx_write(struct roc_cpt_lf *lf, void *sa_dptr, void *sa_cptr,
+				uint16_t sa_len);
 
 void __roc_api roc_cpt_int_misc_cb_register(roc_cpt_int_misc_cb_t cb, void *args);
 int __roc_api roc_cpt_int_misc_cb_unregister(roc_cpt_int_misc_cb_t cb, void *args);
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 0afd623990..f46379b43e 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -12,11 +12,6 @@
 #include <ethdev_driver.h>
 
 #include "roc_cpt.h"
-#if defined(__aarch64__)
-#include "roc_io.h"
-#else
-#include "roc_io_generic.h"
-#endif
 #include "roc_idev.h"
 #include "roc_sso.h"
 #include "roc_sso_dp.h"
@@ -40,8 +35,8 @@
 
 /* Holds information required to send crypto operations in one burst */
 struct ops_burst {
-	struct rte_crypto_op *op[CN10K_PKTS_PER_LOOP];
-	uint64_t w2[CN10K_PKTS_PER_LOOP];
+	struct rte_crypto_op *op[CN10K_CPT_PKTS_PER_LOOP];
+	uint64_t w2[CN10K_CPT_PKTS_PER_LOOP];
 	struct cn10k_sso_hws *ws;
 	struct cnxk_cpt_qp *qp;
 	uint16_t nb_ops;
@@ -55,54 +50,6 @@ struct vec_request {
 	uint64_t w2;
 };
 
-static __rte_always_inline void __rte_hot
-cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
-{
-	uint64_t lmt_arg;
-
-	/* Check if the total number of instructions is odd or even. */
-	const int flag_odd = *i & 0x1;
-
-	/* Reduce i by 1 when odd number of instructions.*/
-	*i -= flag_odd;
-
-	if (*i > 2 * CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		if (flag_odd) {
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
-			lmt_arg = (uint64_t)(lmt_id + *i / 2);
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
-			*i += 1;
-		}
-	} else {
-		if (*i != 0) {
-			lmt_arg =
-				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		}
-
-		if (flag_odd) {
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
-			lmt_arg = (uint64_t)(lmt_id + *i / 2);
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
-			*i += 1;
-		}
-	}
-
-	rte_io_wmb();
-}
-
 static inline struct cnxk_se_sess *
 cn10k_cpt_sym_temp_sess_create(struct cnxk_cpt_qp *qp, struct rte_crypto_op *op)
 {
@@ -385,8 +332,8 @@ static uint16_t
 cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 			const bool is_sg_ver2)
 {
-	uint64_t lmt_base, lmt_arg, io_addr;
 	struct cpt_inflight_req *infl_req;
+	uint64_t head, lmt_base, io_addr;
 	uint16_t nb_allowed, count = 0;
 	struct cnxk_cpt_qp *qp = qptr;
 	struct pending_queue *pend_q;
@@ -394,7 +341,6 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 	union cpt_fc_write_s fc;
 	uint64_t *fc_addr;
 	uint16_t lmt_id;
-	uint64_t head;
 	int ret, i;
 
 	pend_q = &qp->pend_q;
@@ -424,11 +370,11 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 		goto pend_q_commit;
 	}
 
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_ops); i++) {
+	for (i = 0; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_ops); i++) {
 		infl_req = &pend_q->req_queue[head];
 		infl_req->op_flags = 0;
 
-		ret = cn10k_cpt_fill_inst(qp, ops + i, &inst[2 * i], infl_req, is_sg_ver2);
+		ret = cn10k_cpt_fill_inst(qp, ops + i, &inst[i], infl_req, is_sg_ver2);
 		if (unlikely(ret != 1)) {
 			plt_dp_err("Could not process op: %p", ops + i);
 			if (i == 0)
@@ -439,24 +385,12 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 		pending_queue_advance(&head, pq_mask);
 	}
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_ops - i > 0 && i == CN10K_PKTS_PER_LOOP) {
-		nb_ops -= i;
-		ops += i;
-		count += i;
+	if (nb_ops - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
+		nb_ops -= CN10K_CPT_PKTS_PER_LOOP;
+		ops += CN10K_CPT_PKTS_PER_LOOP;
+		count += CN10K_CPT_PKTS_PER_LOOP;
 		goto again;
 	}
 
@@ -631,7 +565,7 @@ cn10k_cpt_vec_pkt_submission_timeout_handle(void)
 static inline void
 cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct cnxk_cpt_qp *qp)
 {
-	uint64_t lmt_base, lmt_arg, lmt_id, io_addr;
+	uint64_t lmt_base, lmt_id, io_addr;
 	union cpt_fc_write_s fc;
 	struct cpt_inst_s *inst;
 	uint16_t burst_size;
@@ -659,7 +593,7 @@ cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct
 again:
 	burst_size = RTE_MIN(CN10K_PKTS_PER_STEORL, vec_tbl_len);
 	for (i = 0; i < burst_size; i++)
-		cn10k_cpt_vec_inst_fill(&vec_tbl[i], &inst[i * 2], qp, vec_tbl[0].w7);
+		cn10k_cpt_vec_inst_fill(&vec_tbl[i], &inst[i], qp, vec_tbl[0].w7);
 
 	do {
 		fc.u64[0] = __atomic_load_n(fc_addr, __ATOMIC_RELAXED);
@@ -669,10 +603,7 @@ cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct
 			cn10k_cpt_vec_pkt_submission_timeout_handle();
 	} while (true);
 
-	lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | lmt_id;
-	roc_lmt_submit_steorl(lmt_arg, io_addr);
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	vec_tbl_len -= i;
 
@@ -686,12 +617,12 @@ static inline int
 ca_lmtst_vec_submit(struct ops_burst *burst, struct vec_request vec_tbl[], uint16_t *vec_tbl_len,
 		    const bool is_sg_ver2)
 {
-	struct cpt_inflight_req *infl_reqs[CN10K_PKTS_PER_LOOP];
-	uint64_t lmt_base, lmt_arg, io_addr;
+	struct cpt_inflight_req *infl_reqs[CN10K_CPT_PKTS_PER_LOOP];
 	uint16_t lmt_id, len = *vec_tbl_len;
 	struct cpt_inst_s *inst, *inst_base;
 	struct cpt_inflight_req *infl_req;
 	struct rte_event_vector *vec;
+	uint64_t lmt_base, io_addr;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_qp *qp;
 	uint64_t *fc_addr;
@@ -728,7 +659,7 @@ ca_lmtst_vec_submit(struct ops_burst *burst, struct vec_request vec_tbl[], uint1
 	}
 
 	for (i = 0; i < burst->nb_ops; i++) {
-		inst = &inst_base[2 * i];
+		inst = &inst_base[i];
 		infl_req = infl_reqs[i];
 		infl_req->op_flags = 0;
 
@@ -788,24 +719,12 @@ next_op:;
 	if (CNXK_TT_FROM_TAG(burst->ws->gw_rdata) == SSO_TT_ORDERED)
 		roc_sso_hws_head_wait(burst->ws->base);
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	/* Store w7 of last successfully filled instruction */
 	inst = &inst_base[2 * (i - 1)];
 	vec_tbl[0].w7 = inst->w7;
 
-	rte_io_wmb();
-
 put:
 	if (i != burst->nb_ops)
 		rte_mempool_put_bulk(qp->ca.req_mp, (void *)&infl_reqs[i], burst->nb_ops - i);
@@ -818,10 +737,10 @@ next_op:;
 static inline uint16_t
 ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 {
-	struct cpt_inflight_req *infl_reqs[CN10K_PKTS_PER_LOOP];
-	uint64_t lmt_base, lmt_arg, io_addr;
+	struct cpt_inflight_req *infl_reqs[CN10K_CPT_PKTS_PER_LOOP];
 	struct cpt_inst_s *inst, *inst_base;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_qp *qp;
 	uint64_t *fc_addr;
@@ -852,7 +771,7 @@ ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 	}
 
 	for (i = 0; i < burst->nb_ops; i++) {
-		inst = &inst_base[2 * i];
+		inst = &inst_base[i];
 		infl_req = infl_reqs[i];
 		infl_req->op_flags = 0;
 
@@ -889,19 +808,7 @@ ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 	if (CNXK_TT_FROM_TAG(burst->ws->gw_rdata) == SSO_TT_ORDERED)
 		roc_sso_hws_head_wait(burst->ws->base);
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 put:
 	if (unlikely(i != burst->nb_ops))
@@ -963,7 +870,7 @@ cn10k_cpt_crypto_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_ev
 		burst.op[burst.nb_ops] = op;
 
 		/* Max nb_ops per burst check */
-		if (++burst.nb_ops == CN10K_PKTS_PER_LOOP) {
+		if (++burst.nb_ops == CN10K_CPT_PKTS_PER_LOOP) {
 			if (is_vector)
 				submitted = ca_lmtst_vec_submit(&burst, vec_tbl, &vec_tbl_len,
 								is_sg_ver2);
@@ -1460,8 +1367,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	vf = cdev->data->dev_private;
 
-	const int nb_pkts_per_loop = 2 * CN10K_PKTS_PER_LOOP;
-
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
@@ -1481,7 +1386,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(nb_pkts_per_loop, nb_pkts); i++) {
+	for (; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1539,11 +1444,11 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == nb_pkts_per_loop) {
-		nb_pkts -= nb_pkts_per_loop;
-		pkts += nb_pkts_per_loop;
-		count += nb_pkts_per_loop;
-		sess += nb_pkts_per_loop;
+	if (nb_pkts - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
+		nb_pkts -= CN10K_CPT_PKTS_PER_LOOP;
+		pkts += CN10K_CPT_PKTS_PER_LOOP;
+		count += CN10K_CPT_PKTS_PER_LOOP;
+		sess += CN10K_CPT_PKTS_PER_LOOP;
 		goto again;
 	}
 
@@ -1642,8 +1547,8 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 			    const bool is_sgv2)
 {
 	uint16_t lmt_id, nb_allowed, nb_ops = vec->num;
-	uint64_t lmt_base, lmt_arg, io_addr, head;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr, head;
 	struct cnxk_cpt_qp *qp = qpair;
 	struct cnxk_sym_dp_ctx *dp_ctx;
 	struct pending_queue *pend_q;
@@ -1680,7 +1585,7 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		goto pend_q_commit;
 	}
 
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_ops); i++) {
+	for (i = 0; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_ops); i++) {
 		struct cnxk_iov iov;
 
 		index = count + i;
@@ -1688,7 +1593,7 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		infl_req->op_flags = 0;
 
 		cnxk_raw_burst_to_iov(vec, &ofs, index, &iov);
-		ret = cn10k_cpt_raw_fill_inst(&iov, qp, dp_ctx, &inst[2 * i], infl_req,
+		ret = cn10k_cpt_raw_fill_inst(&iov, qp, dp_ctx, &inst[i], infl_req,
 					      user_data[index], is_sgv2);
 		if (unlikely(ret != 1)) {
 			plt_dp_err("Could not process vec: %d", index);
@@ -1702,21 +1607,9 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		pending_queue_advance(&head, pq_mask);
 	}
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_ops - i > 0 && i == CN10K_PKTS_PER_LOOP) {
+	if (nb_ops - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
 		nb_ops -= i;
 		count += i;
 		goto again;
@@ -1757,8 +1650,8 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 		      struct rte_crypto_va_iova_ptr *aad_or_auth_iv, void *user_data,
 		      const bool is_sgv2)
 {
-	uint64_t lmt_base, lmt_arg, io_addr, head;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr, head;
 	struct cnxk_cpt_qp *qp = qpair;
 	struct cnxk_sym_dp_ctx *dp_ctx;
 	uint16_t lmt_id, nb_allowed;
@@ -1766,7 +1659,7 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 	union cpt_fc_write_s fc;
 	struct cnxk_iov iov;
 	uint64_t *fc_addr;
-	int ret;
+	int ret, i = 1;
 
 	struct pending_queue *pend_q = &qp->pend_q;
 	const uint64_t pq_mask = pend_q->pq_mask;
@@ -1803,10 +1696,7 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 
 	pending_queue_advance(&head, pq_mask);
 
-	lmt_arg = ROC_CN10K_CPT_LMT_ARG | (uint64_t)lmt_id;
-	roc_lmt_submit_steorl(lmt_arg, io_addr);
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	pend_q->head = head;
 	pend_q->time_out = rte_get_timer_cycles() + DEFAULT_COMMAND_TIMEOUT * rte_get_timer_hz();
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.h b/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
index 406c4abc7f..be76c49a65 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
@@ -5,15 +5,21 @@
 #ifndef _CN10K_CRYPTODEV_OPS_H_
 #define _CN10K_CRYPTODEV_OPS_H_
 
-#include <rte_compat.h>
 #include <cryptodev_pmd.h>
+#include <rte_compat.h>
 #include <rte_cryptodev.h>
 #include <rte_eventdev.h>
 
+#if defined(__aarch64__)
+#include "roc_io.h"
+#else
+#include "roc_io_generic.h"
+#endif
+
 #include "cnxk_cryptodev.h"
 
-#define CN10K_PKTS_PER_LOOP   32
-#define CN10K_PKTS_PER_STEORL 16
+#define CN10K_PKTS_PER_STEORL	  32
+#define CN10K_LMTLINES_PER_STEORL 16
 
 extern struct rte_cryptodev_ops cn10k_cpt_ops;
 
@@ -34,4 +40,52 @@ __rte_internal
 uint16_t __rte_hot cn10k_cpt_sg_ver2_crypto_adapter_enqueue(void *ws, struct rte_event ev[],
 		uint16_t nb_events);
 
+static __rte_always_inline void __rte_hot
+cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
+{
+	uint64_t lmt_arg;
+
+	/* Check if the total number of instructions is odd or even. */
+	const int flag_odd = *i & 0x1;
+
+	/* Reduce i by 1 when odd number of instructions.*/
+	*i -= flag_odd;
+
+	if (*i > CN10K_PKTS_PER_STEORL) {
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_LMTLINES_PER_STEORL - 1) << 12 |
+			  (uint64_t)lmt_id;
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG |
+			  (*i / 2 - CN10K_LMTLINES_PER_STEORL - 1) << 12 |
+			  (uint64_t)(lmt_id + CN10K_LMTLINES_PER_STEORL);
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	} else {
+		if (*i != 0) {
+			lmt_arg =
+				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		}
+
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	}
+
+	rte_io_wmb();
+}
 #endif /* _CN10K_CRYPTODEV_OPS_H_ */
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index 6acaa4413b..cfcfa79fdf 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -431,7 +431,6 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	struct rte_pci_device *pci_dev;
 	struct cnxk_cpt_qp *qp;
 	uint32_t nb_desc;
-	uint64_t io_addr;
 	int ret;
 
 	if (dev->data->queue_pairs[qp_id] != NULL)
@@ -467,7 +466,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 	roc_cpt->lf[qp_id] = &qp->lf;
 
-	ret = roc_cpt_lmtline_init(roc_cpt, &qp->lmtline, qp_id);
+	ret = roc_cpt_lmtline_init(roc_cpt, &qp->lmtline, qp_id, true);
 	if (ret < 0) {
 		roc_cpt->lf[qp_id] = NULL;
 		plt_err("Could not init lmtline for queue pair %d", qp_id);
@@ -478,7 +477,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	dev->data->queue_pairs[qp_id] = qp;
 
 	if (qp_id == vf->rx_inject_qp) {
-		ret = roc_cpt_lmtline_init(roc_cpt, &vf->rx_inj_lmtline, vf->rx_inject_qp);
+		ret = roc_cpt_lmtline_init(roc_cpt, &vf->rx_inj_lmtline, vf->rx_inject_qp, true);
 		if (ret) {
 			plt_err("Could not init lmtline Rx inject");
 			goto exit;
@@ -486,14 +485,6 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
-		/* Update IO addr to enable dual submission */
-		io_addr = vf->rx_inj_lmtline.io_addr;
-		io_addr = (io_addr & ~(uint64_t)(0x7 << 4)) | ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
-		vf->rx_inj_lmtline.io_addr = io_addr;
-
-		/* Update FC threshold to reflect dual submission */
-		vf->rx_inj_lmtline.fc_thresh -= 32;
-
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
 	}
@@ -969,44 +960,28 @@ rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id)
 static inline void
 cnxk_crypto_cn10k_submit(void *qptr, void *inst, uint16_t nb_inst)
 {
-	uint64_t lmt_base, lmt_arg, io_addr;
 	struct cnxk_cpt_qp *qp = qptr;
-	uint16_t i, j, lmt_id;
+	uint64_t lmt_base, io_addr;
+	uint16_t lmt_id;
 	void *lmt_dst;
+	int i;
 
 	lmt_base = qp->lmtline.lmt_base;
 	io_addr = qp->lmtline.io_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
 
-again:
-	i = RTE_MIN(nb_inst, CN10K_PKTS_PER_LOOP);
 	lmt_dst = PLT_PTR_CAST(lmt_base);
+again:
+	i = RTE_MIN(nb_inst, CN10K_CPT_PKTS_PER_LOOP);
 
-	for (j = 0; j < i; j++) {
-		rte_memcpy(lmt_dst, inst, sizeof(struct cpt_inst_s));
-		inst = RTE_PTR_ADD(inst, sizeof(struct cpt_inst_s));
-		lmt_dst = RTE_PTR_ADD(lmt_dst, 2 * sizeof(struct cpt_inst_s));
-	}
-
-	rte_io_wmb();
-
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
+	memcpy(lmt_dst, inst, i * sizeof(struct cpt_inst_s));
 
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	if (nb_inst - i > 0) {
-		nb_inst -= i;
+		nb_inst -= CN10K_CPT_PKTS_PER_LOOP;
+		inst = RTE_PTR_ADD(inst, CN10K_CPT_PKTS_PER_LOOP * sizeof(struct cpt_inst_s));
 		goto again;
 	}
 }
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.h b/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
index 9de7e432e4..caf6ac35e5 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
@@ -25,6 +25,8 @@
 
 #define MOD_INC(i, l) ((i) == (l - 1) ? (i) = 0 : (i)++)
 
+#define CN10K_CPT_PKTS_PER_LOOP	  64
+
 /* Macros to form words in CPT instruction */
 #define CNXK_CPT_INST_W2(tag, tt, grp, rvu_pf_func)                            \
 	((tag) | ((uint64_t)(tt) << 32) | ((uint64_t)(grp) << 34) |            \
diff --git a/drivers/event/cnxk/cnxk_eventdev_adptr.c b/drivers/event/cnxk/cnxk_eventdev_adptr.c
index 98db11ad61..2c049e7041 100644
--- a/drivers/event/cnxk/cnxk_eventdev_adptr.c
+++ b/drivers/event/cnxk/cnxk_eventdev_adptr.c
@@ -632,7 +632,7 @@ crypto_adapter_qp_setup(const struct rte_cryptodev *cdev, struct cnxk_cpt_qp *qp
 	 * simultaneous enqueue from all available cores.
 	 */
 	if (roc_model_is_cn10k())
-		nb_desc_min = rte_lcore_count() * 32;
+		nb_desc_min = rte_lcore_count() * CN10K_CPT_PKTS_PER_LOOP;
 	else
 		nb_desc_min = rte_lcore_count() * 2;
 
@@ -707,7 +707,7 @@ crypto_adapter_qp_free(struct cnxk_cpt_qp *qp)
 	rte_mempool_free(qp->ca.req_mp);
 	qp->ca.enabled = false;
 
-	ret = roc_cpt_lmtline_init(qp->lf.roc_cpt, &qp->lmtline, qp->lf.lf_id);
+	ret = roc_cpt_lmtline_init(qp->lf.roc_cpt, &qp->lmtline, qp->lf.lf_id, true);
 	if (ret < 0) {
 		plt_err("Could not reset lmtline for queue pair %d", qp->lf.lf_id);
 		return ret;
-- 
2.25.1
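
For readers following the dual-submission arithmetic, here is a minimal sketch of how cn10k_cpt_lmtst_dual_submit() appears to split a burst. It is a host-side model only: struct submit_step and plan_dual_submit() are hypothetical names that do not exist in the driver, no LMTST is issued, and the model records line counts directly where the driver encodes (count - 1) into lmt_arg.

```c
#include <assert.h>

#define PKTS_PER_STEORL     32 /* instructions covered by one dual STEORL */
#define LMTLINES_PER_STEORL 16 /* LMT lines covered by one STEORL */

/* One planned submission: a run of LMT lines starting at first_line. */
struct submit_step {
	int first_line;     /* LMT line offset relative to lmt_id */
	int nb_lines;       /* number of LMT lines in this submission */
	int insts_per_line; /* 2 for dual submission, 1 for the odd tail */
};

/*
 * Hypothetical model of the splitting done in the patch.
 * Fills steps[] (capacity 3) and returns the number of submissions.
 */
static int
plan_dual_submit(int nb_inst, struct submit_step steps[3])
{
	const int odd = nb_inst & 0x1; /* is there a trailing single instruction? */
	const int even = nb_inst - odd; /* instructions that can be paired */
	int n = 0;

	/* The model assumes at most 64 instructions per call. */
	assert(nb_inst >= 0 && nb_inst <= 2 * PKTS_PER_STEORL);

	if (even > PKTS_PER_STEORL) {
		/* First STEORL covers 16 full lines, second covers the rest. */
		steps[n++] = (struct submit_step){0, LMTLINES_PER_STEORL, 2};
		steps[n++] = (struct submit_step){LMTLINES_PER_STEORL,
						  even / 2 - LMTLINES_PER_STEORL, 2};
	} else if (even != 0) {
		steps[n++] = (struct submit_step){0, even / 2, 2};
	}

	/* An odd burst ends with one line holding a single instruction. */
	if (odd)
		steps[n++] = (struct submit_step){even / 2, 1, 1};

	return n;
}
```

Under this model a burst of 35 instructions becomes three submissions: 16 dual lines at offset 0, one dual line at offset 16, and one single-instruction line at offset 17.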



* [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD
  2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                   ` (11 preceding siblings ...)
  2024-06-20 14:58 ` [PATCH 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
@ 2024-06-24  6:23 ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
                     ` (12 more replies)
  12 siblings, 13 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

v2:
Fix compilation errors observed with arm gcc-13.

This series adds improvements to CNXK crypto PMD and fixes aes-gcm zero
length input failure.

Aakash Sasidharan (1):
  crypto/cnxk: fix aes-gcm zero len input cases

Anoob Joseph (11):
  common/cnxk: add comments to denote skipped entries
  crypto/cnxk: update version map file with PMD APIs
  common/cnxk: make inline dev PF func get as idev API
  crypto/cnxk: add flow control in Rx inject path
  crypto/cnxk: use SSO PF func of inline device in inst
  crypto/cnxk: use NEON for Rx inject inst preparation
  crypto/cnxk: remove init of CPT result field in packet
  crypto/cnxk: add dual submission in Rx inject
  crypto/cnxk: update sess pointer for next iteration
  crypto/cnxk: make pack IV variable as const
  crypto/cnxk: enable dual submission to CPT

 drivers/common/cnxk/roc_ae.c              |   6 +-
 drivers/common/cnxk/roc_ae_fpm_tables.c   |   6 +-
 drivers/common/cnxk/roc_cpt.c             |  17 +-
 drivers/common/cnxk/roc_cpt.h             |  51 +++--
 drivers/common/cnxk/roc_idev.c            |   6 +
 drivers/common/cnxk/roc_idev.h            |   2 +
 drivers/common/cnxk/roc_nix_inl.h         |   1 -
 drivers/common/cnxk/roc_nix_inl_dev.c     |   6 -
 drivers/common/cnxk/version.map           |   2 +-
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 234 +++++++++-------------
 drivers/crypto/cnxk/cn10k_cryptodev_ops.h |  60 +++++-
 drivers/crypto/cnxk/cnxk_cryptodev.h      |   2 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  40 ++--
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |   2 +
 drivers/crypto/cnxk/cnxk_se.h             |  55 ++---
 drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h |   2 +
 drivers/crypto/cnxk/version.map           |   8 +
 drivers/event/cnxk/cnxk_eventdev_adptr.c  |   4 +-
 drivers/net/cnxk/cn10k_ethdev_sec.c       |   2 +-
 drivers/net/cnxk/cnxk_ethdev_telemetry.c  |   3 +-
 20 files changed, 275 insertions(+), 234 deletions(-)

-- 
2.25.1



* [PATCH v2 01/12] common/cnxk: add comments to denote skipped entries
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add comments to denote unused table entries.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/common/cnxk/roc_ae.c            | 6 +++---
 drivers/common/cnxk/roc_ae_fpm_tables.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/common/cnxk/roc_ae.c b/drivers/common/cnxk/roc_ae.c
index e6a013d7c4..7ef0efe2b3 100644
--- a/drivers/common/cnxk/roc_ae.c
+++ b/drivers/common/cnxk/roc_ae.c
@@ -151,9 +151,9 @@ const struct roc_ae_ec_group ae_ec_grp[ROC_AE_EC_ID_PMAX] = {
 			     0x3F, 0x00},
 		    .length = 66},
 	},
-	{},
-	{},
-	{},
+	{ /* ROC_AE_EC_ID_P160 */ },
+	{ /* ROC_AE_EC_ID_P320 */ },
+	{ /* ROC_AE_EC_ID_P512 */ },
 	{
 		.prime = {.data = {0xFF, 0xFF, 0xFF, 0xFE, 0xFF, 0xFF, 0xFF,
 				   0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
diff --git a/drivers/common/cnxk/roc_ae_fpm_tables.c b/drivers/common/cnxk/roc_ae_fpm_tables.c
index ead3128e7f..942657b56a 100644
--- a/drivers/common/cnxk/roc_ae_fpm_tables.c
+++ b/drivers/common/cnxk/roc_ae_fpm_tables.c
@@ -1261,9 +1261,9 @@ const struct ae_fpm_entry ae_fpm_tbl_scalar[ROC_AE_EC_ID_PMAX] = {
 		.data = ae_fpm_tbl_p521,
 		.len = sizeof(ae_fpm_tbl_p521)
 	},
-	{},
-	{},
-	{},
+	{ /* ROC_AE_EC_ID_P160 */ },
+	{ /* ROC_AE_EC_ID_P320 */ },
+	{ /* ROC_AE_EC_ID_P512 */ },
 	{
 		.data = ae_fpm_tbl_p256_sm2,
 		.len = sizeof(ae_fpm_tbl_p256_sm2)
-- 
2.25.1



* [PATCH v2 02/12] crypto/cnxk: update version map file with PMD APIs
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Update version map with details of PMD APIs added.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h | 2 ++
 drivers/crypto/cnxk/version.map           | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h b/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
index 8b0a5ba0f2..eab1243065 100644
--- a/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
+++ b/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
@@ -23,6 +23,7 @@
  * @return
  *   Pointer to queue pair structure that would be the input to submit APIs.
  */
+__rte_experimental
 void *rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id);
 
 /**
@@ -41,6 +42,7 @@ void *rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id);
  * @param nb_inst
  *   Number of instructions.
  */
+__rte_experimental
 void rte_pmd_cnxk_crypto_submit(void *qptr, void *inst, uint16_t nb_inst);
 
 #endif /* _PMD_CNXK_CRYPTO_H_ */
diff --git a/drivers/crypto/cnxk/version.map b/drivers/crypto/cnxk/version.map
index 5789a6bfc9..7a77607774 100644
--- a/drivers/crypto/cnxk/version.map
+++ b/drivers/crypto/cnxk/version.map
@@ -1,3 +1,11 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 24.03
+	rte_pmd_cnxk_crypto_submit;
+	rte_pmd_cnxk_crypto_qptr_get;
+};
+
 INTERNAL {
 	global:
 
-- 
2.25.1



* [PATCH v2 03/12] common/cnxk: make inline dev PF func get as idev API
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

The inline device PF FUNC is required to set SSO_PF_FUNC in the instruction
for cryptodev Rx inject. Move the API to idev so that the crypto PMD can
use it.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/common/cnxk/roc_idev.c           | 6 ++++++
 drivers/common/cnxk/roc_idev.h           | 2 ++
 drivers/common/cnxk/roc_nix_inl.h        | 1 -
 drivers/common/cnxk/roc_nix_inl_dev.c    | 6 ------
 drivers/common/cnxk/version.map          | 2 +-
 drivers/net/cnxk/cn10k_ethdev_sec.c      | 2 +-
 drivers/net/cnxk/cnxk_ethdev_telemetry.c | 3 +--
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/common/cnxk/roc_idev.c b/drivers/common/cnxk/roc_idev.c
index d0307c666c..0778d51d1e 100644
--- a/drivers/common/cnxk/roc_idev.c
+++ b/drivers/common/cnxk/roc_idev.c
@@ -374,3 +374,9 @@ roc_idev_nix_rx_chan_set(uint16_t port, uint16_t chan)
 	if (idev != NULL && port < PLT_MAX_ETHPORTS)
 		__atomic_store_n(&idev->inl_rx_inj_cfg.chan[port], chan, __ATOMIC_RELEASE);
 }
+
+uint16_t
+roc_idev_nix_inl_dev_pffunc_get(void)
+{
+	return nix_inl_dev_pffunc_get();
+}
diff --git a/drivers/common/cnxk/roc_idev.h b/drivers/common/cnxk/roc_idev.h
index 00664eaed6..fc0f7db54e 100644
--- a/drivers/common/cnxk/roc_idev.h
+++ b/drivers/common/cnxk/roc_idev.h
@@ -27,4 +27,6 @@ uint8_t __roc_api roc_idev_nix_rx_inject_get(uint16_t port);
 void __roc_api roc_idev_nix_rx_inject_set(uint16_t port, uint8_t enable);
 uint16_t *__roc_api roc_idev_nix_rx_chan_base_get(void);
 void __roc_api roc_idev_nix_rx_chan_set(uint16_t port, uint16_t chan);
+
+uint16_t __roc_api roc_idev_nix_inl_dev_pffunc_get(void);
 #endif /* _ROC_IDEV_H_ */
diff --git a/drivers/common/cnxk/roc_nix_inl.h b/drivers/common/cnxk/roc_nix_inl.h
index ab0965e512..1a4bf8808c 100644
--- a/drivers/common/cnxk/roc_nix_inl.h
+++ b/drivers/common/cnxk/roc_nix_inl.h
@@ -112,7 +112,6 @@ void __roc_api roc_nix_inl_dev_lock(void);
 void __roc_api roc_nix_inl_dev_unlock(void);
 int __roc_api roc_nix_inl_dev_xaq_realloc(uint64_t aura_handle);
 int __roc_api roc_nix_inl_dev_stats_get(struct roc_nix_stats *stats);
-uint16_t __roc_api roc_nix_inl_dev_pffunc_get(void);
 int __roc_api roc_nix_inl_dev_cpt_setup(bool use_inl_dev_sso);
 int __roc_api roc_nix_inl_dev_cpt_release(void);
 bool __roc_api roc_nix_inl_dev_is_multi_channel(void);
diff --git a/drivers/common/cnxk/roc_nix_inl_dev.c b/drivers/common/cnxk/roc_nix_inl_dev.c
index 60e6a43033..e2bbe3a67b 100644
--- a/drivers/common/cnxk/roc_nix_inl_dev.c
+++ b/drivers/common/cnxk/roc_nix_inl_dev.c
@@ -34,12 +34,6 @@ nix_inl_dev_pffunc_get(void)
 	return 0;
 }
 
-uint16_t
-roc_nix_inl_dev_pffunc_get(void)
-{
-	return nix_inl_dev_pffunc_get();
-}
-
 static void
 nix_inl_selftest_work_cb(uint64_t *gw, void *args, uint32_t soft_exp_event)
 {
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index eac2ea9ff8..f98738d07e 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -112,6 +112,7 @@ INTERNAL {
 	roc_idev_npa_nix_get;
 	roc_idev_num_lmtlines_get;
 	roc_idev_nix_inl_meta_aura_get;
+	roc_idev_nix_inl_dev_pffunc_get;
 	roc_idev_nix_list_get;
 	roc_idev_nix_rx_chan_base_get;
 	roc_idev_nix_rx_chan_set;
@@ -244,7 +245,6 @@ INTERNAL {
 	roc_nix_inl_dev_is_probed;
 	roc_nix_inl_dev_stats_get;
 	roc_nix_inl_dev_lock;
-	roc_nix_inl_dev_pffunc_get;
 	roc_nix_inl_dev_rq;
 	roc_nix_inl_dev_rq_get;
 	roc_nix_inl_dev_rq_put;
diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c
index b8b0da5ea9..5e509e97d4 100644
--- a/drivers/net/cnxk/cn10k_ethdev_sec.c
+++ b/drivers/net/cnxk/cn10k_ethdev_sec.c
@@ -1360,7 +1360,7 @@ cn10k_eth_sec_rx_inject_config(void *device, uint16_t port_id, bool enable)
 	inj_cfg->io_addr = inl_lf->io_addr;
 	inj_cfg->lmt_base = nix->lmt_base;
 	channel = roc_nix_get_base_chan(nix);
-	pf_func = roc_nix_inl_dev_pffunc_get();
+	pf_func = roc_idev_nix_inl_dev_pffunc_get();
 	inj_cfg->cmd_w0 = pf_func << 48 | inj_match_id << 32 | channel << 4;
 
 	return 0;
diff --git a/drivers/net/cnxk/cnxk_ethdev_telemetry.c b/drivers/net/cnxk/cnxk_ethdev_telemetry.c
index 3027ca4735..a1958185f2 100644
--- a/drivers/net/cnxk/cnxk_ethdev_telemetry.c
+++ b/drivers/net/cnxk/cnxk_ethdev_telemetry.c
@@ -65,8 +65,7 @@ ethdev_tel_handle_info(const char *cmd __rte_unused,
 			info = &eth_info.info;
 			dev = cnxk_eth_pmd_priv(eth_dev);
 			if (dev) {
-				info->inl_dev_pf_func =
-					roc_nix_inl_dev_pffunc_get();
+				info->inl_dev_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 				info->pf_func = roc_nix_get_pf_func(&dev->nix);
 				info->max_mac_entries = dev->max_mac_entries;
 				info->dmac_filter_ena = dev->dmac_filter_enable;
-- 
2.25.1



* [PATCH v2 04/12] crypto/cnxk: add flow control in Rx inject path
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (2 preceding siblings ...)
  2024-06-24  6:23   ` [PATCH v2 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add flow control in Rx inject path to avoid over-submission to CPT.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 720b756001..9f1c074925 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1400,8 +1400,10 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	struct rte_cryptodev *cdev = dev;
 	union cpt_res_s *hw_res = NULL;
 	struct cpt_inst_s *inst;
+	union cpt_fc_write_s fc;
 	struct cnxk_cpt_vf *vf;
 	struct rte_mbuf *m;
+	uint64_t *fc_addr;
 	uint64_t dptr;
 	int i;
 
@@ -1413,13 +1415,24 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
+	fc_addr = vf->rx_inj_lmtline.fc_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
 	pf_func = vf->rx_inj_pf_func;
 
+	const uint32_t fc_thresh = vf->rx_inj_lmtline.fc_thresh;
+
 again:
+	fc.u64[0] =
+		rte_atomic_load_explicit((RTE_ATOMIC(uint64_t) *)fc_addr, rte_memory_order_relaxed);
 	inst = (struct cpt_inst_s *)lmt_base;
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+
+	i = 0;
+
+	if (unlikely(fc.s.qsize > fc_thresh))
+		goto exit;
+
+	for (; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1487,6 +1500,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		goto again;
 	}
 
+exit:
 	return count + i;
 }
 
-- 
2.25.1



* [PATCH v2 05/12] crypto/cnxk: use SSO PF func of inline device in inst
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (3 preceding siblings ...)
  2024-06-24  6:23   ` [PATCH v2 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

RVU PF FUNC of the CPT LF need not be set, as the hardware determines it.
Instead, SSO PF FUNC needs to be set to that of the inline device so that
critical errors reach the inline device.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 2 +-
 drivers/crypto/cnxk/cnxk_cryptodev.h      | 2 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 9f1c074925..f2980399c5 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1418,7 +1418,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
-	pf_func = vf->rx_inj_pf_func;
+	pf_func = vf->rx_inj_sso_pf_func;
 
 	const uint32_t fc_thresh = vf->rx_inj_lmtline.fc_thresh;
 
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev.h b/drivers/crypto/cnxk/cnxk_cryptodev.h
index fffc4a47b4..4000e84a7e 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev.h
+++ b/drivers/crypto/cnxk/cnxk_cryptodev.h
@@ -22,7 +22,7 @@
  */
 struct cnxk_cpt_vf {
 	struct roc_cpt_lmtline rx_inj_lmtline;
-	uint16_t rx_inj_pf_func;
+	uint16_t rx_inj_sso_pf_func;
 	uint16_t *rx_chan_base;
 	struct roc_cpt cpt;
 	struct rte_cryptodev_capabilities crypto_caps[CNXK_CPT_MAX_CAPS];
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index d7f5780637..51369309c5 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -483,7 +483,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 			goto exit;
 		}
 
-		vf->rx_inj_pf_func = qp->lf.pf_func;
+		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
-- 
2.25.1



* [PATCH v2 06/12] crypto/cnxk: use NEON for Rx inject inst preparation
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (4 preceding siblings ...)
  2024-06-24  6:23   ` [PATCH v2 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Use NEON instructions for Rx inject instruction preparation.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Signed-off-by: Aakash Sasidharan <asasidharan@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 62 +++++++++++++++++------
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index f2980399c5..446a3c3fd8 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -7,6 +7,7 @@
 #include <rte_event_crypto_adapter.h>
 #include <rte_hexdump.h>
 #include <rte_ip.h>
+#include <rte_vect.h>
 
 #include <ethdev_driver.h>
 
@@ -1390,15 +1391,17 @@ cn10k_cpt_dequeue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops)
 	return i;
 }
 
+#if defined(RTE_ARCH_ARM64)
 uint16_t __rte_hot
 cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 				  struct rte_security_session **sess, uint16_t nb_pkts)
 {
-	uint16_t l2_len, pf_func, lmt_id, count = 0;
-	uint64_t lmt_base, lmt_arg, io_addr;
+	uint64_t lmt_base, lmt_arg, io_addr, u64_0, u64_1, l2_len, pf_func;
+	uint64x2_t inst_01, inst_23, inst_45, inst_67;
 	struct cn10k_sec_session *sec_sess;
 	struct rte_cryptodev *cdev = dev;
 	union cpt_res_s *hw_res = NULL;
+	uint16_t lmt_id, count = 0;
 	struct cpt_inst_s *inst;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_vf *vf;
@@ -1456,26 +1459,41 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		hw_res = RTE_PTR_ALIGN_CEIL(hw_res, 16);
 
 		/* Prepare CPT instruction */
-		inst->w0.u64 = 0;
-		inst->w2.u64 = 0;
-		inst->w2.s.rvu_pf_func = pf_func;
-		inst->w3.u64 = (((uint64_t)m + sizeof(struct rte_mbuf)) >> 3) << 3 | 1;
 
-		inst->w4.u64 = sec_sess->inst.w4 | (rte_pktmbuf_pkt_len(m));
+		/* Word 0 and 1 */
+		inst_01 = vdupq_n_u64(0);
+		u64_0 = pf_func << 48 | *(vf->rx_chan_base + m->port) << 4 | (l2_len - 2) << 24 |
+			l2_len << 16;
+		inst_01 = vsetq_lane_u64(u64_0, inst_01, 0);
+		inst_01 = vsetq_lane_u64((uint64_t)hw_res, inst_01, 1);
+		vst1q_u64(&inst->w0.u64, inst_01);
+
+		/* Word 2 and 3 */
+		inst_23 = vdupq_n_u64(0);
+		u64_1 = (((uint64_t)m + sizeof(struct rte_mbuf)) >> 3) << 3 | 1;
+		inst_23 = vsetq_lane_u64(u64_1, inst_23, 1);
+		vst1q_u64(&inst->w2.u64, inst_23);
+
+		/* Word 4 and 5 */
+		inst_45 = vdupq_n_u64(0);
+		u64_0 = sec_sess->inst.w4 | (rte_pktmbuf_pkt_len(m));
+		inst_45 = vsetq_lane_u64(u64_0, inst_45, 0);
 		dptr = (uint64_t)rte_pktmbuf_iova(m);
-		inst->dptr = dptr;
-		inst->rptr = dptr;
-
-		inst->w0.hw_s.chan = *(vf->rx_chan_base + m->port);
-		inst->w0.hw_s.l2_len = l2_len;
-		inst->w0.hw_s.et_offset = l2_len - 2;
+		u64_1 = dptr;
+		inst_45 = vsetq_lane_u64(u64_1, inst_45, 1);
+		vst1q_u64(&inst->w4.u64, inst_45);
+
+		/* Word 6 and 7 */
+		inst_67 = vdupq_n_u64(0);
+		u64_0 = dptr;
+		u64_1 = sec_sess->inst.w7;
+		inst_67 = vsetq_lane_u64(u64_0, inst_67, 0);
+		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
+		vst1q_u64(&inst->w6.u64, inst_67);
 
-		inst->res_addr = (uint64_t)hw_res;
 		rte_atomic_store_explicit((unsigned long __rte_atomic *)&hw_res->u64[0], res.u64[0],
 					  rte_memory_order_relaxed);
 
-		inst->w7.u64 = sec_sess->inst.w7;
-
 		inst += 2;
 	}
 
@@ -1503,6 +1521,18 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 exit:
 	return count + i;
 }
+#else
+uint16_t __rte_hot
+cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
+				  struct rte_security_session **sess, uint16_t nb_pkts)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(sess);
+	RTE_SET_USED(nb_pkts);
+	return 0;
+}
+#endif
 
 void
 cn10k_cpt_set_enqdeq_fns(struct rte_cryptodev *dev, struct cnxk_cpt_vf *vf)
-- 
2.25.1



* [PATCH v2 07/12] crypto/cnxk: remove init of CPT result field in packet
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (5 preceding siblings ...)
  2024-06-24  6:23   ` [PATCH v2 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

The packet would be posted to CPT only when there is a valid result.
Skip initializing the CPT result field in the packet.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 446a3c3fd8..90ca9eec27 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1410,10 +1410,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	uint64_t dptr;
 	int i;
 
-	const union cpt_res_s res = {
-		.cn10k.compcode = CPT_COMP_NOT_DONE,
-	};
-
 	vf = cdev->data->dev_private;
 
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
@@ -1491,9 +1487,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
 		vst1q_u64(&inst->w6.u64, inst_67);
 
-		rte_atomic_store_explicit((unsigned long __rte_atomic *)&hw_res->u64[0], res.u64[0],
-					  rte_memory_order_relaxed);
-
 		inst += 2;
 	}
 
-- 
2.25.1



* [PATCH v2 08/12] crypto/cnxk: add dual submission in Rx inject
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (6 preceding siblings ...)
  2024-06-24  6:23   ` [PATCH v2 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-26  6:41     ` Akhil Goyal
  2024-06-24  6:23   ` [PATCH v2 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
                     ` (4 subsequent siblings)
  12 siblings, 1 reply; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add dual submission to CPT in Rx inject path.
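
The split performed by cn10k_cpt_lmtst_dual_submit() in the diff below can
be sketched as a pure function. This is only an illustration of the
arithmetic, not driver code: struct dual_plan and plan_dual_submit() are
hypothetical names, and PKTS_PER_STEORL = 16 is an assumed value for
CN10K_PKTS_PER_STEORL.

```c
#include <stdint.h>

/* Assumed value; the patch only refers to CN10K_PKTS_PER_STEORL by name. */
#define PKTS_PER_STEORL 16

/* Hypothetical summary of what one submit call would issue. */
struct dual_plan {
	int first_burst_lines;  /* LMT lines (two inst each) in the first STEORL */
	int second_burst_lines; /* LMT lines in the second STEORL, 0 if unused */
	int single_tail;        /* 1 if an odd instruction is submitted alone */
	int tail_lmt_offset;    /* LMT line offset used for the lone instruction */
};

static struct dual_plan
plan_dual_submit(int nb_inst)
{
	struct dual_plan p = {0, 0, 0, 0};
	const int flag_odd = nb_inst & 0x1;
	const int even = nb_inst - flag_odd; /* instructions sent in pairs */

	if (even > 2 * PKTS_PER_STEORL) {
		/* More pairs than one STEORL can carry: split into two bursts. */
		p.first_burst_lines = PKTS_PER_STEORL;
		p.second_burst_lines = even / 2 - PKTS_PER_STEORL;
	} else if (even != 0) {
		p.first_burst_lines = even / 2;
	}

	if (flag_odd) {
		/* The leftover instruction goes out as a single-size submission. */
		p.single_tail = 1;
		p.tail_lmt_offset = even / 2;
	}

	return p;
}
```

For example, 5 instructions would give one burst of 2 dual lines plus one
single instruction at LMT line offset 2.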

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
---
 drivers/common/cnxk/roc_cpt.h             | 43 +++++++++-----
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 70 +++++++++++++++++------
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  9 +++
 3 files changed, 90 insertions(+), 32 deletions(-)

diff --git a/drivers/common/cnxk/roc_cpt.h b/drivers/common/cnxk/roc_cpt.h
index 3721fa08c0..8ef9062ae0 100644
--- a/drivers/common/cnxk/roc_cpt.h
+++ b/drivers/common/cnxk/roc_cpt.h
@@ -30,23 +30,36 @@
 /* Vector of sizes in the burst of 16 CPT inst except first in 63:19 of
  * APT_LMT_ARG_S
  */
-#define ROC_CN10K_CPT_LMT_ARG                                                  \
-	(ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 0) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 1) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 2) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 3) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 4) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 5) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 6) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 7) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 8) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 9) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 10) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 11) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 12) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 13) |                           \
+#define ROC_CN10K_CPT_LMT_ARG                                                                      \
+	(ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 0) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 1) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 2) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 3) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 4) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 5) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 6) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 7) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 8) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 9) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 10) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 11) |   \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 12) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 13) |   \
 	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 14))
 
+/* Vector of sizes in the burst of 2 * 16 CPT inst except first in 63:19 of
+ * APT_LMT_ARG_S
+ */
+#define ROC_CN10K_DUAL_CPT_LMT_ARG                                                                 \
+	(ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 0) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 1) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 2) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 3) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 4) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 5) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 6) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 7) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 8) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 9) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 10) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 11) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 12) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 13) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 14))
+
 /* CPT helper macros */
 #define ROC_CPT_AH_HDR_LEN	12
 #define ROC_CPT_AES_GCM_IV_LEN	8
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 90ca9eec27..a3a13c032e 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -55,6 +55,54 @@ struct vec_request {
 	uint64_t w2;
 };
 
+static __rte_always_inline void __rte_hot
+cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
+{
+	uint64_t lmt_arg;
+
+	/* Check if the total number of instructions is odd or even. */
+	const int flag_odd = *i & 0x1;
+
+	/* Reduce i by 1 when odd number of instructions.*/
+	*i -= flag_odd;
+
+	if (*i > 2 * CN10K_PKTS_PER_STEORL) {
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
+			  (uint64_t)lmt_id;
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - CN10K_PKTS_PER_STEORL - 1) << 12 |
+			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	} else {
+		if (*i != 0) {
+			lmt_arg =
+				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		}
+
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	}
+
+	rte_io_wmb();
+}
+
 static inline struct cnxk_se_sess *
 cn10k_cpt_sym_temp_sess_create(struct cnxk_cpt_qp *qp, struct rte_crypto_op *op)
 {
@@ -1396,7 +1444,7 @@ uint16_t __rte_hot
 cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 				  struct rte_security_session **sess, uint16_t nb_pkts)
 {
-	uint64_t lmt_base, lmt_arg, io_addr, u64_0, u64_1, l2_len, pf_func;
+	uint64_t lmt_base, io_addr, u64_0, u64_1, l2_len, pf_func;
 	uint64x2_t inst_01, inst_23, inst_45, inst_67;
 	struct cn10k_sec_session *sec_sess;
 	struct rte_cryptodev *cdev = dev;
@@ -1431,7 +1479,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+	for (; i < RTE_MIN(2 * CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1487,24 +1535,12 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
 		vst1q_u64(&inst->w6.u64, inst_67);
 
-		inst += 2;
-	}
-
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
+		inst++;
 	}
 
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == CN10K_PKTS_PER_LOOP) {
+	if (nb_pkts - i > 0 && i == 2 * CN10K_PKTS_PER_LOOP) {
 		nb_pkts -= i;
 		pkts += i;
 		count += i;
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index 51369309c5..6acaa4413b 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -431,6 +431,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	struct rte_pci_device *pci_dev;
 	struct cnxk_cpt_qp *qp;
 	uint32_t nb_desc;
+	uint64_t io_addr;
 	int ret;
 
 	if (dev->data->queue_pairs[qp_id] != NULL)
@@ -485,6 +486,14 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
+		/* Update IO addr to enable dual submission */
+		io_addr = vf->rx_inj_lmtline.io_addr;
+		io_addr = (io_addr & ~(uint64_t)(0x7 << 4)) | ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
+		vf->rx_inj_lmtline.io_addr = io_addr;
+
+		/* Update FC threshold to reflect dual submission */
+		vf->rx_inj_lmtline.fc_thresh -= 32;
+
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
 	}
-- 
2.25.1



* [PATCH v2 09/12] crypto/cnxk: update sess pointer for next iteration
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (7 preceding siblings ...)
  2024-06-24  6:23   ` [PATCH v2 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:23   ` [PATCH v2 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Update the sess pointer when moving on to the next set of packets.
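
A toy model of the chunked loop shows why both arrays have to advance
together. It is a sketch only: process_all() and CHUNK are hypothetical
stand-ins for the Rx inject loop and nb_pkts_per_loop, and plain ints stand
in for the mbuf and session pointers.

```c
#include <stdint.h>

#define CHUNK 4 /* stand-in for nb_pkts_per_loop */

/* Counts how many packets were paired with their own session entry. */
static int
process_all(const int *pkts, const int *sess, int nb_pkts)
{
	int matched = 0;
	int i;

again:
	for (i = 0; i < (nb_pkts < CHUNK ? nb_pkts : CHUNK); i++)
		matched += (pkts[i] == sess[i]);

	if (nb_pkts - i > 0 && i == CHUNK) {
		nb_pkts -= CHUNK;
		pkts += CHUNK;
		sess += CHUNK; /* without this, later chunks reuse sess[0..CHUNK-1] */
		goto again;
	}

	return matched;
}
```

With ten packets whose session index equals the packet index, all ten match
only when sess is advanced alongside pkts.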

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index a3a13c032e..0d5a9ab5ef 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1460,6 +1460,8 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	vf = cdev->data->dev_private;
 
+	const int nb_pkts_per_loop = 2 * CN10K_PKTS_PER_LOOP;
+
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
@@ -1479,7 +1481,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(2 * CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+	for (; i < RTE_MIN(nb_pkts_per_loop, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1540,10 +1542,11 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == 2 * CN10K_PKTS_PER_LOOP) {
-		nb_pkts -= i;
-		pkts += i;
-		count += i;
+	if (nb_pkts - i > 0 && i == nb_pkts_per_loop) {
+		nb_pkts -= nb_pkts_per_loop;
+		pkts += nb_pkts_per_loop;
+		count += nb_pkts_per_loop;
+		sess += nb_pkts_per_loop;
 		goto again;
 	}
 
-- 
2.25.1



* [PATCH v2 10/12] crypto/cnxk: fix aes-gcm zero len input cases
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (8 preceding siblings ...)
  2024-06-24  6:23   ` [PATCH v2 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
@ 2024-06-24  6:23   ` Aakash Sasidharan
  2024-06-24  6:24   ` [PATCH v2 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:23 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj, Akhil Goyal
  Cc: jerinj, vvelumuri, asasidharan, dev

For AES-GCM (AEAD) zero-length input, the SG code path is taken, unlike
the digest-only cases, because AAD is treated as a separate input
component. Fix the zero-length case in the SG path by skipping the gather
component only for non-AEAD algorithms. Also add an SG version check, as
the fix applies only to a specific model.
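
The new early-exit condition in prepare_iov_from_pkt() can be read as a
small predicate. The sketch below mirrors that condition only; struct
toy_pkt and skip_gather() are hypothetical names used for illustration and
are not part of the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the one rte_mbuf field the check reads. */
struct toy_pkt {
	uint16_t data_len;
};

/* True when the gather list should be left empty (buf_cnt = 0). */
static bool
skip_gather(const struct toy_pkt *pkt, bool is_aead, bool is_sg_ver2)
{
	return pkt == NULL || (is_sg_ver2 && pkt->data_len == 0 && !is_aead);
}
```

A zero-length AEAD packet is therefore still added to the gather list,
while a zero-length non-AEAD packet on SG version 2 is skipped.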

Fixes: 4d8166d64988 ("crypto/cnxk: enable digest for zero length input")

Signed-off-by: Aakash Sasidharan <asasidharan@marvell.com>
---
 drivers/crypto/cnxk/cnxk_se.h | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/cnxk/cnxk_se.h b/drivers/crypto/cnxk/cnxk_se.h
index 6374718a82..63dbef4411 100644
--- a/drivers/crypto/cnxk/cnxk_se.h
+++ b/drivers/crypto/cnxk/cnxk_se.h
@@ -2468,13 +2468,14 @@ fill_sess_gmac(struct rte_crypto_sym_xform *xform, struct cnxk_se_sess *sess)
 }
 
 static __rte_always_inline uint32_t
-prepare_iov_from_pkt(struct rte_mbuf *pkt, struct roc_se_iov_ptr *iovec, uint32_t start_offset)
+prepare_iov_from_pkt(struct rte_mbuf *pkt, struct roc_se_iov_ptr *iovec, uint32_t start_offset,
+		     const bool is_aead, const bool is_sg_ver2)
 {
 	uint16_t index = 0;
 	void *seg_data = NULL;
 	int32_t seg_size = 0;
 
-	if (!pkt || pkt->data_len == 0) {
+	if (!pkt || (is_sg_ver2 && (pkt->data_len == 0) && !is_aead)) {
 		iovec->buf_cnt = 0;
 		return 0;
 	}
@@ -2619,13 +2620,13 @@ fill_sm_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0)) {
+		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2)) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
-		if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0)) {
+		if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false, is_sg_ver2)) {
 			plt_dp_err("Prepare dst iov failed for m_dst %p", m_dst);
 			ret = -EINVAL;
 			goto err_exit;
@@ -2816,14 +2817,15 @@ fill_fc_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0)) {
+		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, is_aead, is_sg_ver2)) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
 		if (unlikely(m_dst != NULL)) {
-			if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0)) {
+			if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, is_aead,
+						 is_sg_ver2)) {
 				plt_dp_err("Prepare dst iov failed for "
 					   "m_dst %p",
 					   m_dst);
@@ -2957,13 +2959,15 @@ fill_pdcp_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (unlikely(prepare_iov_from_pkt(m_src, fc_params.src_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
-		if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Prepare dst iov failed for m_dst %p", m_dst);
 			ret = -EINVAL;
 			goto err_exit;
@@ -3080,14 +3084,16 @@ fill_pdcp_chain_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (unlikely(prepare_iov_from_pkt(m_src, fc_params.src_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Could not prepare src iov");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
 		if (unlikely(m_dst != NULL)) {
-			if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0))) {
+			if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false,
+							  is_sg_ver2))) {
 				plt_dp_err("Could not prepare m_dst iov %p", m_dst);
 				ret = -EINVAL;
 				goto err_exit;
@@ -3306,7 +3312,7 @@ fill_digest_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 	params.src_iov = (void *)src;
 
 	/*Store SG I/O in the api for reuse */
-	if (prepare_iov_from_pkt(m_src, params.src_iov, auth_range_off)) {
+	if (prepare_iov_from_pkt(m_src, params.src_iov, auth_range_off, false, is_sg_ver2)) {
 		plt_dp_err("Prepare src iov failed");
 		ret = -EINVAL;
 		goto free_mdata_and_exit;
-- 
2.25.1



* [PATCH v2 11/12] crypto/cnxk: make pack IV variable as const
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (9 preceding siblings ...)
  2024-06-24  6:23   ` [PATCH v2 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
@ 2024-06-24  6:24   ` Aakash Sasidharan
  2024-06-24  6:24   ` [PATCH v2 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:24 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Make the 'pack_iv' variable const to avoid multiple checks.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cnxk_se.h | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/crypto/cnxk/cnxk_se.h b/drivers/crypto/cnxk/cnxk_se.h
index 63dbef4411..dbd36a8a54 100644
--- a/drivers/crypto/cnxk/cnxk_se.h
+++ b/drivers/crypto/cnxk/cnxk_se.h
@@ -105,7 +105,7 @@ cpt_pack_iv(uint8_t *iv_src, uint8_t *iv_dst)
 }
 
 static inline void
-pdcp_iv_copy(uint8_t *iv_d, const uint8_t *iv_s, const uint8_t pdcp_alg_type, uint8_t pack_iv)
+pdcp_iv_copy(uint8_t *iv_d, const uint8_t *iv_s, const uint8_t pdcp_alg_type, const bool pack_iv)
 {
 	const uint32_t *iv_s_temp;
 	uint32_t iv_temp[4];
@@ -261,7 +261,7 @@ cpt_mac_len_verify(struct rte_crypto_auth_xform *auth)
 
 static __rte_always_inline int
 sg_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t offset_ctrl,
-	     const uint8_t *iv_s, int iv_len, uint8_t pack_iv, uint8_t pdcp_alg_type,
+	     const uint8_t *iv_s, int iv_len, const bool pack_iv, uint8_t pdcp_alg_type,
 	     int32_t inputlen, int32_t outputlen, uint32_t passthrough_len, uint32_t req_flags,
 	     int pdcp_flag, int decrypt)
 {
@@ -457,7 +457,7 @@ sg_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t
 
 static __rte_always_inline int
 sg2_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t offset_ctrl,
-	      const uint8_t *iv_s, int iv_len, uint8_t pack_iv, uint8_t pdcp_alg_type,
+	      const uint8_t *iv_s, int iv_len, const bool pack_iv, uint8_t pdcp_alg_type,
 	      int32_t inputlen, int32_t outputlen, uint32_t passthrough_len, uint32_t req_flags,
 	      int pdcp_flag, int decrypt)
 {
@@ -882,7 +882,7 @@ static inline int
 pdcp_chain_sg1_prep(struct roc_se_fc_params *params, struct roc_se_ctx *cpt_ctx,
 		    struct cpt_inst_s *inst, union cpt_inst_w4 w4, int32_t inputlen,
 		    uint8_t hdr_len, uint64_t offset_ctrl, uint32_t req_flags,
-		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const int pack_iv,
+		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const bool pack_iv,
 		    const uint8_t pdcp_ci_alg, const uint8_t pdcp_auth_alg)
 {
 	struct roc_sglist_comp *scatter_comp, *gather_comp;
@@ -991,7 +991,7 @@ static inline int
 pdcp_chain_sg2_prep(struct roc_se_fc_params *params, struct roc_se_ctx *cpt_ctx,
 		    struct cpt_inst_s *inst, union cpt_inst_w4 w4, int32_t inputlen,
 		    uint8_t hdr_len, uint64_t offset_ctrl, uint32_t req_flags,
-		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const int pack_iv,
+		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const bool pack_iv,
 		    const uint8_t pdcp_ci_alg, const uint8_t pdcp_auth_alg)
 {
 	struct roc_sg2list_comp *gather_comp, *scatter_comp;
@@ -1528,7 +1528,6 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 	struct roc_se_ctx *se_ctx;
 	uint64_t *offset_vaddr;
 	uint64_t offset_ctrl;
-	uint8_t pack_iv = 0;
 	int32_t inputlen;
 	void *dm_vaddr;
 	uint8_t *iv_d;
@@ -1606,10 +1605,10 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		cpt_inst_w4.s.dlen = inputlen + ROC_SE_OFF_CTRL_LEN;
 
 		iv_d = ((uint8_t *)offset_vaddr + ROC_SE_OFF_CTRL_LEN);
-		pdcp_iv_copy(iv_d, cipher_iv, pdcp_ci_alg, pack_iv);
+		pdcp_iv_copy(iv_d, cipher_iv, pdcp_ci_alg, false);
 
 		iv_d = ((uint8_t *)offset_vaddr + ROC_SE_OFF_CTRL_LEN + pdcp_iv_off);
-		pdcp_iv_copy(iv_d, auth_iv, pdcp_auth_alg, pack_iv);
+		pdcp_iv_copy(iv_d, auth_iv, pdcp_auth_alg, false);
 
 		inst->w4.u64 = cpt_inst_w4.u64;
 		return 0;
@@ -1618,11 +1617,11 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		if (is_sg_ver2)
 			return pdcp_chain_sg2_prep(params, se_ctx, inst, cpt_inst_w4, inputlen,
 						   hdr_len, offset_ctrl, req_flags, cipher_iv,
-						   auth_iv, pack_iv, pdcp_ci_alg, pdcp_auth_alg);
+						   auth_iv, false, pdcp_ci_alg, pdcp_auth_alg);
 		else
 			return pdcp_chain_sg1_prep(params, se_ctx, inst, cpt_inst_w4, inputlen,
 						   hdr_len, offset_ctrl, req_flags, cipher_iv,
-						   auth_iv, pack_iv, pdcp_ci_alg, pdcp_auth_alg);
+						   auth_iv, false, pdcp_ci_alg, pdcp_auth_alg);
 	}
 }
 
@@ -1647,9 +1646,9 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 	uint64_t *offset_vaddr;
 	uint8_t pdcp_alg_type;
 	uint32_t mac_len = 0;
-	const uint8_t *iv_s;
-	uint8_t pack_iv = 0;
 	uint64_t offset_ctrl;
+	bool pack_iv = false;
+	const uint8_t *iv_s;
 	int ret;
 
 	mac_len = se_ctx->mac_len;
@@ -1671,7 +1670,7 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		if (pdcp_alg_type != ROC_SE_PDCP_ALG_TYPE_AES_CMAC) {
 
 			if (params->auth_iv_len == 25)
-				pack_iv = 1;
+				pack_iv = true;
 
 			auth_offset = auth_offset / 8;
 			auth_data_len = RTE_ALIGN(auth_data_len, 8) / 8;
@@ -1694,7 +1693,7 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		pdcp_alg_type = se_ctx->pdcp_ci_alg;
 
 		if (params->cipher_iv_len == 25)
-			pack_iv = 1;
+			pack_iv = true;
 
 		/*
 		 * Microcode expects offsets in bytes
-- 
2.25.1



* [PATCH v2 12/12] crypto/cnxk: enable dual submission to CPT
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (10 preceding siblings ...)
  2024-06-24  6:24   ` [PATCH v2 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
@ 2024-06-24  6:24   ` Aakash Sasidharan
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-24  6:24 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Pavan Nikhilesh, Shijith Thotton
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Submit two instructions in one LMTLINE.
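
The submission size is carried in a 3-bit field at bits 6:4 of the I/O
address, which is what the (0x7 << 4) masking in this series manipulates.
A minimal sketch of switching that field follows. io_addr_set_size() is a
hypothetical helper, and the values 3 and 7 are assumptions for
ROC_CN10K_CPT_INST_DW_M1 and ROC_CN10K_TWO_CPT_INST_DW_M1 (one or two
64-byte instructions counted in 128-bit words, minus one).

```c
#include <stdint.h>

#define INST_DW_M1     UINT64_C(3) /* assumed: one 64B inst = 4 x 128b, minus 1 */
#define TWO_INST_DW_M1 UINT64_C(7) /* assumed: two 64B inst = 8 x 128b, minus 1 */

#define SIZE_SHIFT 4
#define SIZE_MASK  (UINT64_C(0x7) << SIZE_SHIFT)

/* Replace the size field of io_addr, leaving all other bits untouched. */
static uint64_t
io_addr_set_size(uint64_t io_addr, uint64_t size_m1)
{
	return (io_addr & ~SIZE_MASK) | (size_m1 << SIZE_SHIFT);
}
```

Under these assumptions, an address with the single-instruction size set
would become dual-submission capable via
io_addr_set_size(io_addr, TWO_INST_DW_M1), and an odd trailing instruction
would temporarily switch back with INST_DW_M1.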

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/common/cnxk/roc_cpt.c             |  17 +-
 drivers/common/cnxk/roc_cpt.h             |   8 +-
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 182 +++++-----------------
 drivers/crypto/cnxk/cn10k_cryptodev_ops.h |  60 ++++++-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  47 ++----
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |   2 +
 drivers/event/cnxk/cnxk_eventdev_adptr.c  |   4 +-
 7 files changed, 124 insertions(+), 196 deletions(-)

diff --git a/drivers/common/cnxk/roc_cpt.c b/drivers/common/cnxk/roc_cpt.c
index 9f283ceb2e..aba2a49d19 100644
--- a/drivers/common/cnxk/roc_cpt.c
+++ b/drivers/common/cnxk/roc_cpt.c
@@ -1135,8 +1135,8 @@ roc_cpt_iq_enable(struct roc_cpt_lf *lf)
 }
 
 int
-roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
-		     int lf_id)
+roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline, int lf_id,
+		     bool is_dual)
 {
 	struct roc_cpt_lf *lf;
 
@@ -1145,12 +1145,19 @@ roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
 		return -ENOTSUP;
 
 	lmtline->io_addr = lf->io_addr;
-	if (roc_model_is_cn10k())
-		lmtline->io_addr |= ROC_CN10K_CPT_INST_DW_M1 << 4;
+	lmtline->fc_thresh = lf->nb_desc - CPT_LF_FC_MIN_THRESHOLD;
+
+	if (roc_model_is_cn10k()) {
+		if (is_dual) {
+			lmtline->io_addr |= ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
+			lmtline->fc_thresh = lf->nb_desc -  2 * CPT_LF_FC_MIN_THRESHOLD;
+		} else {
+			lmtline->io_addr |= ROC_CN10K_CPT_INST_DW_M1 << 4;
+		}
+	}
 
 	lmtline->fc_addr = lf->fc_addr;
 	lmtline->lmt_base = lf->lmt_base;
-	lmtline->fc_thresh = lf->nb_desc - CPT_LF_FC_MIN_THRESHOLD;
 
 	return 0;
 }
diff --git a/drivers/common/cnxk/roc_cpt.h b/drivers/common/cnxk/roc_cpt.h
index 8ef9062ae0..e2e919f80f 100644
--- a/drivers/common/cnxk/roc_cpt.h
+++ b/drivers/common/cnxk/roc_cpt.h
@@ -200,12 +200,12 @@ int __roc_api roc_cpt_afs_print(struct roc_cpt *roc_cpt);
 int __roc_api roc_cpt_lfs_print(struct roc_cpt *roc_cpt);
 void __roc_api roc_cpt_iq_disable(struct roc_cpt_lf *lf);
 void __roc_api roc_cpt_iq_enable(struct roc_cpt_lf *lf);
-int __roc_api roc_cpt_lmtline_init(struct roc_cpt *roc_cpt,
-				   struct roc_cpt_lmtline *lmtline, int lf_id);
+int __roc_api roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
+				   int lf_id, bool is_dual);
 
 void __roc_api roc_cpt_parse_hdr_dump(FILE *file, const struct cpt_parse_hdr_s *cpth);
-int __roc_api roc_cpt_ctx_write(struct roc_cpt_lf *lf, void *sa_dptr,
-				void *sa_cptr, uint16_t sa_len);
+int __roc_api roc_cpt_ctx_write(struct roc_cpt_lf *lf, void *sa_dptr, void *sa_cptr,
+				uint16_t sa_len);
 
 void __roc_api roc_cpt_int_misc_cb_register(roc_cpt_int_misc_cb_t cb, void *args);
 int __roc_api roc_cpt_int_misc_cb_unregister(roc_cpt_int_misc_cb_t cb, void *args);
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 0d5a9ab5ef..9d6ac06bd2 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -12,11 +12,6 @@
 #include <ethdev_driver.h>
 
 #include "roc_cpt.h"
-#if defined(__aarch64__)
-#include "roc_io.h"
-#else
-#include "roc_io_generic.h"
-#endif
 #include "roc_idev.h"
 #include "roc_sso.h"
 #include "roc_sso_dp.h"
@@ -40,8 +35,8 @@
 
 /* Holds information required to send crypto operations in one burst */
 struct ops_burst {
-	struct rte_crypto_op *op[CN10K_PKTS_PER_LOOP];
-	uint64_t w2[CN10K_PKTS_PER_LOOP];
+	struct rte_crypto_op *op[CN10K_CPT_PKTS_PER_LOOP];
+	uint64_t w2[CN10K_CPT_PKTS_PER_LOOP];
 	struct cn10k_sso_hws *ws;
 	struct cnxk_cpt_qp *qp;
 	uint16_t nb_ops;
@@ -55,54 +50,6 @@ struct vec_request {
 	uint64_t w2;
 };
 
-static __rte_always_inline void __rte_hot
-cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
-{
-	uint64_t lmt_arg;
-
-	/* Check if the total number of instructions is odd or even. */
-	const int flag_odd = *i & 0x1;
-
-	/* Reduce i by 1 when odd number of instructions.*/
-	*i -= flag_odd;
-
-	if (*i > 2 * CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		if (flag_odd) {
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
-			lmt_arg = (uint64_t)(lmt_id + *i / 2);
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
-			*i += 1;
-		}
-	} else {
-		if (*i != 0) {
-			lmt_arg =
-				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		}
-
-		if (flag_odd) {
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
-			lmt_arg = (uint64_t)(lmt_id + *i / 2);
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
-			*i += 1;
-		}
-	}
-
-	rte_io_wmb();
-}
-
 static inline struct cnxk_se_sess *
 cn10k_cpt_sym_temp_sess_create(struct cnxk_cpt_qp *qp, struct rte_crypto_op *op)
 {
@@ -385,8 +332,8 @@ static uint16_t
 cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 			const bool is_sg_ver2)
 {
-	uint64_t lmt_base, lmt_arg, io_addr;
 	struct cpt_inflight_req *infl_req;
+	uint64_t head, lmt_base, io_addr;
 	uint16_t nb_allowed, count = 0;
 	struct cnxk_cpt_qp *qp = qptr;
 	struct pending_queue *pend_q;
@@ -394,7 +341,6 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 	union cpt_fc_write_s fc;
 	uint64_t *fc_addr;
 	uint16_t lmt_id;
-	uint64_t head;
 	int ret, i;
 
 	pend_q = &qp->pend_q;
@@ -424,11 +370,11 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 		goto pend_q_commit;
 	}
 
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_ops); i++) {
+	for (i = 0; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_ops); i++) {
 		infl_req = &pend_q->req_queue[head];
 		infl_req->op_flags = 0;
 
-		ret = cn10k_cpt_fill_inst(qp, ops + i, &inst[2 * i], infl_req, is_sg_ver2);
+		ret = cn10k_cpt_fill_inst(qp, ops + i, &inst[i], infl_req, is_sg_ver2);
 		if (unlikely(ret != 1)) {
 			plt_dp_err("Could not process op: %p", ops + i);
 			if (i == 0)
@@ -439,24 +385,12 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 		pending_queue_advance(&head, pq_mask);
 	}
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_ops - i > 0 && i == CN10K_PKTS_PER_LOOP) {
-		nb_ops -= i;
-		ops += i;
-		count += i;
+	if (nb_ops - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
+		nb_ops -= CN10K_CPT_PKTS_PER_LOOP;
+		ops += CN10K_CPT_PKTS_PER_LOOP;
+		count += CN10K_CPT_PKTS_PER_LOOP;
 		goto again;
 	}
 
@@ -631,7 +565,7 @@ cn10k_cpt_vec_pkt_submission_timeout_handle(void)
 static inline void
 cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct cnxk_cpt_qp *qp)
 {
-	uint64_t lmt_base, lmt_arg, lmt_id, io_addr;
+	uint64_t lmt_base, lmt_id, io_addr;
 	union cpt_fc_write_s fc;
 	struct cpt_inst_s *inst;
 	uint16_t burst_size;
@@ -659,7 +593,7 @@ cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct
 again:
 	burst_size = RTE_MIN(CN10K_PKTS_PER_STEORL, vec_tbl_len);
 	for (i = 0; i < burst_size; i++)
-		cn10k_cpt_vec_inst_fill(&vec_tbl[i], &inst[i * 2], qp, vec_tbl[0].w7);
+		cn10k_cpt_vec_inst_fill(&vec_tbl[i], &inst[i], qp, vec_tbl[0].w7);
 
 	do {
 		fc.u64[0] = __atomic_load_n(fc_addr, __ATOMIC_RELAXED);
@@ -669,10 +603,7 @@ cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct
 			cn10k_cpt_vec_pkt_submission_timeout_handle();
 	} while (true);
 
-	lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | lmt_id;
-	roc_lmt_submit_steorl(lmt_arg, io_addr);
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	vec_tbl_len -= i;
 
@@ -686,12 +617,12 @@ static inline int
 ca_lmtst_vec_submit(struct ops_burst *burst, struct vec_request vec_tbl[], uint16_t *vec_tbl_len,
 		    const bool is_sg_ver2)
 {
-	struct cpt_inflight_req *infl_reqs[CN10K_PKTS_PER_LOOP];
-	uint64_t lmt_base, lmt_arg, io_addr;
+	struct cpt_inflight_req *infl_reqs[CN10K_CPT_PKTS_PER_LOOP];
 	uint16_t lmt_id, len = *vec_tbl_len;
 	struct cpt_inst_s *inst, *inst_base;
 	struct cpt_inflight_req *infl_req;
 	struct rte_event_vector *vec;
+	uint64_t lmt_base, io_addr;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_qp *qp;
 	uint64_t *fc_addr;
@@ -728,7 +659,7 @@ ca_lmtst_vec_submit(struct ops_burst *burst, struct vec_request vec_tbl[], uint1
 	}
 
 	for (i = 0; i < burst->nb_ops; i++) {
-		inst = &inst_base[2 * i];
+		inst = &inst_base[i];
 		infl_req = infl_reqs[i];
 		infl_req->op_flags = 0;
 
@@ -788,24 +719,12 @@ next_op:;
 	if (CNXK_TT_FROM_TAG(burst->ws->gw_rdata) == SSO_TT_ORDERED)
 		roc_sso_hws_head_wait(burst->ws->base);
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	/* Store w7 of last successfully filled instruction */
 	inst = &inst_base[2 * (i - 1)];
 	vec_tbl[0].w7 = inst->w7;
 
-	rte_io_wmb();
-
 put:
 	if (i != burst->nb_ops)
 		rte_mempool_put_bulk(qp->ca.req_mp, (void *)&infl_reqs[i], burst->nb_ops - i);
@@ -818,10 +737,10 @@ next_op:;
 static inline uint16_t
 ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 {
-	struct cpt_inflight_req *infl_reqs[CN10K_PKTS_PER_LOOP];
-	uint64_t lmt_base, lmt_arg, io_addr;
+	struct cpt_inflight_req *infl_reqs[CN10K_CPT_PKTS_PER_LOOP];
 	struct cpt_inst_s *inst, *inst_base;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_qp *qp;
 	uint64_t *fc_addr;
@@ -852,7 +771,7 @@ ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 	}
 
 	for (i = 0; i < burst->nb_ops; i++) {
-		inst = &inst_base[2 * i];
+		inst = &inst_base[i];
 		infl_req = infl_reqs[i];
 		infl_req->op_flags = 0;
 
@@ -889,19 +808,7 @@ ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 	if (CNXK_TT_FROM_TAG(burst->ws->gw_rdata) == SSO_TT_ORDERED)
 		roc_sso_hws_head_wait(burst->ws->base);
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 put:
 	if (unlikely(i != burst->nb_ops))
@@ -963,7 +870,7 @@ cn10k_cpt_crypto_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_ev
 		burst.op[burst.nb_ops] = op;
 
 		/* Max nb_ops per burst check */
-		if (++burst.nb_ops == CN10K_PKTS_PER_LOOP) {
+		if (++burst.nb_ops == CN10K_CPT_PKTS_PER_LOOP) {
 			if (is_vector)
 				submitted = ca_lmtst_vec_submit(&burst, vec_tbl, &vec_tbl_len,
 								is_sg_ver2);
@@ -1460,8 +1367,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	vf = cdev->data->dev_private;
 
-	const int nb_pkts_per_loop = 2 * CN10K_PKTS_PER_LOOP;
-
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
@@ -1481,7 +1386,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(nb_pkts_per_loop, nb_pkts); i++) {
+	for (; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1542,11 +1447,11 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == nb_pkts_per_loop) {
-		nb_pkts -= nb_pkts_per_loop;
-		pkts += nb_pkts_per_loop;
-		count += nb_pkts_per_loop;
-		sess += nb_pkts_per_loop;
+	if (nb_pkts - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
+		nb_pkts -= CN10K_CPT_PKTS_PER_LOOP;
+		pkts += CN10K_CPT_PKTS_PER_LOOP;
+		count += CN10K_CPT_PKTS_PER_LOOP;
+		sess += CN10K_CPT_PKTS_PER_LOOP;
 		goto again;
 	}
 
@@ -1645,8 +1550,8 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 			    const bool is_sgv2)
 {
 	uint16_t lmt_id, nb_allowed, nb_ops = vec->num;
-	uint64_t lmt_base, lmt_arg, io_addr, head;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr, head;
 	struct cnxk_cpt_qp *qp = qpair;
 	struct cnxk_sym_dp_ctx *dp_ctx;
 	struct pending_queue *pend_q;
@@ -1683,7 +1588,7 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		goto pend_q_commit;
 	}
 
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_ops); i++) {
+	for (i = 0; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_ops); i++) {
 		struct cnxk_iov iov;
 
 		index = count + i;
@@ -1691,7 +1596,7 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		infl_req->op_flags = 0;
 
 		cnxk_raw_burst_to_iov(vec, &ofs, index, &iov);
-		ret = cn10k_cpt_raw_fill_inst(&iov, qp, dp_ctx, &inst[2 * i], infl_req,
+		ret = cn10k_cpt_raw_fill_inst(&iov, qp, dp_ctx, &inst[i], infl_req,
 					      user_data[index], is_sgv2);
 		if (unlikely(ret != 1)) {
 			plt_dp_err("Could not process vec: %d", index);
@@ -1705,21 +1610,9 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		pending_queue_advance(&head, pq_mask);
 	}
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_ops - i > 0 && i == CN10K_PKTS_PER_LOOP) {
+	if (nb_ops - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
 		nb_ops -= i;
 		count += i;
 		goto again;
@@ -1760,8 +1653,8 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 		      struct rte_crypto_va_iova_ptr *aad_or_auth_iv, void *user_data,
 		      const bool is_sgv2)
 {
-	uint64_t lmt_base, lmt_arg, io_addr, head;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr, head;
 	struct cnxk_cpt_qp *qp = qpair;
 	struct cnxk_sym_dp_ctx *dp_ctx;
 	uint16_t lmt_id, nb_allowed;
@@ -1769,7 +1662,7 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 	union cpt_fc_write_s fc;
 	struct cnxk_iov iov;
 	uint64_t *fc_addr;
-	int ret;
+	int ret, i = 1;
 
 	struct pending_queue *pend_q = &qp->pend_q;
 	const uint64_t pq_mask = pend_q->pq_mask;
@@ -1806,10 +1699,7 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 
 	pending_queue_advance(&head, pq_mask);
 
-	lmt_arg = ROC_CN10K_CPT_LMT_ARG | (uint64_t)lmt_id;
-	roc_lmt_submit_steorl(lmt_arg, io_addr);
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	pend_q->head = head;
 	pend_q->time_out = rte_get_timer_cycles() + DEFAULT_COMMAND_TIMEOUT * rte_get_timer_hz();
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.h b/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
index 406c4abc7f..be76c49a65 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
@@ -5,15 +5,21 @@
 #ifndef _CN10K_CRYPTODEV_OPS_H_
 #define _CN10K_CRYPTODEV_OPS_H_
 
-#include <rte_compat.h>
 #include <cryptodev_pmd.h>
+#include <rte_compat.h>
 #include <rte_cryptodev.h>
 #include <rte_eventdev.h>
 
+#if defined(__aarch64__)
+#include "roc_io.h"
+#else
+#include "roc_io_generic.h"
+#endif
+
 #include "cnxk_cryptodev.h"
 
-#define CN10K_PKTS_PER_LOOP   32
-#define CN10K_PKTS_PER_STEORL 16
+#define CN10K_PKTS_PER_STEORL	  32
+#define CN10K_LMTLINES_PER_STEORL 16
 
 extern struct rte_cryptodev_ops cn10k_cpt_ops;
 
@@ -34,4 +40,52 @@ __rte_internal
 uint16_t __rte_hot cn10k_cpt_sg_ver2_crypto_adapter_enqueue(void *ws, struct rte_event ev[],
 		uint16_t nb_events);
 
+static __rte_always_inline void __rte_hot
+cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
+{
+	uint64_t lmt_arg;
+
+	/* Check if the total number of instructions is odd or even. */
+	const int flag_odd = *i & 0x1;
+
+	/* Reduce i by 1 when the number of instructions is odd. */
+	*i -= flag_odd;
+
+	if (*i > CN10K_PKTS_PER_STEORL) {
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_LMTLINES_PER_STEORL - 1) << 12 |
+			  (uint64_t)lmt_id;
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG |
+			  (*i / 2 - CN10K_LMTLINES_PER_STEORL - 1) << 12 |
+			  (uint64_t)(lmt_id + CN10K_LMTLINES_PER_STEORL);
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	} else {
+		if (*i != 0) {
+			lmt_arg =
+				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		}
+
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	}
+
+	rte_io_wmb();
+}
 #endif /* _CN10K_CRYPTODEV_OPS_H_ */
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index 6acaa4413b..cfcfa79fdf 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -431,7 +431,6 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	struct rte_pci_device *pci_dev;
 	struct cnxk_cpt_qp *qp;
 	uint32_t nb_desc;
-	uint64_t io_addr;
 	int ret;
 
 	if (dev->data->queue_pairs[qp_id] != NULL)
@@ -467,7 +466,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 	roc_cpt->lf[qp_id] = &qp->lf;
 
-	ret = roc_cpt_lmtline_init(roc_cpt, &qp->lmtline, qp_id);
+	ret = roc_cpt_lmtline_init(roc_cpt, &qp->lmtline, qp_id, true);
 	if (ret < 0) {
 		roc_cpt->lf[qp_id] = NULL;
 		plt_err("Could not init lmtline for queue pair %d", qp_id);
@@ -478,7 +477,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	dev->data->queue_pairs[qp_id] = qp;
 
 	if (qp_id == vf->rx_inject_qp) {
-		ret = roc_cpt_lmtline_init(roc_cpt, &vf->rx_inj_lmtline, vf->rx_inject_qp);
+		ret = roc_cpt_lmtline_init(roc_cpt, &vf->rx_inj_lmtline, vf->rx_inject_qp, true);
 		if (ret) {
 			plt_err("Could not init lmtline Rx inject");
 			goto exit;
@@ -486,14 +485,6 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
-		/* Update IO addr to enable dual submission */
-		io_addr = vf->rx_inj_lmtline.io_addr;
-		io_addr = (io_addr & ~(uint64_t)(0x7 << 4)) | ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
-		vf->rx_inj_lmtline.io_addr = io_addr;
-
-		/* Update FC threshold to reflect dual submission */
-		vf->rx_inj_lmtline.fc_thresh -= 32;
-
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
 	}
@@ -969,44 +960,28 @@ rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id)
 static inline void
 cnxk_crypto_cn10k_submit(void *qptr, void *inst, uint16_t nb_inst)
 {
-	uint64_t lmt_base, lmt_arg, io_addr;
 	struct cnxk_cpt_qp *qp = qptr;
-	uint16_t i, j, lmt_id;
+	uint64_t lmt_base, io_addr;
+	uint16_t lmt_id;
 	void *lmt_dst;
+	int i;
 
 	lmt_base = qp->lmtline.lmt_base;
 	io_addr = qp->lmtline.io_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
 
-again:
-	i = RTE_MIN(nb_inst, CN10K_PKTS_PER_LOOP);
 	lmt_dst = PLT_PTR_CAST(lmt_base);
+again:
+	i = RTE_MIN(nb_inst, CN10K_CPT_PKTS_PER_LOOP);
 
-	for (j = 0; j < i; j++) {
-		rte_memcpy(lmt_dst, inst, sizeof(struct cpt_inst_s));
-		inst = RTE_PTR_ADD(inst, sizeof(struct cpt_inst_s));
-		lmt_dst = RTE_PTR_ADD(lmt_dst, 2 * sizeof(struct cpt_inst_s));
-	}
-
-	rte_io_wmb();
-
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
+	memcpy(lmt_dst, inst, i * sizeof(struct cpt_inst_s));
 
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	if (nb_inst - i > 0) {
-		nb_inst -= i;
+		nb_inst -= CN10K_CPT_PKTS_PER_LOOP;
+		inst = RTE_PTR_ADD(inst, CN10K_CPT_PKTS_PER_LOOP * sizeof(struct cpt_inst_s));
 		goto again;
 	}
 }
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.h b/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
index 9de7e432e4..caf6ac35e5 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
@@ -25,6 +25,8 @@
 
 #define MOD_INC(i, l) ((i) == (l - 1) ? (i) = 0 : (i)++)
 
+#define CN10K_CPT_PKTS_PER_LOOP	  64
+
 /* Macros to form words in CPT instruction */
 #define CNXK_CPT_INST_W2(tag, tt, grp, rvu_pf_func)                            \
 	((tag) | ((uint64_t)(tt) << 32) | ((uint64_t)(grp) << 34) |            \
diff --git a/drivers/event/cnxk/cnxk_eventdev_adptr.c b/drivers/event/cnxk/cnxk_eventdev_adptr.c
index 98db11ad61..2c049e7041 100644
--- a/drivers/event/cnxk/cnxk_eventdev_adptr.c
+++ b/drivers/event/cnxk/cnxk_eventdev_adptr.c
@@ -632,7 +632,7 @@ crypto_adapter_qp_setup(const struct rte_cryptodev *cdev, struct cnxk_cpt_qp *qp
 	 * simultaneous enqueue from all available cores.
 	 */
 	if (roc_model_is_cn10k())
-		nb_desc_min = rte_lcore_count() * 32;
+		nb_desc_min = rte_lcore_count() * CN10K_CPT_PKTS_PER_LOOP;
 	else
 		nb_desc_min = rte_lcore_count() * 2;
 
@@ -707,7 +707,7 @@ crypto_adapter_qp_free(struct cnxk_cpt_qp *qp)
 	rte_mempool_free(qp->ca.req_mp);
 	qp->ca.enabled = false;
 
-	ret = roc_cpt_lmtline_init(qp->lf.roc_cpt, &qp->lmtline, qp->lf.lf_id);
+	ret = roc_cpt_lmtline_init(qp->lf.roc_cpt, &qp->lmtline, qp->lf.lf_id, true);
 	if (ret < 0) {
 		plt_err("Could not reset lmtline for queue pair %d", qp->lf.lf_id);
 		return ret;
-- 
2.25.1
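
The way cn10k_cpt_lmtst_dual_submit() splits a burst can be modelled
without hardware access. The sketch below is a plain-C illustration, not
driver code: the struct and function names are hypothetical, and the
constants are copied from the patch as bare numbers. It assumes two
instructions are packed per LMT line, at most 16 lines per STEORL, and a
trailing odd instruction sent alone on the next free line.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the patch, used here as plain constants. */
#define PKTS_PER_STEORL     32 /* CN10K_PKTS_PER_STEORL */
#define LMTLINES_PER_STEORL 16 /* CN10K_LMTLINES_PER_STEORL */
#define PKTS_PER_LOOP       64 /* CN10K_CPT_PKTS_PER_LOOP */

/* Hypothetical summary of the STEORL operations issued for one burst. */
struct dual_submit_plan {
	int nb_dual_steorl;     /* STEORLs carrying two instructions per line */
	int dual_lines[2];      /* LMT lines covered by each dual STEORL */
	int dual_first_line[2]; /* first LMT line (offset from lmt_id) */
	int has_single;         /* 1 when an odd instruction is left over */
	int single_line;        /* LMT line (offset) of the odd instruction */
};

static struct dual_submit_plan
plan_dual_submit(int nb_inst)
{
	struct dual_submit_plan p = {0};
	const int flag_odd = nb_inst & 0x1;
	int even;

	assert(nb_inst >= 0 && nb_inst <= PKTS_PER_LOOP);

	/* Drop the odd instruction; it is handled separately below. */
	even = nb_inst - flag_odd;

	if (even > PKTS_PER_STEORL) {
		/* More than 16 lines: a full STEORL plus the remainder. */
		p.nb_dual_steorl = 2;
		p.dual_lines[0] = LMTLINES_PER_STEORL;
		p.dual_first_line[0] = 0;
		p.dual_lines[1] = even / 2 - LMTLINES_PER_STEORL;
		p.dual_first_line[1] = LMTLINES_PER_STEORL;
	} else if (even != 0) {
		p.nb_dual_steorl = 1;
		p.dual_lines[0] = even / 2;
		p.dual_first_line[0] = 0;
	}

	if (flag_odd) {
		/* The leftover instruction uses the line after the pairs. */
		p.has_single = 1;
		p.single_line = even / 2;
	}

	return p;
}
```

For example, a burst of 35 instructions maps to one STEORL over 16
lines, a second over 1 line starting at offset 16, and a single
instruction at offset 17. The driver encodes each line count minus one
in the LMT argument; the model reports the counts directly.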



* RE: [PATCH v2 08/12] crypto/cnxk: add dual submission in Rx inject
  2024-06-24  6:23   ` [PATCH v2 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
@ 2024-06-26  6:41     ` Akhil Goyal
  0 siblings, 0 replies; 41+ messages in thread
From: Akhil Goyal @ 2024-06-26  6:41 UTC (permalink / raw)
  To: Aakash Sasidharan, Nithin Kumar Dabilpuram,
	Kiran Kumar Kokkilagadda, Sunil Kumar Kori,
	Satha Koteswara Rao Kottidi, Harman Kalra, Ankur Dwivedi,
	Anoob Joseph, Tejasree Kondoj
  Cc: Jerin Jacob, Vidya Sagar Velumuri, Aakash Sasidharan, dev

> Subject: [PATCH v2 08/12] crypto/cnxk: add dual submission in Rx inject
> 
> From: Anoob Joseph <anoobj@marvell.com>
> 
> Add dual submission to CPT in Rx inject path.
> 
> Signed-off-by: Anoob Joseph <anoobj@marvell.com>
> Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Please fix the following build failure:

[146/241] Compiling C object 'drivers/a715181@@tmp_rte_crypto_cnxk@sta/crypto_cnxk_cn10k_cryptodev_ops.c.o'.
FAILED: drivers/a715181@@tmp_rte_crypto_cnxk@sta/crypto_cnxk_cn10k_cryptodev_ops.c.o
ccache clang -Idrivers/a715181@@tmp_rte_crypto_cnxk@sta -Idrivers -I../drivers -Idrivers/crypto/cnxk -I../drivers/crypto/cnxk -Idrivers/crypto/cnxk/../../../lib/net -I../drivers/crypto/cnxk/../../../lib/net -Idrivers/crypto/cnxk/../../event/cnxk -I../drivers/crypto/cnxk/../../event/cnxk -Ilib/cryptodev -I../lib/cryptodev -I. -I../ -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include -Ilib/eal/linux/include -I../lib/eal/linux/include -Ilib/eal/x86/include -I../lib/eal/x86/include -Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/telemetry/../metrics -I../lib/telemetry/../metrics -Ilib/telemetry -I../lib/telemetry -Ilib/mbuf -I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/rcu -I../lib/rcu -Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux -Ilib/pci -I../lib/pci -Idrivers/common/cnxk -I../drivers/common/cnxk -Idrivers/common/cnxk/../../bus/pci -I../drivers/common/cnxk/../../bus/pci -Idrivers/common/cnxk/../../../lib/net -I../drivers/common/cnxk/../../../lib/net -Idrivers/common/cnxk/../../../lib/ethdev -I../drivers/common/cnxk/../../../lib/ethdev -Idrivers/common/cnxk/../../../lib/meter -I../drivers/common/cnxk/../../../lib/meter -Ilib/security -I../lib/security -Ilib/net -I../lib/net -Ilib/eventdev -I../lib/eventdev -Ilib/ethdev -I../lib/ethdev -Ilib/meter -I../lib/meter -Ilib/hash -I../lib/hash -Ilib/timer -I../lib/timer -Ilib/dmadev -I../lib/dmadev -Xclang -fcolor-diagnostics -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -Werror -std=c11 -O2 -g -include rte_config.h -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-address-of-packed-member -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=native -mrtm -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API 
-DLA_IPSEC_DEBUG -DCNXK_CRYPTODEV_DEBUG -DRTE_LOG_DEFAULT_LOGTYPE=pmd.crypto.cnxk -DRTE_ANNOTATE_LOCKS -Wthread-safety -MD -MQ 'drivers/a715181@@tmp_rte_crypto_cnxk@sta/crypto_cnxk_cn10k_cryptodev_ops.c.o' -MF 'drivers/a715181@@tmp_rte_crypto_cnxk@sta/crypto_cnxk_cn10k_cryptodev_ops.c.o.d' -o 'drivers/a715181@@tmp_rte_crypto_cnxk@sta/crypto_cnxk_cn10k_cryptodev_ops.c.o' -c ../drivers/crypto/cnxk/cn10k_cryptodev_ops.c
../drivers/crypto/cnxk/cn10k_cryptodev_ops.c:59:1: error: unused function 'cn10k_cpt_lmtst_dual_submit' [-Werror,-Wunused-function]
cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
^
1 error generated.
[167/241] Compiling C object 'drivers/a715181@@tmp_rte_crypto_cnxk@sta/crypto_cnxk_cn9k_cryptodev_ops.c.o'.



* [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD
  2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                     ` (11 preceding siblings ...)
  2024-06-24  6:24   ` [PATCH v2 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
@ 2024-06-26 10:55   ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
                       ` (12 more replies)
  12 siblings, 13 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

v3:
* Fix compilation error by moving the function meant for arm64 under
  the "#if defined(RTE_ARCH_ARM64)" guard.
v2:
* Fix compilation errors observed with arm gcc-13.

This series adds improvements to CNXK crypto PMD and fixes aes-gcm zero
length input failure.

Aakash Sasidharan (1):
  crypto/cnxk: fix aes-gcm zero len input cases

Anoob Joseph (11):
  common/cnxk: add comments to denote skipped entries
  crypto/cnxk: update version map file with PMD APIs
  common/cnxk: make inline dev PF func get as idev API
  crypto/cnxk: add flow control in Rx inject path
  crypto/cnxk: use SSO PF func of inline device in inst
  crypto/cnxk: use NEON for Rx inject inst preparation
  crypto/cnxk: remove init of CPT result field in packet
  crypto/cnxk: add dual submission in Rx inject
  crypto/cnxk: update sess pointer for next iteration
  crypto/cnxk: make pack IV variable as const
  crypto/cnxk: enable dual submission to CPT

 drivers/common/cnxk/roc_ae.c              |   6 +-
 drivers/common/cnxk/roc_ae_fpm_tables.c   |   6 +-
 drivers/common/cnxk/roc_cpt.c             |  17 +-
 drivers/common/cnxk/roc_cpt.h             |  51 +++--
 drivers/common/cnxk/roc_idev.c            |   6 +
 drivers/common/cnxk/roc_idev.h            |   2 +
 drivers/common/cnxk/roc_nix_inl.h         |   1 -
 drivers/common/cnxk/roc_nix_inl_dev.c     |   6 -
 drivers/common/cnxk/version.map           |   2 +-
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 234 +++++++++-------------
 drivers/crypto/cnxk/cn10k_cryptodev_ops.h |  60 +++++-
 drivers/crypto/cnxk/cnxk_cryptodev.h      |   2 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  40 ++--
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |   2 +
 drivers/crypto/cnxk/cnxk_se.h             |  55 ++---
 drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h |   2 +
 drivers/crypto/cnxk/version.map           |   8 +
 drivers/event/cnxk/cnxk_eventdev_adptr.c  |   4 +-
 drivers/net/cnxk/cn10k_ethdev_sec.c       |   2 +-
 drivers/net/cnxk/cnxk_ethdev_telemetry.c  |   3 +-
 20 files changed, 275 insertions(+), 234 deletions(-)

-- 
2.25.1



* [PATCH v3 01/12] common/cnxk: add comments to denote skipped entries
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
                       ` (11 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add comments to denote unused table entries.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/common/cnxk/roc_ae.c            | 6 +++---
 drivers/common/cnxk/roc_ae_fpm_tables.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/common/cnxk/roc_ae.c b/drivers/common/cnxk/roc_ae.c
index e6a013d7c4..7ef0efe2b3 100644
--- a/drivers/common/cnxk/roc_ae.c
+++ b/drivers/common/cnxk/roc_ae.c
@@ -151,9 +151,9 @@ const struct roc_ae_ec_group ae_ec_grp[ROC_AE_EC_ID_PMAX] = {
 			     0x3F, 0x00},
 		    .length = 66},
 	},
-	{},
-	{},
-	{},
+	{ /* ROC_AE_EC_ID_P160 */ },
+	{ /* ROC_AE_EC_ID_P320 */ },
+	{ /* ROC_AE_EC_ID_P512 */ },
 	{
 		.prime = {.data = {0xFF, 0xFF, 0xFF, 0xFE, 0xFF, 0xFF, 0xFF,
 				   0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
diff --git a/drivers/common/cnxk/roc_ae_fpm_tables.c b/drivers/common/cnxk/roc_ae_fpm_tables.c
index ead3128e7f..942657b56a 100644
--- a/drivers/common/cnxk/roc_ae_fpm_tables.c
+++ b/drivers/common/cnxk/roc_ae_fpm_tables.c
@@ -1261,9 +1261,9 @@ const struct ae_fpm_entry ae_fpm_tbl_scalar[ROC_AE_EC_ID_PMAX] = {
 		.data = ae_fpm_tbl_p521,
 		.len = sizeof(ae_fpm_tbl_p521)
 	},
-	{},
-	{},
-	{},
+	{ /* ROC_AE_EC_ID_P160 */ },
+	{ /* ROC_AE_EC_ID_P320 */ },
+	{ /* ROC_AE_EC_ID_P512 */ },
 	{
 		.data = ae_fpm_tbl_p256_sm2,
 		.len = sizeof(ae_fpm_tbl_p256_sm2)
-- 
2.25.1
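
The convention this patch applies can be shown on a small table indexed
by an enum. The sketch below uses hypothetical names and sizes; it is
not the roc_ae table. Unsupported IDs keep a placeholder slot so that
every later entry still lines up with its enum value, and the comment
names the ID being skipped.

```c
#include <stddef.h>

/* Hypothetical curve IDs; the middle one has no table data. */
enum curve_id {
	CURVE_P192,
	CURVE_P160,
	CURVE_P256,
	CURVE_MAX
};

struct curve_entry {
	const char *name;
	size_t len;
};

static const struct curve_entry curve_tbl[CURVE_MAX] = {
	{ .name = "p192", .len = 24 },
	{ .name = NULL /* CURVE_P160: skipped, slot keeps indices aligned */ },
	{ .name = "p256", .len = 32 },
};

/* An entry counts as supported only when its slot was filled in. */
static int
curve_is_supported(enum curve_id id)
{
	return id < CURVE_MAX && curve_tbl[id].name != NULL;
}
```

The patch itself uses empty braces with a comment; the explicit NULL
initializer here is a portability choice for the sketch, since empty
initializer lists are not accepted by every C standard version.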



* [PATCH v3 02/12] crypto/cnxk: update version map file with PMD APIs
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
                       ` (10 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj, Akhil Goyal
  Cc: jerinj, vvelumuri, asasidharan, dev, stable

From: Anoob Joseph <anoobj@marvell.com>

Update version map with details of PMD APIs added.

Fixes: 26bb5c4de63e ("crypto/cnxk: add CPT raw submission PMD API")
Cc: stable@dpdk.org

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h | 2 ++
 drivers/crypto/cnxk/version.map           | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h b/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
index 8b0a5ba0f2..eab1243065 100644
--- a/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
+++ b/drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h
@@ -23,6 +23,7 @@
  * @return
  *   Pointer to queue pair structure that would be the input to submit APIs.
  */
+__rte_experimental
 void *rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id);
 
 /**
@@ -41,6 +42,7 @@ void *rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id);
  * @param nb_inst
  *   Number of instructions.
  */
+__rte_experimental
 void rte_pmd_cnxk_crypto_submit(void *qptr, void *inst, uint16_t nb_inst);
 
 #endif /* _PMD_CNXK_CRYPTO_H_ */
diff --git a/drivers/crypto/cnxk/version.map b/drivers/crypto/cnxk/version.map
index 5789a6bfc9..7a77607774 100644
--- a/drivers/crypto/cnxk/version.map
+++ b/drivers/crypto/cnxk/version.map
@@ -1,3 +1,11 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 24.03
+	rte_pmd_cnxk_crypto_submit;
+	rte_pmd_cnxk_crypto_qptr_get;
+};
+
 INTERNAL {
 	global:
 
-- 
2.25.1



* [PATCH v3 03/12] common/cnxk: make inline dev PF func get as idev API
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
                       ` (9 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra
  Cc: gakhil, jerinj, anoobj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

The inline device PF FUNC is required to set SSO_PF_FUNC in the
instruction for cryptodev Rx inject. Move the getter API to idev so
that the crypto PMD can use it.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/common/cnxk/roc_idev.c           | 6 ++++++
 drivers/common/cnxk/roc_idev.h           | 2 ++
 drivers/common/cnxk/roc_nix_inl.h        | 1 -
 drivers/common/cnxk/roc_nix_inl_dev.c    | 6 ------
 drivers/common/cnxk/version.map          | 2 +-
 drivers/net/cnxk/cn10k_ethdev_sec.c      | 2 +-
 drivers/net/cnxk/cnxk_ethdev_telemetry.c | 3 +--
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/common/cnxk/roc_idev.c b/drivers/common/cnxk/roc_idev.c
index d0307c666c..0778d51d1e 100644
--- a/drivers/common/cnxk/roc_idev.c
+++ b/drivers/common/cnxk/roc_idev.c
@@ -374,3 +374,9 @@ roc_idev_nix_rx_chan_set(uint16_t port, uint16_t chan)
 	if (idev != NULL && port < PLT_MAX_ETHPORTS)
 		__atomic_store_n(&idev->inl_rx_inj_cfg.chan[port], chan, __ATOMIC_RELEASE);
 }
+
+uint16_t
+roc_idev_nix_inl_dev_pffunc_get(void)
+{
+	return nix_inl_dev_pffunc_get();
+}
diff --git a/drivers/common/cnxk/roc_idev.h b/drivers/common/cnxk/roc_idev.h
index 00664eaed6..fc0f7db54e 100644
--- a/drivers/common/cnxk/roc_idev.h
+++ b/drivers/common/cnxk/roc_idev.h
@@ -27,4 +27,6 @@ uint8_t __roc_api roc_idev_nix_rx_inject_get(uint16_t port);
 void __roc_api roc_idev_nix_rx_inject_set(uint16_t port, uint8_t enable);
 uint16_t *__roc_api roc_idev_nix_rx_chan_base_get(void);
 void __roc_api roc_idev_nix_rx_chan_set(uint16_t port, uint16_t chan);
+
+uint16_t __roc_api roc_idev_nix_inl_dev_pffunc_get(void);
 #endif /* _ROC_IDEV_H_ */
diff --git a/drivers/common/cnxk/roc_nix_inl.h b/drivers/common/cnxk/roc_nix_inl.h
index ab0965e512..1a4bf8808c 100644
--- a/drivers/common/cnxk/roc_nix_inl.h
+++ b/drivers/common/cnxk/roc_nix_inl.h
@@ -112,7 +112,6 @@ void __roc_api roc_nix_inl_dev_lock(void);
 void __roc_api roc_nix_inl_dev_unlock(void);
 int __roc_api roc_nix_inl_dev_xaq_realloc(uint64_t aura_handle);
 int __roc_api roc_nix_inl_dev_stats_get(struct roc_nix_stats *stats);
-uint16_t __roc_api roc_nix_inl_dev_pffunc_get(void);
 int __roc_api roc_nix_inl_dev_cpt_setup(bool use_inl_dev_sso);
 int __roc_api roc_nix_inl_dev_cpt_release(void);
 bool __roc_api roc_nix_inl_dev_is_multi_channel(void);
diff --git a/drivers/common/cnxk/roc_nix_inl_dev.c b/drivers/common/cnxk/roc_nix_inl_dev.c
index 60e6a43033..e2bbe3a67b 100644
--- a/drivers/common/cnxk/roc_nix_inl_dev.c
+++ b/drivers/common/cnxk/roc_nix_inl_dev.c
@@ -34,12 +34,6 @@ nix_inl_dev_pffunc_get(void)
 	return 0;
 }
 
-uint16_t
-roc_nix_inl_dev_pffunc_get(void)
-{
-	return nix_inl_dev_pffunc_get();
-}
-
 static void
 nix_inl_selftest_work_cb(uint64_t *gw, void *args, uint32_t soft_exp_event)
 {
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index eac2ea9ff8..f98738d07e 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -112,6 +112,7 @@ INTERNAL {
 	roc_idev_npa_nix_get;
 	roc_idev_num_lmtlines_get;
 	roc_idev_nix_inl_meta_aura_get;
+	roc_idev_nix_inl_dev_pffunc_get;
 	roc_idev_nix_list_get;
 	roc_idev_nix_rx_chan_base_get;
 	roc_idev_nix_rx_chan_set;
@@ -244,7 +245,6 @@ INTERNAL {
 	roc_nix_inl_dev_is_probed;
 	roc_nix_inl_dev_stats_get;
 	roc_nix_inl_dev_lock;
-	roc_nix_inl_dev_pffunc_get;
 	roc_nix_inl_dev_rq;
 	roc_nix_inl_dev_rq_get;
 	roc_nix_inl_dev_rq_put;
diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c
index b8b0da5ea9..5e509e97d4 100644
--- a/drivers/net/cnxk/cn10k_ethdev_sec.c
+++ b/drivers/net/cnxk/cn10k_ethdev_sec.c
@@ -1360,7 +1360,7 @@ cn10k_eth_sec_rx_inject_config(void *device, uint16_t port_id, bool enable)
 	inj_cfg->io_addr = inl_lf->io_addr;
 	inj_cfg->lmt_base = nix->lmt_base;
 	channel = roc_nix_get_base_chan(nix);
-	pf_func = roc_nix_inl_dev_pffunc_get();
+	pf_func = roc_idev_nix_inl_dev_pffunc_get();
 	inj_cfg->cmd_w0 = pf_func << 48 | inj_match_id << 32 | channel << 4;
 
 	return 0;
diff --git a/drivers/net/cnxk/cnxk_ethdev_telemetry.c b/drivers/net/cnxk/cnxk_ethdev_telemetry.c
index 3027ca4735..a1958185f2 100644
--- a/drivers/net/cnxk/cnxk_ethdev_telemetry.c
+++ b/drivers/net/cnxk/cnxk_ethdev_telemetry.c
@@ -65,8 +65,7 @@ ethdev_tel_handle_info(const char *cmd __rte_unused,
 			info = &eth_info.info;
 			dev = cnxk_eth_pmd_priv(eth_dev);
 			if (dev) {
-				info->inl_dev_pf_func =
-					roc_nix_inl_dev_pffunc_get();
+				info->inl_dev_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 				info->pf_func = roc_nix_get_pf_func(&dev->nix);
 				info->max_mac_entries = dev->max_mac_entries;
 				info->dmac_filter_ena = dev->dmac_filter_enable;
-- 
2.25.1



* [PATCH v3 04/12] crypto/cnxk: add flow control in Rx inject path
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (2 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
                       ` (8 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add flow control in the Rx inject path to avoid over-submission to CPT.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 07bd13b16d..673220977c 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1405,8 +1405,10 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	struct rte_cryptodev *cdev = dev;
 	union cpt_res_s *hw_res = NULL;
 	struct cpt_inst_s *inst;
+	union cpt_fc_write_s fc;
 	struct cnxk_cpt_vf *vf;
 	struct rte_mbuf *m;
+	uint64_t *fc_addr;
 	uint64_t dptr;
 	int i;
 
@@ -1418,13 +1420,24 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
+	fc_addr = vf->rx_inj_lmtline.fc_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
 	pf_func = vf->rx_inj_pf_func;
 
+	const uint32_t fc_thresh = vf->rx_inj_lmtline.fc_thresh;
+
 again:
+	fc.u64[0] =
+		rte_atomic_load_explicit((RTE_ATOMIC(uint64_t) *)fc_addr, rte_memory_order_relaxed);
 	inst = (struct cpt_inst_s *)lmt_base;
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+
+	i = 0;
+
+	if (unlikely(fc.s.qsize > fc_thresh))
+		goto exit;
+
+	for (; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1492,6 +1505,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		goto again;
 	}
 
+exit:
 	return count + i;
 }
 
-- 
2.25.1
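
The flow-control check in this patch can be modelled outside the driver. What follows is a minimal sketch under stated assumptions, not the PMD code: `fc_word`, `can_submit` and `submit_burst` are hypothetical names, C11 `<stdatomic.h>` stands in for `rte_atomic_load_explicit`, and the whole 64-bit word is treated as the queue size, whereas the driver reads the `qsize` field of `union cpt_fc_write_s`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flow-control word that hardware updates. */
typedef _Atomic uint64_t fc_word;

/*
 * Return true when the queue depth read from fc_addr does not exceed the
 * threshold, i.e. when another burst may be submitted. A relaxed load is
 * used, as in the patch.
 */
static bool
can_submit(fc_word *fc_addr, uint32_t fc_thresh)
{
	uint64_t qsize = atomic_load_explicit(fc_addr, memory_order_relaxed);

	return qsize <= fc_thresh;
}

/*
 * Model of the burst loop: re-check flow control before every pass and stop
 * early when the queue is too full. Returns the number of packets accepted.
 */
static uint16_t
submit_burst(fc_word *fc_addr, uint32_t fc_thresh, uint16_t nb_pkts, uint16_t per_loop)
{
	uint16_t count = 0;

	assert(per_loop > 0); /* guard against a loop that never advances */

	while (nb_pkts > 0) {
		uint16_t n;

		if (!can_submit(fc_addr, fc_thresh))
			break;

		n = nb_pkts < per_loop ? nb_pkts : per_loop;
		/* A real driver would prepare and submit n instructions here. */
		count += n;
		nb_pkts -= n;
	}

	return count;
}
```

With a queue depth of 10 and a threshold of 100, `submit_burst(&q, 100, 70, 32)` accepts all 70 packets in three passes; with a threshold of 5 it returns 0, mirroring the early `goto exit` in the patch.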



* [PATCH v3 05/12] crypto/cnxk: use SSO PF func of inline device in inst
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (3 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
                       ` (7 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

The RVU PF FUNC of the CPT LF need not be set, as the hardware
determines it. Instead, the SSO PF FUNC needs to be set to that of the
inline device so that critical errors reach the inline device.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 2 +-
 drivers/crypto/cnxk/cnxk_cryptodev.h      | 2 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 673220977c..2bdf4d29c2 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1423,7 +1423,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
-	pf_func = vf->rx_inj_pf_func;
+	pf_func = vf->rx_inj_sso_pf_func;
 
 	const uint32_t fc_thresh = vf->rx_inj_lmtline.fc_thresh;
 
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev.h b/drivers/crypto/cnxk/cnxk_cryptodev.h
index fffc4a47b4..4000e84a7e 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev.h
+++ b/drivers/crypto/cnxk/cnxk_cryptodev.h
@@ -22,7 +22,7 @@
  */
 struct cnxk_cpt_vf {
 	struct roc_cpt_lmtline rx_inj_lmtline;
-	uint16_t rx_inj_pf_func;
+	uint16_t rx_inj_sso_pf_func;
 	uint16_t *rx_chan_base;
 	struct roc_cpt cpt;
 	struct rte_cryptodev_capabilities crypto_caps[CNXK_CPT_MAX_CAPS];
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index d7f5780637..51369309c5 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -483,7 +483,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 			goto exit;
 		}
 
-		vf->rx_inj_pf_func = qp->lf.pf_func;
+		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
-- 
2.25.1



* [PATCH v3 06/12] crypto/cnxk: use NEON for Rx inject inst preparation
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (4 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
                       ` (6 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Use NEON instructions for Rx inject instruction preparation.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Signed-off-by: Aakash Sasidharan <asasidharan@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 62 +++++++++++++++++------
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index 2bdf4d29c2..c489f0884e 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -7,6 +7,7 @@
 #include <rte_event_crypto_adapter.h>
 #include <rte_hexdump.h>
 #include <rte_ip.h>
+#include <rte_vect.h>
 
 #include <ethdev_driver.h>
 
@@ -1395,15 +1396,17 @@ cn10k_cpt_dequeue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops)
 	return i;
 }
 
+#if defined(RTE_ARCH_ARM64)
 uint16_t __rte_hot
 cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 				  struct rte_security_session **sess, uint16_t nb_pkts)
 {
-	uint16_t l2_len, pf_func, lmt_id, count = 0;
-	uint64_t lmt_base, lmt_arg, io_addr;
+	uint64_t lmt_base, lmt_arg, io_addr, u64_0, u64_1, l2_len, pf_func;
+	uint64x2_t inst_01, inst_23, inst_45, inst_67;
 	struct cn10k_sec_session *sec_sess;
 	struct rte_cryptodev *cdev = dev;
 	union cpt_res_s *hw_res = NULL;
+	uint16_t lmt_id, count = 0;
 	struct cpt_inst_s *inst;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_vf *vf;
@@ -1461,26 +1464,41 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		hw_res = RTE_PTR_ALIGN_CEIL(hw_res, 16);
 
 		/* Prepare CPT instruction */
-		inst->w0.u64 = 0;
-		inst->w2.u64 = 0;
-		inst->w2.s.rvu_pf_func = pf_func;
-		inst->w3.u64 = (((uint64_t)m + sizeof(struct rte_mbuf)) >> 3) << 3 | 1;
 
-		inst->w4.u64 = sec_sess->inst.w4 | (rte_pktmbuf_pkt_len(m));
+		/* Word 0 and 1 */
+		inst_01 = vdupq_n_u64(0);
+		u64_0 = pf_func << 48 | *(vf->rx_chan_base + m->port) << 4 | (l2_len - 2) << 24 |
+			l2_len << 16;
+		inst_01 = vsetq_lane_u64(u64_0, inst_01, 0);
+		inst_01 = vsetq_lane_u64((uint64_t)hw_res, inst_01, 1);
+		vst1q_u64(&inst->w0.u64, inst_01);
+
+		/* Word 2 and 3 */
+		inst_23 = vdupq_n_u64(0);
+		u64_1 = (((uint64_t)m + sizeof(struct rte_mbuf)) >> 3) << 3 | 1;
+		inst_23 = vsetq_lane_u64(u64_1, inst_23, 1);
+		vst1q_u64(&inst->w2.u64, inst_23);
+
+		/* Word 4 and 5 */
+		inst_45 = vdupq_n_u64(0);
+		u64_0 = sec_sess->inst.w4 | (rte_pktmbuf_pkt_len(m));
+		inst_45 = vsetq_lane_u64(u64_0, inst_45, 0);
 		dptr = (uint64_t)rte_pktmbuf_iova(m);
-		inst->dptr = dptr;
-		inst->rptr = dptr;
-
-		inst->w0.hw_s.chan = *(vf->rx_chan_base + m->port);
-		inst->w0.hw_s.l2_len = l2_len;
-		inst->w0.hw_s.et_offset = l2_len - 2;
+		u64_1 = dptr;
+		inst_45 = vsetq_lane_u64(u64_1, inst_45, 1);
+		vst1q_u64(&inst->w4.u64, inst_45);
+
+		/* Word 6 and 7 */
+		inst_67 = vdupq_n_u64(0);
+		u64_0 = dptr;
+		u64_1 = sec_sess->inst.w7;
+		inst_67 = vsetq_lane_u64(u64_0, inst_67, 0);
+		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
+		vst1q_u64(&inst->w6.u64, inst_67);
 
-		inst->res_addr = (uint64_t)hw_res;
 		rte_atomic_store_explicit((unsigned long __rte_atomic *)&hw_res->u64[0], res.u64[0],
 					  rte_memory_order_relaxed);
 
-		inst->w7.u64 = sec_sess->inst.w7;
-
 		inst += 2;
 	}
 
@@ -1508,6 +1526,18 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 exit:
 	return count + i;
 }
+#else
+uint16_t __rte_hot
+cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
+				  struct rte_security_session **sess, uint16_t nb_pkts)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(sess);
+	RTE_SET_USED(nb_pkts);
+	return 0;
+}
+#endif
 
 void
 cn10k_cpt_set_enqdeq_fns(struct rte_cryptodev *dev, struct cnxk_cpt_vf *vf)
-- 
2.25.1



* [PATCH v3 07/12] crypto/cnxk: remove init of CPT result field in packet
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (5 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
                       ` (5 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

The packet is posted to CPT only when there is a valid result, so skip
initializing the CPT result field in the packet.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index c489f0884e..b808ea2946 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1415,10 +1415,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	uint64_t dptr;
 	int i;
 
-	const union cpt_res_s res = {
-		.cn10k.compcode = CPT_COMP_NOT_DONE,
-	};
-
 	vf = cdev->data->dev_private;
 
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
@@ -1496,9 +1492,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
 		vst1q_u64(&inst->w6.u64, inst_67);
 
-		rte_atomic_store_explicit((unsigned long __rte_atomic *)&hw_res->u64[0], res.u64[0],
-					  rte_memory_order_relaxed);
-
 		inst += 2;
 	}
 
-- 
2.25.1



* [PATCH v3 08/12] crypto/cnxk: add dual submission in Rx inject
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (6 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
                       ` (4 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Add dual submission to CPT in Rx inject path.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Signed-off-by: Vidya Sagar Velumuri <vvelumuri@marvell.com>
Signed-off-by: Aakash Sasidharan <asasidharan@marvell.com>
---
 drivers/common/cnxk/roc_cpt.h             | 43 +++++++++-----
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 72 +++++++++++++++++------
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  9 +++
 3 files changed, 92 insertions(+), 32 deletions(-)

diff --git a/drivers/common/cnxk/roc_cpt.h b/drivers/common/cnxk/roc_cpt.h
index 3721fa08c0..8ef9062ae0 100644
--- a/drivers/common/cnxk/roc_cpt.h
+++ b/drivers/common/cnxk/roc_cpt.h
@@ -30,23 +30,36 @@
 /* Vector of sizes in the burst of 16 CPT inst except first in 63:19 of
  * APT_LMT_ARG_S
  */
-#define ROC_CN10K_CPT_LMT_ARG                                                  \
-	(ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 0) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 1) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 2) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 3) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 4) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 5) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 6) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 7) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 8) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 9) |                            \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 10) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 11) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 12) |                           \
-	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 13) |                           \
+#define ROC_CN10K_CPT_LMT_ARG                                                                      \
+	(ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 0) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 1) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 2) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 3) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 4) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 5) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 6) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 7) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 8) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 9) |     \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 10) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 11) |   \
+	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 12) | ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 13) |   \
 	 ROC_CN10K_CPT_INST_DW_M1 << (19 + 3 * 14))
 
+/* Vector of sizes in the burst of 2 * 16 CPT inst except first in 63:19 of
+ * APT_LMT_ARG_S
+ */
+#define ROC_CN10K_DUAL_CPT_LMT_ARG                                                                 \
+	(ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 0) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 1) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 2) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 3) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 4) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 5) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 6) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 7) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 8) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 9) |                                            \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 10) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 11) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 12) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 13) |                                           \
+	 ROC_CN10K_TWO_CPT_INST_DW_M1 << (19 + 3 * 14))
+
 /* CPT helper macros */
 #define ROC_CPT_AH_HDR_LEN	12
 #define ROC_CPT_AES_GCM_IV_LEN	8
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index b808ea2946..e42a3d2ea6 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -55,6 +55,56 @@ struct vec_request {
 	uint64_t w2;
 };
 
+#if defined(RTE_ARCH_ARM64)
+static __rte_always_inline void __rte_hot
+cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
+{
+	uint64_t lmt_arg;
+
+	/* Check if the total number of instructions is odd or even. */
+	const int flag_odd = *i & 0x1;
+
+	/* Reduce i by 1 when odd number of instructions.*/
+	*i -= flag_odd;
+
+	if (*i > 2 * CN10K_PKTS_PER_STEORL) {
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
+			  (uint64_t)lmt_id;
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - CN10K_PKTS_PER_STEORL - 1) << 12 |
+			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	} else {
+		if (*i != 0) {
+			lmt_arg =
+				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		}
+
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	}
+
+	rte_io_wmb();
+}
+#endif
+
 static inline struct cnxk_se_sess *
 cn10k_cpt_sym_temp_sess_create(struct cnxk_cpt_qp *qp, struct rte_crypto_op *op)
 {
@@ -1401,7 +1451,7 @@ uint16_t __rte_hot
 cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 				  struct rte_security_session **sess, uint16_t nb_pkts)
 {
-	uint64_t lmt_base, lmt_arg, io_addr, u64_0, u64_1, l2_len, pf_func;
+	uint64_t lmt_base, io_addr, u64_0, u64_1, l2_len, pf_func;
 	uint64x2_t inst_01, inst_23, inst_45, inst_67;
 	struct cn10k_sec_session *sec_sess;
 	struct rte_cryptodev *cdev = dev;
@@ -1436,7 +1486,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+	for (; i < RTE_MIN(2 * CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1492,24 +1542,12 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 		inst_67 = vsetq_lane_u64(u64_1, inst_67, 1);
 		vst1q_u64(&inst->w6.u64, inst_67);
 
-		inst += 2;
+		inst++;
 	}
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == CN10K_PKTS_PER_LOOP) {
+	if (nb_pkts - i > 0 && i == 2 * CN10K_PKTS_PER_LOOP) {
 		nb_pkts -= i;
 		pkts += i;
 		count += i;
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index 51369309c5..6acaa4413b 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -431,6 +431,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	struct rte_pci_device *pci_dev;
 	struct cnxk_cpt_qp *qp;
 	uint32_t nb_desc;
+	uint64_t io_addr;
 	int ret;
 
 	if (dev->data->queue_pairs[qp_id] != NULL)
@@ -485,6 +486,14 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
+		/* Update IO addr to enable dual submission */
+		io_addr = vf->rx_inj_lmtline.io_addr;
+		io_addr = (io_addr & ~(uint64_t)(0x7 << 4)) | ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
+		vf->rx_inj_lmtline.io_addr = io_addr;
+
+		/* Update FC threshold to reflect dual submission */
+		vf->rx_inj_lmtline.fc_thresh -= 32;
+
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
 	}
-- 
2.25.1
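
The odd/even handling in `cn10k_cpt_lmtst_dual_submit()` reduces to simple arithmetic on the instruction count. The sketch below is a standalone model under assumptions, not driver code: `struct dual_split` and `dual_split_plan` are hypothetical, and `per_steorl` stands in for `CN10K_PKTS_PER_STEORL`, whose value is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary of how one burst is split; not a driver structure. */
struct dual_split {
	uint16_t first_lines;  /* paired LMT lines covered by the first submit */
	uint16_t second_lines; /* paired LMT lines covered by the second submit */
	uint16_t single;       /* 1 if a trailing unpaired instruction remains */
};

/*
 * Split nb_inst instructions into LMT lines holding two instructions each,
 * plus at most one trailing single instruction when nb_inst is odd.
 */
static struct dual_split
dual_split_plan(uint16_t nb_inst, uint16_t per_steorl)
{
	struct dual_split p = {0, 0, 0};
	uint16_t pairs;

	assert(per_steorl > 0);
	/* Assumption: a burst is capped at four times per_steorl instructions,
	 * so the second submit never covers more than per_steorl lines.
	 */
	assert(nb_inst <= 4 * per_steorl);

	p.single = nb_inst & 0x1;
	pairs = (nb_inst - p.single) / 2;

	if (pairs > per_steorl) {
		/* Two submits: a full one, then the remaining paired lines. */
		p.first_lines = per_steorl;
		p.second_lines = pairs - per_steorl;
	} else {
		/* One submit is enough; it may be empty when nb_inst is 1. */
		p.first_lines = pairs;
	}

	return p;
}
```

For example, with `per_steorl` assumed to be 16, a burst of 35 instructions yields 16 paired lines in the first submit, 1 in the second, and one trailing single instruction.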



* [PATCH v3 09/12] crypto/cnxk: update sess pointer for next iteration
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (7 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
                       ` (3 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Update the sess pointer when moving on to the next set of packets.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index e42a3d2ea6..ed964d4d01 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -1467,6 +1467,8 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	vf = cdev->data->dev_private;
 
+	const int nb_pkts_per_loop = 2 * CN10K_PKTS_PER_LOOP;
+
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
@@ -1486,7 +1488,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(2 * CN10K_PKTS_PER_LOOP, nb_pkts); i++) {
+	for (; i < RTE_MIN(nb_pkts_per_loop, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1547,10 +1549,11 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == 2 * CN10K_PKTS_PER_LOOP) {
-		nb_pkts -= i;
-		pkts += i;
-		count += i;
+	if (nb_pkts - i > 0 && i == nb_pkts_per_loop) {
+		nb_pkts -= nb_pkts_per_loop;
+		pkts += nb_pkts_per_loop;
+		count += nb_pkts_per_loop;
+		sess += nb_pkts_per_loop;
 		goto again;
 	}
 
-- 
2.25.1



* [PATCH v3 10/12] crypto/cnxk: fix aes-gcm zero len input cases
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (8 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
                       ` (2 subsequent siblings)
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj, Akhil Goyal
  Cc: jerinj, vvelumuri, asasidharan, dev, stable

For aes-gcm (AEAD) zero length input, the SG code path is taken, unlike
the digest-only cases, as AAD is treated as a separate input component.
Fix the zero length case in the SG path by avoiding the gather component
only when the algorithm is non-AEAD. Also add an SG version check, as
the fix applies only to a specific model.

Fixes: 4d8166d64988 ("crypto/cnxk: enable digest for zero length input")
Cc: stable@dpdk.org

Signed-off-by: Aakash Sasidharan <asasidharan@marvell.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
---
 drivers/crypto/cnxk/cnxk_se.h | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/cnxk/cnxk_se.h b/drivers/crypto/cnxk/cnxk_se.h
index 6374718a82..63dbef4411 100644
--- a/drivers/crypto/cnxk/cnxk_se.h
+++ b/drivers/crypto/cnxk/cnxk_se.h
@@ -2468,13 +2468,14 @@ fill_sess_gmac(struct rte_crypto_sym_xform *xform, struct cnxk_se_sess *sess)
 }
 
 static __rte_always_inline uint32_t
-prepare_iov_from_pkt(struct rte_mbuf *pkt, struct roc_se_iov_ptr *iovec, uint32_t start_offset)
+prepare_iov_from_pkt(struct rte_mbuf *pkt, struct roc_se_iov_ptr *iovec, uint32_t start_offset,
+		     const bool is_aead, const bool is_sg_ver2)
 {
 	uint16_t index = 0;
 	void *seg_data = NULL;
 	int32_t seg_size = 0;
 
-	if (!pkt || pkt->data_len == 0) {
+	if (!pkt || (is_sg_ver2 && (pkt->data_len == 0) && !is_aead)) {
 		iovec->buf_cnt = 0;
 		return 0;
 	}
@@ -2619,13 +2620,13 @@ fill_sm_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0)) {
+		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2)) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
-		if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0)) {
+		if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false, is_sg_ver2)) {
 			plt_dp_err("Prepare dst iov failed for m_dst %p", m_dst);
 			ret = -EINVAL;
 			goto err_exit;
@@ -2816,14 +2817,15 @@ fill_fc_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0)) {
+		if (prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, is_aead, is_sg_ver2)) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
 		if (unlikely(m_dst != NULL)) {
-			if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0)) {
+			if (prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, is_aead,
+						 is_sg_ver2)) {
 				plt_dp_err("Prepare dst iov failed for "
 					   "m_dst %p",
 					   m_dst);
@@ -2957,13 +2959,15 @@ fill_pdcp_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (unlikely(prepare_iov_from_pkt(m_src, fc_params.src_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Prepare src iov failed");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
-		if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Prepare dst iov failed for m_dst %p", m_dst);
 			ret = -EINVAL;
 			goto err_exit;
@@ -3080,14 +3084,16 @@ fill_pdcp_chain_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 		fc_params.dst_iov = (void *)dst;
 
 		/* Store SG I/O in the api for reuse */
-		if (unlikely(prepare_iov_from_pkt(m_src, fc_params.src_iov, 0))) {
+		if (unlikely(
+			    prepare_iov_from_pkt(m_src, fc_params.src_iov, 0, false, is_sg_ver2))) {
 			plt_dp_err("Could not prepare src iov");
 			ret = -EINVAL;
 			goto err_exit;
 		}
 
 		if (unlikely(m_dst != NULL)) {
-			if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0))) {
+			if (unlikely(prepare_iov_from_pkt(m_dst, fc_params.dst_iov, 0, false,
+							  is_sg_ver2))) {
 				plt_dp_err("Could not prepare m_dst iov %p", m_dst);
 				ret = -EINVAL;
 				goto err_exit;
@@ -3306,7 +3312,7 @@ fill_digest_params(struct rte_crypto_op *cop, struct cnxk_se_sess *sess,
 	params.src_iov = (void *)src;
 
 	/*Store SG I/O in the api for reuse */
-	if (prepare_iov_from_pkt(m_src, params.src_iov, auth_range_off)) {
+	if (prepare_iov_from_pkt(m_src, params.src_iov, auth_range_off, false, is_sg_ver2)) {
 		plt_dp_err("Prepare src iov failed");
 		ret = -EINVAL;
 		goto free_mdata_and_exit;
-- 
2.25.1



* [PATCH v3 11/12] crypto/cnxk: make pack IV variable as const
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (9 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-26 10:55     ` [PATCH v3 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
  2024-06-27  5:11     ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Akhil Goyal
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

Make the 'pack_iv' parameter a const bool, and pass a literal false
where it is never set, to avoid repeated checks.

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
---
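Note (illustration only, not part of the patch): a minimal sketch of the
idea, assuming the helpers are inlined. All demo_* names are hypothetical,
and the masking below is a placeholder, not the real cpt_pack_iv() logic.
When the flag reaches an inlined helper as a literal, a compiler can
typically drop the untaken branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's IV copy helper. */
static inline void
demo_iv_copy(uint8_t *iv_d, const uint8_t *iv_s, size_t len, const bool pack_iv)
{
	size_t k;

	assert(iv_d != NULL && iv_s != NULL);

	if (pack_iv) {
		/* Placeholder transform, NOT the real cpt_pack_iv() logic. */
		for (k = 0; k < len; k++)
			iv_d[k] = iv_s[k] & 0x3f;
	} else {
		memcpy(iv_d, iv_s, len);
	}
}

/* Chain-style caller: packing is never requested, so pass a literal. */
static inline void
demo_chain_iv(uint8_t *iv_d, const uint8_t *iv_s, size_t len)
{
	demo_iv_copy(iv_d, iv_s, len, false);
}

/* Single-op caller: the flag is derived once from the IV length (25, as in the patch). */
static inline void
demo_single_iv(uint8_t *iv_d, const uint8_t *iv_s, size_t len, uint16_t iv_len)
{
	const bool pack_iv = (iv_len == 25);

	demo_iv_copy(iv_d, iv_s, len, pack_iv);
}
```

The chain caller mirrors cpt_pdcp_chain_alg_prep(), which now passes false
directly; the single-op caller mirrors cpt_pdcp_alg_prep(), which still
computes the flag at run time.
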
 drivers/crypto/cnxk/cnxk_se.h | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/crypto/cnxk/cnxk_se.h b/drivers/crypto/cnxk/cnxk_se.h
index 63dbef4411..dbd36a8a54 100644
--- a/drivers/crypto/cnxk/cnxk_se.h
+++ b/drivers/crypto/cnxk/cnxk_se.h
@@ -105,7 +105,7 @@ cpt_pack_iv(uint8_t *iv_src, uint8_t *iv_dst)
 }
 
 static inline void
-pdcp_iv_copy(uint8_t *iv_d, const uint8_t *iv_s, const uint8_t pdcp_alg_type, uint8_t pack_iv)
+pdcp_iv_copy(uint8_t *iv_d, const uint8_t *iv_s, const uint8_t pdcp_alg_type, const bool pack_iv)
 {
 	const uint32_t *iv_s_temp;
 	uint32_t iv_temp[4];
@@ -261,7 +261,7 @@ cpt_mac_len_verify(struct rte_crypto_auth_xform *auth)
 
 static __rte_always_inline int
 sg_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t offset_ctrl,
-	     const uint8_t *iv_s, int iv_len, uint8_t pack_iv, uint8_t pdcp_alg_type,
+	     const uint8_t *iv_s, int iv_len, const bool pack_iv, uint8_t pdcp_alg_type,
 	     int32_t inputlen, int32_t outputlen, uint32_t passthrough_len, uint32_t req_flags,
 	     int pdcp_flag, int decrypt)
 {
@@ -457,7 +457,7 @@ sg_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t
 
 static __rte_always_inline int
 sg2_inst_prep(struct roc_se_fc_params *params, struct cpt_inst_s *inst, uint64_t offset_ctrl,
-	      const uint8_t *iv_s, int iv_len, uint8_t pack_iv, uint8_t pdcp_alg_type,
+	      const uint8_t *iv_s, int iv_len, const bool pack_iv, uint8_t pdcp_alg_type,
 	      int32_t inputlen, int32_t outputlen, uint32_t passthrough_len, uint32_t req_flags,
 	      int pdcp_flag, int decrypt)
 {
@@ -882,7 +882,7 @@ static inline int
 pdcp_chain_sg1_prep(struct roc_se_fc_params *params, struct roc_se_ctx *cpt_ctx,
 		    struct cpt_inst_s *inst, union cpt_inst_w4 w4, int32_t inputlen,
 		    uint8_t hdr_len, uint64_t offset_ctrl, uint32_t req_flags,
-		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const int pack_iv,
+		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const bool pack_iv,
 		    const uint8_t pdcp_ci_alg, const uint8_t pdcp_auth_alg)
 {
 	struct roc_sglist_comp *scatter_comp, *gather_comp;
@@ -991,7 +991,7 @@ static inline int
 pdcp_chain_sg2_prep(struct roc_se_fc_params *params, struct roc_se_ctx *cpt_ctx,
 		    struct cpt_inst_s *inst, union cpt_inst_w4 w4, int32_t inputlen,
 		    uint8_t hdr_len, uint64_t offset_ctrl, uint32_t req_flags,
-		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const int pack_iv,
+		    const uint8_t *cipher_iv, const uint8_t *auth_iv, const bool pack_iv,
 		    const uint8_t pdcp_ci_alg, const uint8_t pdcp_auth_alg)
 {
 	struct roc_sg2list_comp *gather_comp, *scatter_comp;
@@ -1528,7 +1528,6 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 	struct roc_se_ctx *se_ctx;
 	uint64_t *offset_vaddr;
 	uint64_t offset_ctrl;
-	uint8_t pack_iv = 0;
 	int32_t inputlen;
 	void *dm_vaddr;
 	uint8_t *iv_d;
@@ -1606,10 +1605,10 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		cpt_inst_w4.s.dlen = inputlen + ROC_SE_OFF_CTRL_LEN;
 
 		iv_d = ((uint8_t *)offset_vaddr + ROC_SE_OFF_CTRL_LEN);
-		pdcp_iv_copy(iv_d, cipher_iv, pdcp_ci_alg, pack_iv);
+		pdcp_iv_copy(iv_d, cipher_iv, pdcp_ci_alg, false);
 
 		iv_d = ((uint8_t *)offset_vaddr + ROC_SE_OFF_CTRL_LEN + pdcp_iv_off);
-		pdcp_iv_copy(iv_d, auth_iv, pdcp_auth_alg, pack_iv);
+		pdcp_iv_copy(iv_d, auth_iv, pdcp_auth_alg, false);
 
 		inst->w4.u64 = cpt_inst_w4.u64;
 		return 0;
@@ -1618,11 +1617,11 @@ cpt_pdcp_chain_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		if (is_sg_ver2)
 			return pdcp_chain_sg2_prep(params, se_ctx, inst, cpt_inst_w4, inputlen,
 						   hdr_len, offset_ctrl, req_flags, cipher_iv,
-						   auth_iv, pack_iv, pdcp_ci_alg, pdcp_auth_alg);
+						   auth_iv, false, pdcp_ci_alg, pdcp_auth_alg);
 		else
 			return pdcp_chain_sg1_prep(params, se_ctx, inst, cpt_inst_w4, inputlen,
 						   hdr_len, offset_ctrl, req_flags, cipher_iv,
-						   auth_iv, pack_iv, pdcp_ci_alg, pdcp_auth_alg);
+						   auth_iv, false, pdcp_ci_alg, pdcp_auth_alg);
 	}
 }
 
@@ -1647,9 +1646,9 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 	uint64_t *offset_vaddr;
 	uint8_t pdcp_alg_type;
 	uint32_t mac_len = 0;
-	const uint8_t *iv_s;
-	uint8_t pack_iv = 0;
 	uint64_t offset_ctrl;
+	bool pack_iv = false;
+	const uint8_t *iv_s;
 	int ret;
 
 	mac_len = se_ctx->mac_len;
@@ -1671,7 +1670,7 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		if (pdcp_alg_type != ROC_SE_PDCP_ALG_TYPE_AES_CMAC) {
 
 			if (params->auth_iv_len == 25)
-				pack_iv = 1;
+				pack_iv = true;
 
 			auth_offset = auth_offset / 8;
 			auth_data_len = RTE_ALIGN(auth_data_len, 8) / 8;
@@ -1694,7 +1693,7 @@ cpt_pdcp_alg_prep(uint32_t req_flags, uint64_t d_offs, uint64_t d_lens,
 		pdcp_alg_type = se_ctx->pdcp_ci_alg;
 
 		if (params->cipher_iv_len == 25)
-			pack_iv = 1;
+			pack_iv = true;
 
 		/*
 		 * Microcode expects offsets in bytes
-- 
2.25.1



* [PATCH v3 12/12] crypto/cnxk: enable dual submission to CPT
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (10 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
@ 2024-06-26 10:55     ` Aakash Sasidharan
  2024-06-27  5:11     ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Akhil Goyal
  12 siblings, 0 replies; 41+ messages in thread
From: Aakash Sasidharan @ 2024-06-26 10:55 UTC (permalink / raw)
  To: Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Pavan Nikhilesh, Shijith Thotton
  Cc: gakhil, jerinj, vvelumuri, asasidharan, dev

From: Anoob Joseph <anoobj@marvell.com>

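Submit two CPT instructions in one LMTLINE in all CN10K enqueue paths.
Move cn10k_cpt_lmtst_dual_submit() to cn10k_cryptodev_ops.h so that every
path can share it, and add an 'is_dual' argument to
roc_cpt_lmtline_init() so that the I/O address and flow control threshold
are set up for dual submission at init time.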

Signed-off-by: Anoob Joseph <anoobj@marvell.com>
Signed-off-by: Aakash Sasidharan <asasidharan@marvell.com>
---
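Note (illustration only, not part of the patch): a minimal model of how
cn10k_cpt_lmtst_dual_submit() splits a burst, assuming the constants in
this patch (32 instructions / 16 LMT lines per STEORL, 64 instructions per
loop). The struct and function names are hypothetical; the real helper
issues roc_lmt_submit_steorl() calls instead of returning a plan.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_INSTS_PER_STEORL    32 /* CN10K_PKTS_PER_STEORL in the patch */
#define DEMO_LMTLINES_PER_STEORL 16 /* CN10K_LMTLINES_PER_STEORL in the patch */
#define DEMO_INSTS_PER_LOOP      64 /* CN10K_CPT_PKTS_PER_LOOP in the patch */

/* Hypothetical description of the submissions made for one burst. */
struct demo_submit_plan {
	int first_lines;  /* LMT lines in the first dual submit, 0 if none */
	int second_lines; /* LMT lines in the second dual submit, 0 if none */
	bool single;      /* true if a trailing single-instruction submit is needed */
	int single_line;  /* LMT line offset used by that trailing submit */
};

static struct demo_submit_plan
demo_plan_dual_submit(int nb_inst)
{
	struct demo_submit_plan p = {0, 0, false, 0};
	const int odd = nb_inst & 0x1;
	const int even = nb_inst - odd; /* instructions that can be paired */

	assert(nb_inst >= 0 && nb_inst <= DEMO_INSTS_PER_LOOP);

	if (even > DEMO_INSTS_PER_STEORL) {
		/* Two dual submits: a full one, then the remaining pairs. */
		p.first_lines = DEMO_LMTLINES_PER_STEORL;
		p.second_lines = even / 2 - DEMO_LMTLINES_PER_STEORL;
	} else if (even != 0) {
		/* All pairs fit in one dual submit. */
		p.first_lines = even / 2;
	}

	if (odd) {
		/* The leftover instruction goes out alone, after the pairs. */
		p.single = true;
		p.single_line = even / 2;
	}

	return p;
}
```

For example, 63 instructions map to 16 + 15 paired LMT lines plus one
single-instruction submit at line offset 31.
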
 drivers/common/cnxk/roc_cpt.c             |  17 +-
 drivers/common/cnxk/roc_cpt.h             |   8 +-
 drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 184 +++++-----------------
 drivers/crypto/cnxk/cn10k_cryptodev_ops.h |  60 ++++++-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  47 ++----
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |   2 +
 drivers/event/cnxk/cnxk_eventdev_adptr.c  |   4 +-
 7 files changed, 124 insertions(+), 198 deletions(-)
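
Note (illustration only, not part of the patch): a sketch of what the new
'is_dual' argument changes at init time. The names and the numeric values
(threshold 32, size codes 3 and 7) are assumptions for the example and are
not taken from the roc headers; only the 0x7 << 4 field mask comes from
this patch. Unlike roc_cpt_lmtline_init(), which ORs the size code into
lf->io_addr, the sketch clears the field first, as the submit helper does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FC_MIN_THRESHOLD 32u /* assumed value of CPT_LF_FC_MIN_THRESHOLD */
#define DEMO_ONE_INST_DW_M1   3u  /* assumed stand-in for ROC_CN10K_CPT_INST_DW_M1 */
#define DEMO_TWO_INST_DW_M1   7u  /* assumed stand-in for ROC_CN10K_TWO_CPT_INST_DW_M1 */

/* Hypothetical subset of struct roc_cpt_lmtline. */
struct demo_lmtline {
	uint64_t io_addr;
	uint32_t fc_thresh;
};

/* Replace the 3-bit size field at bits 6:4 of the I/O address. */
static uint64_t
demo_set_inst_size(uint64_t io_addr, uint64_t dw_m1)
{
	return (io_addr & ~((uint64_t)0x7 << 4)) | (dw_m1 << 4);
}

static void
demo_lmtline_init(struct demo_lmtline *l, uint64_t lf_io_addr, uint32_t nb_desc,
		  bool is_dual)
{
	assert(nb_desc > 2 * DEMO_FC_MIN_THRESHOLD);

	l->io_addr = demo_set_inst_size(lf_io_addr,
					is_dual ? DEMO_TWO_INST_DW_M1 : DEMO_ONE_INST_DW_M1);
	/* Dual submission reserves twice the headroom in the queue. */
	l->fc_thresh = nb_desc - (is_dual ? 2 : 1) * DEMO_FC_MIN_THRESHOLD;
}
```

With 1024 descriptors, the sketch yields a threshold of 992 for single
submission and 960 for dual submission.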

diff --git a/drivers/common/cnxk/roc_cpt.c b/drivers/common/cnxk/roc_cpt.c
index 9f283ceb2e..aba2a49d19 100644
--- a/drivers/common/cnxk/roc_cpt.c
+++ b/drivers/common/cnxk/roc_cpt.c
@@ -1135,8 +1135,8 @@ roc_cpt_iq_enable(struct roc_cpt_lf *lf)
 }
 
 int
-roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
-		     int lf_id)
+roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline, int lf_id,
+		     bool is_dual)
 {
 	struct roc_cpt_lf *lf;
 
@@ -1145,12 +1145,19 @@ roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
 		return -ENOTSUP;
 
 	lmtline->io_addr = lf->io_addr;
-	if (roc_model_is_cn10k())
-		lmtline->io_addr |= ROC_CN10K_CPT_INST_DW_M1 << 4;
+	lmtline->fc_thresh = lf->nb_desc - CPT_LF_FC_MIN_THRESHOLD;
+
+	if (roc_model_is_cn10k()) {
+		if (is_dual) {
+			lmtline->io_addr |= ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
+			lmtline->fc_thresh = lf->nb_desc -  2 * CPT_LF_FC_MIN_THRESHOLD;
+		} else {
+			lmtline->io_addr |= ROC_CN10K_CPT_INST_DW_M1 << 4;
+		}
+	}
 
 	lmtline->fc_addr = lf->fc_addr;
 	lmtline->lmt_base = lf->lmt_base;
-	lmtline->fc_thresh = lf->nb_desc - CPT_LF_FC_MIN_THRESHOLD;
 
 	return 0;
 }
diff --git a/drivers/common/cnxk/roc_cpt.h b/drivers/common/cnxk/roc_cpt.h
index 8ef9062ae0..e2e919f80f 100644
--- a/drivers/common/cnxk/roc_cpt.h
+++ b/drivers/common/cnxk/roc_cpt.h
@@ -200,12 +200,12 @@ int __roc_api roc_cpt_afs_print(struct roc_cpt *roc_cpt);
 int __roc_api roc_cpt_lfs_print(struct roc_cpt *roc_cpt);
 void __roc_api roc_cpt_iq_disable(struct roc_cpt_lf *lf);
 void __roc_api roc_cpt_iq_enable(struct roc_cpt_lf *lf);
-int __roc_api roc_cpt_lmtline_init(struct roc_cpt *roc_cpt,
-				   struct roc_cpt_lmtline *lmtline, int lf_id);
+int __roc_api roc_cpt_lmtline_init(struct roc_cpt *roc_cpt, struct roc_cpt_lmtline *lmtline,
+				   int lf_id, bool is_dual);
 
 void __roc_api roc_cpt_parse_hdr_dump(FILE *file, const struct cpt_parse_hdr_s *cpth);
-int __roc_api roc_cpt_ctx_write(struct roc_cpt_lf *lf, void *sa_dptr,
-				void *sa_cptr, uint16_t sa_len);
+int __roc_api roc_cpt_ctx_write(struct roc_cpt_lf *lf, void *sa_dptr, void *sa_cptr,
+				uint16_t sa_len);
 
 void __roc_api roc_cpt_int_misc_cb_register(roc_cpt_int_misc_cb_t cb, void *args);
 int __roc_api roc_cpt_int_misc_cb_unregister(roc_cpt_int_misc_cb_t cb, void *args);
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
index ed964d4d01..780785d656 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.c
@@ -12,11 +12,6 @@
 #include <ethdev_driver.h>
 
 #include "roc_cpt.h"
-#if defined(__aarch64__)
-#include "roc_io.h"
-#else
-#include "roc_io_generic.h"
-#endif
 #include "roc_idev.h"
 #include "roc_sso.h"
 #include "roc_sso_dp.h"
@@ -40,8 +35,8 @@
 
 /* Holds information required to send crypto operations in one burst */
 struct ops_burst {
-	struct rte_crypto_op *op[CN10K_PKTS_PER_LOOP];
-	uint64_t w2[CN10K_PKTS_PER_LOOP];
+	struct rte_crypto_op *op[CN10K_CPT_PKTS_PER_LOOP];
+	uint64_t w2[CN10K_CPT_PKTS_PER_LOOP];
 	struct cn10k_sso_hws *ws;
 	struct cnxk_cpt_qp *qp;
 	uint16_t nb_ops;
@@ -55,56 +50,6 @@ struct vec_request {
 	uint64_t w2;
 };
 
-#if defined(RTE_ARCH_ARM64)
-static __rte_always_inline void __rte_hot
-cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
-{
-	uint64_t lmt_arg;
-
-	/* Check if the total number of instructions is odd or even. */
-	const int flag_odd = *i & 0x1;
-
-	/* Reduce i by 1 when odd number of instructions.*/
-	*i -= flag_odd;
-
-	if (*i > 2 * CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		if (flag_odd) {
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
-			lmt_arg = (uint64_t)(lmt_id + *i / 2);
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
-			*i += 1;
-		}
-	} else {
-		if (*i != 0) {
-			lmt_arg =
-				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-		}
-
-		if (flag_odd) {
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
-			lmt_arg = (uint64_t)(lmt_id + *i / 2);
-			roc_lmt_submit_steorl(lmt_arg, *io_addr);
-			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
-				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
-			*i += 1;
-		}
-	}
-
-	rte_io_wmb();
-}
-#endif
-
 static inline struct cnxk_se_sess *
 cn10k_cpt_sym_temp_sess_create(struct cnxk_cpt_qp *qp, struct rte_crypto_op *op)
 {
@@ -387,8 +332,8 @@ static uint16_t
 cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 			const bool is_sg_ver2)
 {
-	uint64_t lmt_base, lmt_arg, io_addr;
 	struct cpt_inflight_req *infl_req;
+	uint64_t head, lmt_base, io_addr;
 	uint16_t nb_allowed, count = 0;
 	struct cnxk_cpt_qp *qp = qptr;
 	struct pending_queue *pend_q;
@@ -396,7 +341,6 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 	union cpt_fc_write_s fc;
 	uint64_t *fc_addr;
 	uint16_t lmt_id;
-	uint64_t head;
 	int ret, i;
 
 	pend_q = &qp->pend_q;
@@ -426,11 +370,11 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 		goto pend_q_commit;
 	}
 
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_ops); i++) {
+	for (i = 0; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_ops); i++) {
 		infl_req = &pend_q->req_queue[head];
 		infl_req->op_flags = 0;
 
-		ret = cn10k_cpt_fill_inst(qp, ops + i, &inst[2 * i], infl_req, is_sg_ver2);
+		ret = cn10k_cpt_fill_inst(qp, ops + i, &inst[i], infl_req, is_sg_ver2);
 		if (unlikely(ret != 1)) {
 			plt_dp_err("Could not process op: %p", ops + i);
 			if (i == 0)
@@ -441,24 +385,12 @@ cn10k_cpt_enqueue_burst(void *qptr, struct rte_crypto_op **ops, uint16_t nb_ops,
 		pending_queue_advance(&head, pq_mask);
 	}
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_ops - i > 0 && i == CN10K_PKTS_PER_LOOP) {
-		nb_ops -= i;
-		ops += i;
-		count += i;
+	if (nb_ops - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
+		nb_ops -= CN10K_CPT_PKTS_PER_LOOP;
+		ops += CN10K_CPT_PKTS_PER_LOOP;
+		count += CN10K_CPT_PKTS_PER_LOOP;
 		goto again;
 	}
 
@@ -633,7 +565,7 @@ cn10k_cpt_vec_pkt_submission_timeout_handle(void)
 static inline void
 cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct cnxk_cpt_qp *qp)
 {
-	uint64_t lmt_base, lmt_arg, lmt_id, io_addr;
+	uint64_t lmt_base, lmt_id, io_addr;
 	union cpt_fc_write_s fc;
 	struct cpt_inst_s *inst;
 	uint16_t burst_size;
@@ -661,7 +593,7 @@ cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct
 again:
 	burst_size = RTE_MIN(CN10K_PKTS_PER_STEORL, vec_tbl_len);
 	for (i = 0; i < burst_size; i++)
-		cn10k_cpt_vec_inst_fill(&vec_tbl[i], &inst[i * 2], qp, vec_tbl[0].w7);
+		cn10k_cpt_vec_inst_fill(&vec_tbl[i], &inst[i], qp, vec_tbl[0].w7);
 
 	do {
 		fc.u64[0] = __atomic_load_n(fc_addr, __ATOMIC_RELAXED);
@@ -671,10 +603,7 @@ cn10k_cpt_vec_submit(struct vec_request vec_tbl[], uint16_t vec_tbl_len, struct
 			cn10k_cpt_vec_pkt_submission_timeout_handle();
 	} while (true);
 
-	lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | lmt_id;
-	roc_lmt_submit_steorl(lmt_arg, io_addr);
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	vec_tbl_len -= i;
 
@@ -688,12 +617,12 @@ static inline int
 ca_lmtst_vec_submit(struct ops_burst *burst, struct vec_request vec_tbl[], uint16_t *vec_tbl_len,
 		    const bool is_sg_ver2)
 {
-	struct cpt_inflight_req *infl_reqs[CN10K_PKTS_PER_LOOP];
-	uint64_t lmt_base, lmt_arg, io_addr;
+	struct cpt_inflight_req *infl_reqs[CN10K_CPT_PKTS_PER_LOOP];
 	uint16_t lmt_id, len = *vec_tbl_len;
 	struct cpt_inst_s *inst, *inst_base;
 	struct cpt_inflight_req *infl_req;
 	struct rte_event_vector *vec;
+	uint64_t lmt_base, io_addr;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_qp *qp;
 	uint64_t *fc_addr;
@@ -730,7 +659,7 @@ ca_lmtst_vec_submit(struct ops_burst *burst, struct vec_request vec_tbl[], uint1
 	}
 
 	for (i = 0; i < burst->nb_ops; i++) {
-		inst = &inst_base[2 * i];
+		inst = &inst_base[i];
 		infl_req = infl_reqs[i];
 		infl_req->op_flags = 0;
 
@@ -790,24 +719,12 @@ next_op:;
 	if (CNXK_TT_FROM_TAG(burst->ws->gw_rdata) == SSO_TT_ORDERED)
 		roc_sso_hws_head_wait(burst->ws->base);
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	/* Store w7 of last successfully filled instruction */
 	inst = &inst_base[2 * (i - 1)];
 	vec_tbl[0].w7 = inst->w7;
 
-	rte_io_wmb();
-
 put:
 	if (i != burst->nb_ops)
 		rte_mempool_put_bulk(qp->ca.req_mp, (void *)&infl_reqs[i], burst->nb_ops - i);
@@ -820,10 +737,10 @@ next_op:;
 static inline uint16_t
 ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 {
-	struct cpt_inflight_req *infl_reqs[CN10K_PKTS_PER_LOOP];
-	uint64_t lmt_base, lmt_arg, io_addr;
+	struct cpt_inflight_req *infl_reqs[CN10K_CPT_PKTS_PER_LOOP];
 	struct cpt_inst_s *inst, *inst_base;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr;
 	union cpt_fc_write_s fc;
 	struct cnxk_cpt_qp *qp;
 	uint64_t *fc_addr;
@@ -854,7 +771,7 @@ ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 	}
 
 	for (i = 0; i < burst->nb_ops; i++) {
-		inst = &inst_base[2 * i];
+		inst = &inst_base[i];
 		infl_req = infl_reqs[i];
 		infl_req->op_flags = 0;
 
@@ -891,19 +808,7 @@ ca_lmtst_burst_submit(struct ops_burst *burst, const bool is_sg_ver2)
 	if (CNXK_TT_FROM_TAG(burst->ws->gw_rdata) == SSO_TT_ORDERED)
 		roc_sso_hws_head_wait(burst->ws->base);
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 put:
 	if (unlikely(i != burst->nb_ops))
@@ -965,7 +870,7 @@ cn10k_cpt_crypto_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_ev
 		burst.op[burst.nb_ops] = op;
 
 		/* Max nb_ops per burst check */
-		if (++burst.nb_ops == CN10K_PKTS_PER_LOOP) {
+		if (++burst.nb_ops == CN10K_CPT_PKTS_PER_LOOP) {
 			if (is_vector)
 				submitted = ca_lmtst_vec_submit(&burst, vec_tbl, &vec_tbl_len,
 								is_sg_ver2);
@@ -1467,8 +1372,6 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	vf = cdev->data->dev_private;
 
-	const int nb_pkts_per_loop = 2 * CN10K_PKTS_PER_LOOP;
-
 	lmt_base = vf->rx_inj_lmtline.lmt_base;
 	io_addr = vf->rx_inj_lmtline.io_addr;
 	fc_addr = vf->rx_inj_lmtline.fc_addr;
@@ -1488,7 +1391,7 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 	if (unlikely(fc.s.qsize > fc_thresh))
 		goto exit;
 
-	for (; i < RTE_MIN(nb_pkts_per_loop, nb_pkts); i++) {
+	for (; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_pkts); i++) {
 
 		m = pkts[i];
 		sec_sess = (struct cn10k_sec_session *)sess[i];
@@ -1549,11 +1452,11 @@ cn10k_cryptodev_sec_inb_rx_inject(void *dev, struct rte_mbuf **pkts,
 
 	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_pkts - i > 0 && i == nb_pkts_per_loop) {
-		nb_pkts -= nb_pkts_per_loop;
-		pkts += nb_pkts_per_loop;
-		count += nb_pkts_per_loop;
-		sess += nb_pkts_per_loop;
+	if (nb_pkts - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
+		nb_pkts -= CN10K_CPT_PKTS_PER_LOOP;
+		pkts += CN10K_CPT_PKTS_PER_LOOP;
+		count += CN10K_CPT_PKTS_PER_LOOP;
+		sess += CN10K_CPT_PKTS_PER_LOOP;
 		goto again;
 	}
 
@@ -1652,8 +1555,8 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 			    const bool is_sgv2)
 {
 	uint16_t lmt_id, nb_allowed, nb_ops = vec->num;
-	uint64_t lmt_base, lmt_arg, io_addr, head;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr, head;
 	struct cnxk_cpt_qp *qp = qpair;
 	struct cnxk_sym_dp_ctx *dp_ctx;
 	struct pending_queue *pend_q;
@@ -1690,7 +1593,7 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		goto pend_q_commit;
 	}
 
-	for (i = 0; i < RTE_MIN(CN10K_PKTS_PER_LOOP, nb_ops); i++) {
+	for (i = 0; i < RTE_MIN(CN10K_CPT_PKTS_PER_LOOP, nb_ops); i++) {
 		struct cnxk_iov iov;
 
 		index = count + i;
@@ -1698,7 +1601,7 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		infl_req->op_flags = 0;
 
 		cnxk_raw_burst_to_iov(vec, &ofs, index, &iov);
-		ret = cn10k_cpt_raw_fill_inst(&iov, qp, dp_ctx, &inst[2 * i], infl_req,
+		ret = cn10k_cpt_raw_fill_inst(&iov, qp, dp_ctx, &inst[i], infl_req,
 					      user_data[index], is_sgv2);
 		if (unlikely(ret != 1)) {
 			plt_dp_err("Could not process vec: %d", index);
@@ -1712,21 +1615,9 @@ cn10k_cpt_raw_enqueue_burst(void *qpair, uint8_t *drv_ctx, struct rte_crypto_sym
 		pending_queue_advance(&head, pq_mask);
 	}
 
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
-	if (nb_ops - i > 0 && i == CN10K_PKTS_PER_LOOP) {
+	if (nb_ops - i > 0 && i == CN10K_CPT_PKTS_PER_LOOP) {
 		nb_ops -= i;
 		count += i;
 		goto again;
@@ -1767,8 +1658,8 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 		      struct rte_crypto_va_iova_ptr *aad_or_auth_iv, void *user_data,
 		      const bool is_sgv2)
 {
-	uint64_t lmt_base, lmt_arg, io_addr, head;
 	struct cpt_inflight_req *infl_req;
+	uint64_t lmt_base, io_addr, head;
 	struct cnxk_cpt_qp *qp = qpair;
 	struct cnxk_sym_dp_ctx *dp_ctx;
 	uint16_t lmt_id, nb_allowed;
@@ -1776,7 +1667,7 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 	union cpt_fc_write_s fc;
 	struct cnxk_iov iov;
 	uint64_t *fc_addr;
-	int ret;
+	int ret, i = 1;
 
 	struct pending_queue *pend_q = &qp->pend_q;
 	const uint64_t pq_mask = pend_q->pq_mask;
@@ -1813,10 +1704,7 @@ cn10k_cpt_raw_enqueue(void *qpair, uint8_t *drv_ctx, struct rte_crypto_vec *data
 
 	pending_queue_advance(&head, pq_mask);
 
-	lmt_arg = ROC_CN10K_CPT_LMT_ARG | (uint64_t)lmt_id;
-	roc_lmt_submit_steorl(lmt_arg, io_addr);
-
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	pend_q->head = head;
 	pend_q->time_out = rte_get_timer_cycles() + DEFAULT_COMMAND_TIMEOUT * rte_get_timer_hz();
diff --git a/drivers/crypto/cnxk/cn10k_cryptodev_ops.h b/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
index 406c4abc7f..be76c49a65 100644
--- a/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
+++ b/drivers/crypto/cnxk/cn10k_cryptodev_ops.h
@@ -5,15 +5,21 @@
 #ifndef _CN10K_CRYPTODEV_OPS_H_
 #define _CN10K_CRYPTODEV_OPS_H_
 
-#include <rte_compat.h>
 #include <cryptodev_pmd.h>
+#include <rte_compat.h>
 #include <rte_cryptodev.h>
 #include <rte_eventdev.h>
 
+#if defined(__aarch64__)
+#include "roc_io.h"
+#else
+#include "roc_io_generic.h"
+#endif
+
 #include "cnxk_cryptodev.h"
 
-#define CN10K_PKTS_PER_LOOP   32
-#define CN10K_PKTS_PER_STEORL 16
+#define CN10K_PKTS_PER_STEORL	  32
+#define CN10K_LMTLINES_PER_STEORL 16
 
 extern struct rte_cryptodev_ops cn10k_cpt_ops;
 
@@ -34,4 +40,52 @@ __rte_internal
 uint16_t __rte_hot cn10k_cpt_sg_ver2_crypto_adapter_enqueue(void *ws, struct rte_event ev[],
 		uint16_t nb_events);
 
+static __rte_always_inline void __rte_hot
+cn10k_cpt_lmtst_dual_submit(uint64_t *io_addr, const uint16_t lmt_id, int *i)
+{
+	uint64_t lmt_arg;
+
+	/* Check if the total number of instructions is odd or even. */
+	const int flag_odd = *i & 0x1;
+
+	/* Reduce i by 1 when the number of instructions is odd. */
+	*i -= flag_odd;
+
+	if (*i > CN10K_PKTS_PER_STEORL) {
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG | (CN10K_LMTLINES_PER_STEORL - 1) << 12 |
+			  (uint64_t)lmt_id;
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		lmt_arg = ROC_CN10K_DUAL_CPT_LMT_ARG |
+			  (*i / 2 - CN10K_LMTLINES_PER_STEORL - 1) << 12 |
+			  (uint64_t)(lmt_id + CN10K_LMTLINES_PER_STEORL);
+		roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	} else {
+		if (*i != 0) {
+			lmt_arg =
+				ROC_CN10K_DUAL_CPT_LMT_ARG | (*i / 2 - 1) << 12 | (uint64_t)lmt_id;
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+		}
+
+		if (flag_odd) {
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_CPT_INST_DW_M1 << 4);
+			lmt_arg = (uint64_t)(lmt_id + *i / 2);
+			roc_lmt_submit_steorl(lmt_arg, *io_addr);
+			*io_addr = (*io_addr & ~(uint64_t)(0x7 << 4)) |
+				   (ROC_CN10K_TWO_CPT_INST_DW_M1 << 4);
+			*i += 1;
+		}
+	}
+
+	rte_io_wmb();
+}
 #endif /* _CN10K_CRYPTODEV_OPS_H_ */
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index 6acaa4413b..cfcfa79fdf 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -431,7 +431,6 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	struct rte_pci_device *pci_dev;
 	struct cnxk_cpt_qp *qp;
 	uint32_t nb_desc;
-	uint64_t io_addr;
 	int ret;
 
 	if (dev->data->queue_pairs[qp_id] != NULL)
@@ -467,7 +466,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 	roc_cpt->lf[qp_id] = &qp->lf;
 
-	ret = roc_cpt_lmtline_init(roc_cpt, &qp->lmtline, qp_id);
+	ret = roc_cpt_lmtline_init(roc_cpt, &qp->lmtline, qp_id, true);
 	if (ret < 0) {
 		roc_cpt->lf[qp_id] = NULL;
 		plt_err("Could not init lmtline for queue pair %d", qp_id);
@@ -478,7 +477,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	dev->data->queue_pairs[qp_id] = qp;
 
 	if (qp_id == vf->rx_inject_qp) {
-		ret = roc_cpt_lmtline_init(roc_cpt, &vf->rx_inj_lmtline, vf->rx_inject_qp);
+		ret = roc_cpt_lmtline_init(roc_cpt, &vf->rx_inj_lmtline, vf->rx_inject_qp, true);
 		if (ret) {
 			plt_err("Could not init lmtline Rx inject");
 			goto exit;
@@ -486,14 +485,6 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 		vf->rx_inj_sso_pf_func = roc_idev_nix_inl_dev_pffunc_get();
 
-		/* Update IO addr to enable dual submission */
-		io_addr = vf->rx_inj_lmtline.io_addr;
-		io_addr = (io_addr & ~(uint64_t)(0x7 << 4)) | ROC_CN10K_TWO_CPT_INST_DW_M1 << 4;
-		vf->rx_inj_lmtline.io_addr = io_addr;
-
-		/* Update FC threshold to reflect dual submission */
-		vf->rx_inj_lmtline.fc_thresh -= 32;
-
 		/* Block the queue for other submissions */
 		qp->pend_q.pq_mask = 0;
 	}
@@ -969,44 +960,28 @@ rte_pmd_cnxk_crypto_qptr_get(uint8_t dev_id, uint16_t qp_id)
 static inline void
 cnxk_crypto_cn10k_submit(void *qptr, void *inst, uint16_t nb_inst)
 {
-	uint64_t lmt_base, lmt_arg, io_addr;
 	struct cnxk_cpt_qp *qp = qptr;
-	uint16_t i, j, lmt_id;
+	uint64_t lmt_base, io_addr;
+	uint16_t lmt_id;
 	void *lmt_dst;
+	int i;
 
 	lmt_base = qp->lmtline.lmt_base;
 	io_addr = qp->lmtline.io_addr;
 
 	ROC_LMT_BASE_ID_GET(lmt_base, lmt_id);
 
-again:
-	i = RTE_MIN(nb_inst, CN10K_PKTS_PER_LOOP);
 	lmt_dst = PLT_PTR_CAST(lmt_base);
+again:
+	i = RTE_MIN(nb_inst, CN10K_CPT_PKTS_PER_LOOP);
 
-	for (j = 0; j < i; j++) {
-		rte_memcpy(lmt_dst, inst, sizeof(struct cpt_inst_s));
-		inst = RTE_PTR_ADD(inst, sizeof(struct cpt_inst_s));
-		lmt_dst = RTE_PTR_ADD(lmt_dst, 2 * sizeof(struct cpt_inst_s));
-	}
-
-	rte_io_wmb();
-
-	if (i > CN10K_PKTS_PER_STEORL) {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - CN10K_PKTS_PER_STEORL - 1) << 12 |
-			  (uint64_t)(lmt_id + CN10K_PKTS_PER_STEORL);
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	} else {
-		lmt_arg = ROC_CN10K_CPT_LMT_ARG | (i - 1) << 12 | (uint64_t)lmt_id;
-		roc_lmt_submit_steorl(lmt_arg, io_addr);
-	}
+	memcpy(lmt_dst, inst, i * sizeof(struct cpt_inst_s));
 
-	rte_io_wmb();
+	cn10k_cpt_lmtst_dual_submit(&io_addr, lmt_id, &i);
 
 	if (nb_inst - i > 0) {
-		nb_inst -= i;
+		nb_inst -= CN10K_CPT_PKTS_PER_LOOP;
+		inst = RTE_PTR_ADD(inst, CN10K_CPT_PKTS_PER_LOOP * sizeof(struct cpt_inst_s));
 		goto again;
 	}
 }
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.h b/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
index 9de7e432e4..caf6ac35e5 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.h
@@ -25,6 +25,8 @@
 
 #define MOD_INC(i, l) ((i) == (l - 1) ? (i) = 0 : (i)++)
 
+#define CN10K_CPT_PKTS_PER_LOOP	  64
+
 /* Macros to form words in CPT instruction */
 #define CNXK_CPT_INST_W2(tag, tt, grp, rvu_pf_func)                            \
 	((tag) | ((uint64_t)(tt) << 32) | ((uint64_t)(grp) << 34) |            \
diff --git a/drivers/event/cnxk/cnxk_eventdev_adptr.c b/drivers/event/cnxk/cnxk_eventdev_adptr.c
index 98db11ad61..2c049e7041 100644
--- a/drivers/event/cnxk/cnxk_eventdev_adptr.c
+++ b/drivers/event/cnxk/cnxk_eventdev_adptr.c
@@ -632,7 +632,7 @@ crypto_adapter_qp_setup(const struct rte_cryptodev *cdev, struct cnxk_cpt_qp *qp
 	 * simultaneous enqueue from all available cores.
 	 */
 	if (roc_model_is_cn10k())
-		nb_desc_min = rte_lcore_count() * 32;
+		nb_desc_min = rte_lcore_count() * CN10K_CPT_PKTS_PER_LOOP;
 	else
 		nb_desc_min = rte_lcore_count() * 2;
 
@@ -707,7 +707,7 @@ crypto_adapter_qp_free(struct cnxk_cpt_qp *qp)
 	rte_mempool_free(qp->ca.req_mp);
 	qp->ca.enabled = false;
 
-	ret = roc_cpt_lmtline_init(qp->lf.roc_cpt, &qp->lmtline, qp->lf.lf_id);
+	ret = roc_cpt_lmtline_init(qp->lf.roc_cpt, &qp->lmtline, qp->lf.lf_id, true);
 	if (ret < 0) {
 		plt_err("Could not reset lmtline for queue pair %d", qp->lf.lf_id);
 		return ret;
-- 
2.25.1



* RE: [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD
  2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
                       ` (11 preceding siblings ...)
  2024-06-26 10:55     ` [PATCH v3 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
@ 2024-06-27  5:11     ` Akhil Goyal
  12 siblings, 0 replies; 41+ messages in thread
From: Akhil Goyal @ 2024-06-27  5:11 UTC (permalink / raw)
  To: Aakash Sasidharan
  Cc: Jerin Jacob, Anoob Joseph, Vidya Sagar Velumuri, Aakash Sasidharan, dev


> Subject: [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD
> 
> v3:
> * Fix compilation error by moving function meant for arm64 under
>   "#if defined(RTE_ARCH_ARM64)" guard.
> v2:
> * Fix compilation errors observed with arm gcc-13.
> 
> This series adds improvements to CNXK crypto PMD and fixes aes-gcm zero
> length input failure.
> 
> Aakash Sasidharan (1):
>   crypto/cnxk: fix aes-gcm zero len input cases
> 
> Anoob Joseph (11):
>   common/cnxk: add comments to denote skipped entries
>   crypto/cnxk: update version map file with PMD APIs
>   common/cnxk: make inline dev PF func get as idev API
>   crypto/cnxk: add flow control in Rx inject path
>   crypto/cnxk: use SSO PF func of inline device in inst
>   crypto/cnxk: use NEON for Rx inject inst preparation
>   crypto/cnxk: remove init of CPT result field in packet
>   crypto/cnxk: add dual submission in Rx inject
>   crypto/cnxk: update sess pointer for next iteration
>   crypto/cnxk: make pack IV variable as const
>   crypto/cnxk: enable dual submission to CPT
> 
>  drivers/common/cnxk/roc_ae.c              |   6 +-
>  drivers/common/cnxk/roc_ae_fpm_tables.c   |   6 +-
>  drivers/common/cnxk/roc_cpt.c             |  17 +-
>  drivers/common/cnxk/roc_cpt.h             |  51 +++--
>  drivers/common/cnxk/roc_idev.c            |   6 +
>  drivers/common/cnxk/roc_idev.h            |   2 +
>  drivers/common/cnxk/roc_nix_inl.h         |   1 -
>  drivers/common/cnxk/roc_nix_inl_dev.c     |   6 -
>  drivers/common/cnxk/version.map           |   2 +-
>  drivers/crypto/cnxk/cn10k_cryptodev_ops.c | 234 +++++++++-------------
>  drivers/crypto/cnxk/cn10k_cryptodev_ops.h |  60 +++++-
>  drivers/crypto/cnxk/cnxk_cryptodev.h      |   2 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c  |  40 ++--
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.h  |   2 +
>  drivers/crypto/cnxk/cnxk_se.h             |  55 ++---
>  drivers/crypto/cnxk/rte_pmd_cnxk_crypto.h |   2 +
>  drivers/crypto/cnxk/version.map           |   8 +
>  drivers/event/cnxk/cnxk_eventdev_adptr.c  |   4 +-
>  drivers/net/cnxk/cn10k_ethdev_sec.c       |   2 +-
>  drivers/net/cnxk/cnxk_ethdev_telemetry.c  |   3 +-
>  20 files changed, 275 insertions(+), 234 deletions(-)
> 
Updated the titles and descriptions of some of the patches. Please review.
Series applied to dpdk-next-crypto.

Thanks.




Thread overview: 41+ messages
2024-06-20 14:58 [PATCH 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
2024-06-20 14:58 ` [PATCH 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
2024-06-24  6:23 ` [PATCH v2 00/12] fixes and improvements to CNXK crypto PMD Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
2024-06-26  6:41     ` Akhil Goyal
2024-06-24  6:23   ` [PATCH v2 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
2024-06-24  6:23   ` [PATCH v2 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
2024-06-24  6:24   ` [PATCH v2 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
2024-06-24  6:24   ` [PATCH v2 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
2024-06-26 10:55   ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 01/12] common/cnxk: add comments to denote skipped entries Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 02/12] crypto/cnxk: update version map file with PMD APIs Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 03/12] common/cnxk: make inline dev PF func get as idev API Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 04/12] crypto/cnxk: add flow control in Rx inject path Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 05/12] crypto/cnxk: use SSO PF func of inline device in inst Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 06/12] crypto/cnxk: use NEON for Rx inject inst preparation Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 07/12] crypto/cnxk: remove init of CPT result field in packet Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 08/12] crypto/cnxk: add dual submission in Rx inject Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 09/12] crypto/cnxk: update sess pointer for next iteration Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 10/12] crypto/cnxk: fix aes-gcm zero len input cases Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 11/12] crypto/cnxk: make pack IV variable as const Aakash Sasidharan
2024-06-26 10:55     ` [PATCH v3 12/12] crypto/cnxk: enable dual submission to CPT Aakash Sasidharan
2024-06-27  5:11     ` [PATCH v3 00/12] Fixes and improvements to CNXK crypto PMD Akhil Goyal
