DPDK patches and discussions
* [PATCH 00/33] add Marvell cn20k SOC support for mempool and net
@ 2024-09-10  8:58 Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 01/33] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
                   ` (35 more replies)
  0 siblings, 36 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj; +Cc: dev, Nithin Dabilpuram

This series adds support for the Marvell cn20k SoC to the mempool
and net PMDs.

It also adds a few net/cnxk PMD updates to expose IPsec features
supported by the HW that are highly custom in nature, along with
some enhancements for cn10k.

Depends-on: series-32878 ("Marvell cn20K SOC base code")

Ashwin Sekhar T K (4):
  mempool/cnxk: add cn20k PCI device ids
  common/cnxk: accommodate change in aura field width
  common/cnxk: use new NPA aq enq mbox for cn20k
  mempool/cnxk: initialize mempool ops for cn20k

Nithin Dabilpuram (12):
  net/cnxk: move PMD function defines to common code
  net/cnxk: add cn20k base control path support
  net/cnxk: support Rx function select for cn20k
  net/cnxk: support Tx function select for cn20k
  net/cnxk: support Rx burst scalar for cn20k
  net/cnxk: support Rx burst vector for cn20k
  net/cnxk: support Tx burst scalar for cn20k
  net/cnxk: support Tx multi-seg in cn20k
  net/cnxk: support Tx burst vector for cn20k
  net/cnxk: support Tx multi-seg in vector for cn20k
  common/cnxk: add flush wait after write of inline ctx
  common/cnxk: fix CPT HW word size for outbound SA

Rakesh Kudurumalla (5):
  net/cnxk: add telemetry support to dump SA information
  net/cnxk: handle timestamp correctly for VF
  net/cnxk: update Rx offloads to handle timestamp
  event/cnxk: handle timestamp for event mode
  net/cnxk: update mbuf and rearm data for Rx inject packets

Satha Rao (5):
  common/cnxk: add cn20k NIX register definitions
  common/cnxk: support NIX queue config for cn20k
  common/cnxk: support bandwidth profile for cn20k
  common/cnxk: support NIX debug for cn20k
  common/cnxk: add RSS support for cn20k

Srujana Challa (5):
  net/cnxk: add PMD APIs for IPsec SA base and flush
  net/cnxk: add PMD APIs to submit CPT instruction
  net/cnxk: add PMD API to retrieve CPT queue statistics
  net/cnxk: add option to enable custom inbound sa usage
  net/cnxk: add PMD API to retrieve the model string

Sunil Kumar Kori (2):
  common/cnxk: remove restriction to clear RPM stats
  common/cnxk: allow MAC address set/add with active VFs

 doc/guides/nics/cnxk.rst                      |   25 +
 drivers/common/cnxk/cnxk_telemetry_nix.c      |  260 +-
 drivers/common/cnxk/hw/nix.h                  |  524 ++-
 drivers/common/cnxk/hw/npa.h                  |  164 +-
 drivers/common/cnxk/hw/rvu.h                  |    7 +-
 drivers/common/cnxk/roc_ie_ot.c               |    1 +
 drivers/common/cnxk/roc_mbox.h                |   84 +
 drivers/common/cnxk/roc_nix.c                 |   15 +-
 drivers/common/cnxk/roc_nix.h                 |    3 +
 drivers/common/cnxk/roc_nix_bpf.c             |  528 ++-
 drivers/common/cnxk/roc_nix_debug.c           |  243 +-
 drivers/common/cnxk/roc_nix_fc.c              |  106 +-
 drivers/common/cnxk/roc_nix_inl.c             |  105 +-
 drivers/common/cnxk/roc_nix_inl.h             |   29 +-
 drivers/common/cnxk/roc_nix_inl_dev.c         |   35 +
 drivers/common/cnxk/roc_nix_inl_priv.h        |    3 +
 drivers/common/cnxk/roc_nix_mac.c             |   15 -
 drivers/common/cnxk/roc_nix_priv.h            |    4 +-
 drivers/common/cnxk/roc_nix_queue.c           |  638 ++-
 drivers/common/cnxk/roc_nix_rss.c             |   74 +-
 drivers/common/cnxk/roc_nix_stats.c           |   55 +-
 drivers/common/cnxk/roc_nix_tm.c              |   22 +-
 drivers/common/cnxk/roc_nix_tm_ops.c          |   29 +-
 drivers/common/cnxk/roc_npa.c                 |  100 +-
 drivers/common/cnxk/roc_npa.h                 |   24 +-
 drivers/common/cnxk/roc_npa_debug.c           |   17 +-
 drivers/common/cnxk/version.map               |    2 +
 drivers/event/cnxk/cn10k_eventdev.c           |   32 +
 drivers/event/cnxk/cn9k_eventdev.c            |   31 +
 drivers/event/cnxk/cnxk_eventdev_adptr.c      |    2 +-
 drivers/mempool/cnxk/cnxk_mempool.c           |    2 +
 drivers/mempool/cnxk/cnxk_mempool_ops.c       |    2 +-
 drivers/net/cnxk/cn10k_ethdev.c               |   20 +-
 drivers/net/cnxk/cn10k_ethdev_sec.c           |  108 +-
 drivers/net/cnxk/cn10k_rx.h                   |   12 +-
 drivers/net/cnxk/cn20k_ethdev.c               |  943 +++++
 drivers/net/cnxk/cn20k_ethdev.h               |   15 +
 drivers/net/cnxk/cn20k_rx.h                   | 1100 ++++++
 drivers/net/cnxk/cn20k_rx_select.c            |  160 +
 drivers/net/cnxk/cn20k_rxtx.h                 |  245 ++
 drivers/net/cnxk/cn20k_tx.h                   | 3471 +++++++++++++++++
 drivers/net/cnxk/cn20k_tx_select.c            |  122 +
 drivers/net/cnxk/cn9k_ethdev.c                |   17 +-
 drivers/net/cnxk/cn9k_ethdev_sec.c            |   14 +
 drivers/net/cnxk/cnxk_ethdev.c                |   13 +-
 drivers/net/cnxk/cnxk_ethdev.h                |   52 +
 drivers/net/cnxk/cnxk_ethdev_devargs.c        |    4 +
 drivers/net/cnxk/cnxk_ethdev_dp.h             |    3 +
 drivers/net/cnxk/cnxk_ethdev_sec.c            |  138 +-
 drivers/net/cnxk/cnxk_ethdev_sec_telemetry.c  |  145 +-
 drivers/net/cnxk/meson.build                  |   92 +-
 drivers/net/cnxk/rte_pmd_cnxk.h               |  163 +-
 drivers/net/cnxk/rx/cn20k/rx_0_15.c           |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c       |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c  |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127.c        |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c   |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c    |   20 +
 .../net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c   |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111.c         |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c    |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c     |   20 +
 .../net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c    |   20 +
 drivers/net/cnxk/rx/cn20k/rx_all_offload.c    |   57 +
 drivers/net/cnxk/tx/cn20k/tx_0_15.c           |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c       |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c  |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127.c        |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c   |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c    |   18 +
 .../net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c   |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111.c         |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c    |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c     |   18 +
 .../net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c    |   18 +
 drivers/net/cnxk/tx/cn20k/tx_all_offload.c    |   39 +
 drivers/net/cnxk/version.map                  |    7 +
 119 files changed, 10826 insertions(+), 511 deletions(-)
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.c
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.h
 create mode 100644 drivers/net/cnxk/cn20k_rx.h
 create mode 100644 drivers/net/cnxk/cn20k_rx_select.c
 create mode 100644 drivers/net/cnxk/cn20k_rxtx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx_select.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_all_offload.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_all_offload.c

-- 
2.34.1



* [PATCH 01/33] mempool/cnxk: add cn20k PCI device ids
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 02/33] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Ashwin Sekhar T K, Pavan Nikhilesh; +Cc: dev

From: Ashwin Sekhar T K <asekhar@marvell.com>

Add cn20k PCI device ids.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/mempool/cnxk/cnxk_mempool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mempool/cnxk/cnxk_mempool.c b/drivers/mempool/cnxk/cnxk_mempool.c
index 1181b6f265..6ff11d8004 100644
--- a/drivers/mempool/cnxk/cnxk_mempool.c
+++ b/drivers/mempool/cnxk/cnxk_mempool.c
@@ -161,6 +161,7 @@ npa_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 }
 
 static const struct rte_pci_id npa_pci_map[] = {
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KB, PCI_DEVID_CNXK_RVU_NPA_PF),
@@ -172,6 +173,7 @@ static const struct rte_pci_id npa_pci_map[] = {
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN9KD, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN9KE, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CNF9KA, PCI_DEVID_CNXK_RVU_NPA_PF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KB, PCI_DEVID_CNXK_RVU_NPA_VF),
-- 
2.34.1



* [PATCH 02/33] common/cnxk: accommodate change in aura field width
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 01/33] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 03/33] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Ashwin Sekhar T K

From: Ashwin Sekhar T K <asekhar@marvell.com>

The aura field width has changed from 20 bits to 17 bits on
cn20k. Adjust the bit fields accordingly for register
reads/writes.
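
For reference, a minimal sketch of the shift change (illustrative
only, not the driver code): the aura id sits in the top bits of the
64-bit atomic op word, so a 20-bit field starts at bit 44 and a
17-bit field starts at bit 47:

#include <stdbool.h>
#include <stdint.h>

/* Aura id placement in the 64-bit op word:
 * cn9k/cn10k: bits [63:44] (20 bits); cn20k: bits [63:47] (17 bits).
 */
static inline uint64_t
aura_op_wdata(uint64_t aura_id, bool is_cn20k)
{
	const unsigned int shift = is_cn20k ? 47 : 44;

	return aura_id << shift;
}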

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/common/cnxk/roc_npa.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/common/cnxk/roc_npa.h b/drivers/common/cnxk/roc_npa.h
index 4ad5f044b5..fbf75b2fca 100644
--- a/drivers/common/cnxk/roc_npa.h
+++ b/drivers/common/cnxk/roc_npa.h
@@ -16,6 +16,7 @@
 #else
 #include "roc_io_generic.h"
 #endif
+#include "roc_model.h"
 #include "roc_npa_dp.h"
 
 #define ROC_AURA_OP_LIMIT_MASK (BIT_ULL(36) - 1)
@@ -68,11 +69,12 @@ roc_npa_aura_op_alloc(uint64_t aura_handle, const int drop)
 static inline uint64_t
 roc_npa_aura_op_cnt_get(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	int64_t *addr;
 	uint64_t reg;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_AURA_OP_CNT);
 	reg = roc_atomic64_add_nosync(wdata, addr);
@@ -87,11 +89,13 @@ static inline void
 roc_npa_aura_op_cnt_set(uint64_t aura_handle, const int sign, uint64_t count)
 {
 	uint64_t reg = count & (BIT_ULL(36) - 1);
+	uint64_t shift;
 
 	if (sign)
 		reg |= BIT_ULL(43); /* CNT_ADD */
 
-	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << 44);
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << shift);
 
 	plt_write64(reg, roc_npa_aura_handle_to_base(aura_handle) +
 				 NPA_LF_AURA_OP_CNT);
@@ -100,11 +104,12 @@ roc_npa_aura_op_cnt_set(uint64_t aura_handle, const int sign, uint64_t count)
 static inline uint64_t
 roc_npa_aura_op_limit_get(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	int64_t *addr;
 	uint64_t reg;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_AURA_OP_LIMIT);
 	reg = roc_atomic64_add_nosync(wdata, addr);
@@ -119,8 +124,10 @@ static inline void
 roc_npa_aura_op_limit_set(uint64_t aura_handle, uint64_t limit)
 {
 	uint64_t reg = limit & ROC_AURA_OP_LIMIT_MASK;
+	uint64_t shift;
 
-	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << 44);
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << shift);
 
 	plt_write64(reg, roc_npa_aura_handle_to_base(aura_handle) +
 				 NPA_LF_AURA_OP_LIMIT);
@@ -129,11 +136,12 @@ roc_npa_aura_op_limit_set(uint64_t aura_handle, uint64_t limit)
 static inline uint64_t
 roc_npa_aura_op_available(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	uint64_t reg;
 	int64_t *addr;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_POOL_OP_AVAILABLE);
 	reg = roc_atomic64_add_nosync(wdata, addr);
-- 
2.34.1



* [PATCH 03/33] common/cnxk: use new NPA aq enq mbox for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 01/33] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 02/33] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 04/33] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Ashwin Sekhar T K

From: Ashwin Sekhar T K <asekhar@marvell.com>

A new mbox, npa_cn20k_aq_enq_req, has been added for cn20k. Use
this mbox for NPA configuration.

Note that the new mbox request and response are the same size as
the older ones. Likewise, the new cn20k contexts
npa_cn20k_aura_s/npa_cn20k_pool_s are the same size as the older
npa_aura_s/npa_pool_s, so the structures can be typecast into each
other in most cases. Only the fields whose width or position has
changed need special handling.
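
The size equivalence that the typecasting relies on can be pinned
down at build time. A minimal sketch (not part of this patch;
assumes C11 static_assert and the structures defined below):

#include <assert.h>

/* Guard the size equivalence that makes casting between the old
 * and new NPA context structures safe.
 */
static_assert(sizeof(struct npa_cn20k_aura_s) == sizeof(struct npa_aura_s),
	      "cn20k aura context size must match the legacy layout");
static_assert(sizeof(struct npa_cn20k_pool_s) == sizeof(struct npa_pool_s),
	      "cn20k pool context size must match the legacy layout");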

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/common/cnxk/hw/npa.h         | 164 ++++++++++++++++++++++++---
 drivers/common/cnxk/roc_mbox.h       |  32 ++++++
 drivers/common/cnxk/roc_nix_debug.c  |   9 +-
 drivers/common/cnxk/roc_nix_fc.c     |  54 ++++++---
 drivers/common/cnxk/roc_nix_tm_ops.c |  15 ++-
 drivers/common/cnxk/roc_npa.c        | 100 ++++++++++++++--
 drivers/common/cnxk/roc_npa_debug.c  |  17 ++-
 7 files changed, 339 insertions(+), 52 deletions(-)

diff --git a/drivers/common/cnxk/hw/npa.h b/drivers/common/cnxk/hw/npa.h
index 891a1b2b5f..4fd1f9a64b 100644
--- a/drivers/common/cnxk/hw/npa.h
+++ b/drivers/common/cnxk/hw/npa.h
@@ -216,10 +216,10 @@ struct npa_aura_op_wdata_s {
 	uint64_t drop : 1;
 };
 
-/* NPA aura context structure */
+/* NPA aura context structure [CN9K, CN10K] */
 struct npa_aura_s {
 	uint64_t pool_addr : 64; /* W0 */
-	uint64_t ena : 1;
+	uint64_t ena : 1; /* W1 */
 	uint64_t rsvd_66_65 : 2;
 	uint64_t pool_caching : 1;
 	uint64_t pool_way_mask : 16;
@@ -233,24 +233,24 @@ struct npa_aura_s {
 	uint64_t shift : 6;
 	uint64_t rsvd_119_118 : 2;
 	uint64_t avg_level : 8;
-	uint64_t count : 36;
+	uint64_t count : 36; /* W2 */
 	uint64_t rsvd_167_164 : 4;
 	uint64_t nix0_bpid : 9;
 	uint64_t rsvd_179_177 : 3;
 	uint64_t nix1_bpid : 9;
 	uint64_t rsvd_191_189 : 3;
-	uint64_t limit : 36;
+	uint64_t limit : 36; /* W3 */
 	uint64_t rsvd_231_228 : 4;
 	uint64_t bp : 8;
 	uint64_t rsvd_242_240 : 3;
-	uint64_t fc_be : 1; /* [CN10K, .) */
+	uint64_t fc_be : 1; /* [CN10K] */
 	uint64_t fc_ena : 1;
 	uint64_t fc_up_crossing : 1;
 	uint64_t fc_stype : 2;
 	uint64_t fc_hyst_bits : 4;
 	uint64_t rsvd_255_252 : 4;
 	uint64_t fc_addr : 64; /* W4 */
-	uint64_t pool_drop : 8;
+	uint64_t pool_drop : 8; /* W5 */
 	uint64_t update_time : 16;
 	uint64_t err_int : 8;
 	uint64_t err_int_ena : 8;
@@ -262,17 +262,17 @@ struct npa_aura_s {
 	uint64_t rsvd_371 : 1;
 	uint64_t err_qint_idx : 7;
 	uint64_t rsvd_383_379 : 5;
-	uint64_t thresh : 36;
+	uint64_t thresh : 36; /* W6 */
 	uint64_t rsvd_423_420 : 4;
-	uint64_t fc_msh_dst : 11; /* [CN10K, .) */
+	uint64_t fc_msh_dst : 11; /* [CN10K] */
 	uint64_t rsvd_447_435 : 13;
 	uint64_t rsvd_511_448 : 64; /* W7 */
 };
 
-/* NPA pool context structure */
+/* NPA pool context structure [CN9K, CN10K] */
 struct npa_pool_s {
 	uint64_t stack_base : 64; /* W0 */
-	uint64_t ena : 1;
+	uint64_t ena : 1; /* W1 */
 	uint64_t nat_align : 1;
 	uint64_t rsvd_67_66 : 2;
 	uint64_t stack_caching : 1;
@@ -282,11 +282,11 @@ struct npa_pool_s {
 	uint64_t rsvd_103_100 : 4;
 	uint64_t buf_size : 11;
 	uint64_t rsvd_127_115 : 13;
-	uint64_t stack_max_pages : 32;
+	uint64_t stack_max_pages : 32; /* W2 */
 	uint64_t stack_pages : 32;
-	uint64_t op_pc : 48;
+	uint64_t op_pc : 48; /* W3 */
 	uint64_t rsvd_255_240 : 16;
-	uint64_t stack_offset : 4;
+	uint64_t stack_offset : 4; /* W4 */
 	uint64_t rsvd_263_260 : 4;
 	uint64_t shift : 6;
 	uint64_t rsvd_271_270 : 2;
@@ -296,14 +296,14 @@ struct npa_pool_s {
 	uint64_t fc_stype : 2;
 	uint64_t fc_hyst_bits : 4;
 	uint64_t fc_up_crossing : 1;
-	uint64_t fc_be : 1; /* [CN10K, .) */
+	uint64_t fc_be : 1; /* [CN10K] */
 	uint64_t rsvd_299_298 : 2;
 	uint64_t update_time : 16;
 	uint64_t rsvd_319_316 : 4;
 	uint64_t fc_addr : 64;	 /* W5 */
 	uint64_t ptr_start : 64; /* W6 */
 	uint64_t ptr_end : 64;	 /* W7 */
-	uint64_t rsvd_535_512 : 24;
+	uint64_t rsvd_535_512 : 24; /* W8 */
 	uint64_t err_int : 8;
 	uint64_t err_int_ena : 8;
 	uint64_t thresh_int : 1;
@@ -314,9 +314,9 @@ struct npa_pool_s {
 	uint64_t rsvd_563 : 1;
 	uint64_t err_qint_idx : 7;
 	uint64_t rsvd_575_571 : 5;
-	uint64_t thresh : 36;
+	uint64_t thresh : 36; /* W9 */
 	uint64_t rsvd_615_612 : 4;
-	uint64_t fc_msh_dst : 11; /* [CN10K, .) */
+	uint64_t fc_msh_dst : 11; /* [CN10K] */
 	uint64_t rsvd_639_627 : 13;
 	uint64_t rsvd_703_640 : 64;  /* W10 */
 	uint64_t rsvd_767_704 : 64;  /* W11 */
@@ -326,6 +326,136 @@ struct npa_pool_s {
 	uint64_t rsvd_1023_960 : 64; /* W15 */
 };
 
+/* NPA aura context structure [CN20K] */
+struct npa_cn20k_aura_s {
+	uint64_t pool_addr : 64; /* W0 */
+	uint64_t ena : 1;   /* W1 */
+	uint64_t rsvd_66_65 : 2;
+	uint64_t pool_caching : 1;
+	uint64_t rsvd_68 : 16;
+	uint64_t avg_con : 9;
+	uint64_t rsvd_93 : 1;
+	uint64_t pool_drop_ena : 1;
+	uint64_t aura_drop_ena : 1;
+	uint64_t bp_ena : 1;
+	uint64_t rsvd_103_97 : 7;
+	uint64_t aura_drop : 8;
+	uint64_t shift : 6;
+	uint64_t rsvd_119_118 : 2;
+	uint64_t avg_level : 8;
+	uint64_t count : 36; /* W2 */
+	uint64_t rsvd_167_164 : 4;
+	uint64_t bpid : 12;
+	uint64_t rsvd_191_180 : 12;
+	uint64_t limit : 36; /* W3 */
+	uint64_t rsvd_231_228 : 4;
+	uint64_t bp : 7;
+	uint64_t rsvd_243_239 : 5;
+	uint64_t fc_ena : 1;
+	uint64_t fc_up_crossing : 1;
+	uint64_t fc_stype : 2;
+	uint64_t fc_hyst_bits : 4;
+	uint64_t rsvd_255_252 : 4;
+	uint64_t fc_addr : 64;  /* W4 */
+	uint64_t pool_drop : 8; /* W5 */
+	uint64_t update_time : 16;
+	uint64_t err_int : 8;
+	uint64_t err_int_ena : 8;
+	uint64_t thresh_int : 1;
+	uint64_t thresh_int_ena : 1;
+	uint64_t thresh_up : 1;
+	uint64_t rsvd_363 : 1;
+	uint64_t thresh_qint_idx : 7;
+	uint64_t rsvd_371 : 1;
+	uint64_t err_qint_idx : 7;
+	uint64_t rsvd_383_379 : 5;
+	uint64_t thresh : 36; /* W6 */
+	uint64_t rsvd_423_420 : 4;
+	uint64_t fc_msh_dst : 11;
+	uint64_t rsvd_438_435 : 4;
+	uint64_t op_dpc_ena : 1;
+	uint64_t op_dpc_set : 6;
+	uint64_t stream_ctx : 1;
+	uint64_t unified_ctx : 1;
+	uint64_t rsvd_511_448 : 64; /* W7 */
+};
+
+/* NPA pool context structure [CN20K] */
+struct npa_cn20k_pool_s {
+	uint64_t stack_base : 64; /* W0 */
+	uint64_t ena : 1; /* W1 */
+	uint64_t nat_align : 1;
+	uint64_t rsvd_67_66 : 2;
+	uint64_t stack_caching : 1;
+	uint64_t rsvd_87_69 : 19;
+	uint64_t buf_offset : 12;
+	uint64_t rsvd_103_100 : 4;
+	uint64_t buf_size : 12;
+	uint64_t rsvd_119_116 : 4;
+	uint64_t ref_cnt_prof : 3;
+	uint64_t rsvd_127_123 : 5;
+	uint64_t stack_max_pages : 32; /* W2 */
+	uint64_t stack_pages : 32;
+	uint64_t bp_0 : 7; /* W3 */
+	uint64_t bp_1 : 7;
+	uint64_t bp_2 : 7;
+	uint64_t bp_3 : 7;
+	uint64_t bp_4 : 7;
+	uint64_t bp_5 : 7;
+	uint64_t bp_6 : 7;
+	uint64_t bp_7 : 7;
+	uint64_t bp_ena_0 : 1;
+	uint64_t bp_ena_1 : 1;
+	uint64_t bp_ena_2 : 1;
+	uint64_t bp_ena_3 : 1;
+	uint64_t bp_ena_4 : 1;
+	uint64_t bp_ena_5 : 1;
+	uint64_t bp_ena_6 : 1;
+	uint64_t bp_ena_7 : 1;
+	uint64_t stack_offset : 4; /* W4 */
+	uint64_t rsvd_263_260 : 4;
+	uint64_t shift : 6;
+	uint64_t rsvd_271_270 : 2;
+	uint64_t avg_level : 8;
+	uint64_t avg_con : 9;
+	uint64_t fc_ena : 1;
+	uint64_t fc_stype : 2;
+	uint64_t fc_hyst_bits : 4;
+	uint64_t fc_up_crossing : 1;
+	uint64_t rsvd_299_297 : 3;
+	uint64_t update_time : 16;
+	uint64_t rsvd_319_316 : 4;
+	uint64_t fc_addr : 64;   /* W5 */
+	uint64_t ptr_start : 64; /* W6 */
+	uint64_t ptr_end : 64;   /* W7 */
+	uint64_t bpid_0 : 12; /* W8 */
+	uint64_t rsvd_535_524 : 12;
+	uint64_t err_int : 8;
+	uint64_t err_int_ena : 8;
+	uint64_t thresh_int : 1;
+	uint64_t thresh_int_ena : 1;
+	uint64_t thresh_up : 1;
+	uint64_t rsvd_555 : 1;
+	uint64_t thresh_qint_idx : 7;
+	uint64_t rsvd_563 : 1;
+	uint64_t err_qint_idx : 7;
+	uint64_t rsvd_575_571 : 5;
+	uint64_t thresh : 36; /* W9 */
+	uint64_t rsvd_615_612 : 4;
+	uint64_t fc_msh_dst : 11;
+	uint64_t rsvd_630_627 : 4;
+	uint64_t op_dpc_ena : 1;
+	uint64_t op_dpc_set : 6;
+	uint64_t stream_ctx : 1;
+	uint64_t rsvd_639 : 1;
+	uint64_t rsvd_703_640 : 64;  /* W10 */
+	uint64_t rsvd_767_704 : 64;  /* W11 */
+	uint64_t rsvd_831_768 : 64;  /* W12 */
+	uint64_t rsvd_895_832 : 64;  /* W13 */
+	uint64_t rsvd_959_896 : 64;  /* W14 */
+	uint64_t rsvd_1023_960 : 64; /* W15 */
+};
+
 /* NPA queue interrupt context hardware structure */
 struct npa_qint_hw_s {
 	uint32_t count : 22;
diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index f1a3371ef9..9a9dcbdbda 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -119,6 +119,8 @@ struct mbox_msghdr {
 	M(NPA_AQ_ENQ, 0x402, npa_aq_enq, npa_aq_enq_req, npa_aq_enq_rsp)       \
 	M(NPA_HWCTX_DISABLE, 0x403, npa_hwctx_disable, hwctx_disable_req,      \
 	  msg_rsp)                                                             \
+	M(NPA_CN20K_AQ_ENQ, 0x404, npa_cn20k_aq_enq, npa_cn20k_aq_enq_req,     \
+	  npa_cn20k_aq_enq_rsp)                                                \
 	/* SSO/SSOW mbox IDs (range 0x600 - 0x7FF) */                          \
 	M(SSO_LF_ALLOC, 0x600, sso_lf_alloc, sso_lf_alloc_req,                 \
 	  sso_lf_alloc_rsp)                                                    \
@@ -1325,6 +1327,36 @@ struct npa_aq_enq_rsp {
 	};
 };
 
+struct npa_cn20k_aq_enq_req {
+	struct mbox_msghdr hdr;
+	uint32_t __io aura_id;
+	uint8_t __io ctype;
+	uint8_t __io op;
+	union {
+		/* Valid when op == WRITE/INIT and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura;
+		/* Valid when op == WRITE/INIT and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool;
+	};
+	/* Mask data when op == WRITE (1=write, 0=don't write) */
+	union {
+		/* Valid when op == WRITE and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura_mask;
+		/* Valid when op == WRITE and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool_mask;
+	};
+};
+
+struct npa_cn20k_aq_enq_rsp {
+	struct mbox_msghdr hdr;
+	union {
+		/* Valid when op == READ and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura;
+		/* Valid when op == READ and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool;
+	};
+};
+
 /* Disable all contexts of type 'ctype' */
 struct hwctx_disable_req {
 	struct mbox_msghdr hdr;
diff --git a/drivers/common/cnxk/roc_nix_debug.c b/drivers/common/cnxk/roc_nix_debug.c
index 26546f9297..2e91470c09 100644
--- a/drivers/common/cnxk/roc_nix_debug.c
+++ b/drivers/common/cnxk/roc_nix_debug.c
@@ -690,6 +690,7 @@ int
 roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct npa_cn20k_aq_enq_req *npa_aq_cn20k;
 	int rc = -1, q, rq = nix->nb_rx_queues;
 	struct npa_aq_enq_rsp *npa_rsp;
 	struct npa_aq_enq_req *npa_aq;
@@ -772,8 +773,12 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			continue;
 		}
 
-		/* Dump SQB Aura minimal info */
-		npa_aq = mbox_alloc_msg_npa_aq_enq(mbox_get(npa_lf->mbox));
+		if (roc_model_is_cn20k()) {
+			npa_aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox_get(npa_lf->mbox));
+			npa_aq = (struct npa_aq_enq_req *)npa_aq_cn20k; /* Common fields */
+		} else {
+			npa_aq = mbox_alloc_msg_npa_aq_enq(mbox_get(npa_lf->mbox));
+		}
 		if (npa_aq == NULL) {
 			rc = -ENOSPC;
 			mbox_put(npa_lf->mbox);
diff --git a/drivers/common/cnxk/roc_nix_fc.c b/drivers/common/cnxk/roc_nix_fc.c
index 12bfb9816b..2f72e67993 100644
--- a/drivers/common/cnxk/roc_nix_fc.c
+++ b/drivers/common/cnxk/roc_nix_fc.c
@@ -158,6 +158,8 @@ static int
 nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct npa_cn20k_aq_enq_req *npa_req_cn20k;
+	struct npa_cn20k_aq_enq_rsp *npa_rsp_cn20k;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
 	struct nix_aq_enq_rsp *rsp;
@@ -195,24 +197,44 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 	if (rc)
 		goto exit;
 
-	npa_req = mbox_alloc_msg_npa_aq_enq(mbox);
-	if (!npa_req) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn20k()) {
+		npa_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		if (!npa_req_cn20k) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		npa_req_cn20k->aura_id = rsp->rq.lpb_aura;
+		npa_req_cn20k->ctype = NPA_AQ_CTYPE_AURA;
+		npa_req_cn20k->op = NPA_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&npa_rsp_cn20k);
+		if (rc)
+			goto exit;
+
+		fc_cfg->cq_cfg.cq_drop = npa_rsp_cn20k->aura.bp;
+		fc_cfg->cq_cfg.enable = npa_rsp_cn20k->aura.bp_ena;
+		fc_cfg->type = ROC_NIX_FC_RQ_CFG;
+	} else {
+		npa_req = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (!npa_req) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		npa_req->aura_id = rsp->rq.lpb_aura;
+		npa_req->ctype = NPA_AQ_CTYPE_AURA;
+		npa_req->op = NPA_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&npa_rsp);
+		if (rc)
+			goto exit;
+
+		fc_cfg->cq_cfg.cq_drop = npa_rsp->aura.bp;
+		fc_cfg->cq_cfg.enable = npa_rsp->aura.bp_ena;
+		fc_cfg->type = ROC_NIX_FC_RQ_CFG;
 	}
 
-	npa_req->aura_id = rsp->rq.lpb_aura;
-	npa_req->ctype = NPA_AQ_CTYPE_AURA;
-	npa_req->op = NPA_AQ_INSTOP_READ;
-
-	rc = mbox_process_msg(mbox, (void *)&npa_rsp);
-	if (rc)
-		goto exit;
-
-	fc_cfg->cq_cfg.cq_drop = npa_rsp->aura.bp;
-	fc_cfg->cq_cfg.enable = npa_rsp->aura.bp_ena;
-	fc_cfg->type = ROC_NIX_FC_RQ_CFG;
-
 exit:
 	mbox_put(mbox);
 	return rc;
diff --git a/drivers/common/cnxk/roc_nix_tm_ops.c b/drivers/common/cnxk/roc_nix_tm_ops.c
index 9f3870a311..8144675f89 100644
--- a/drivers/common/cnxk/roc_nix_tm_ops.c
+++ b/drivers/common/cnxk/roc_nix_tm_ops.c
@@ -8,6 +8,7 @@
 int
 roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 {
+	struct npa_cn20k_aq_enq_req *req_cn20k;
 	struct npa_aq_enq_req *req;
 	struct npa_aq_enq_rsp *rsp;
 	uint64_t aura_handle;
@@ -25,7 +26,12 @@ roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 	mbox = mbox_get(lf->mbox);
 	/* Set/clear sqb aura fc_ena */
 	aura_handle = sq->aura_handle;
-	req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		req = (struct npa_aq_enq_req *)req_cn20k;
+	} else {
+		req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (req == NULL)
 		goto exit;
 
@@ -52,7 +58,12 @@ roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 
 	/* Read back npa aura ctx */
 	if (enable) {
-		req = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			req = (struct npa_aq_enq_req *)req_cn20k;
+		} else {
+			req = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (req == NULL) {
 			rc = -ENOSPC;
 			goto exit;
diff --git a/drivers/common/cnxk/roc_npa.c b/drivers/common/cnxk/roc_npa.c
index 6c14c49901..934d7361a9 100644
--- a/drivers/common/cnxk/roc_npa.c
+++ b/drivers/common/cnxk/roc_npa.c
@@ -76,6 +76,7 @@ static int
 npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura,
 		   struct npa_pool_s *pool)
 {
+	struct npa_cn20k_aq_enq_req *aura_init_req_cn20k, *pool_init_req_cn20k;
 	struct npa_aq_enq_req *aura_init_req, *pool_init_req;
 	struct npa_aq_enq_rsp *aura_init_rsp, *pool_init_rsp;
 	struct mbox_dev *mdev = &m_box->dev[0];
@@ -83,7 +84,12 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 	struct mbox *mbox;
 
 	mbox = mbox_get(m_box);
-	aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_init_req = (struct npa_aq_enq_req *)aura_init_req_cn20k;
+	} else {
+		aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_init_req == NULL)
 		goto exit;
 	aura_init_req->aura_id = aura_id;
@@ -91,6 +97,11 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 	aura_init_req->op = NPA_AQ_INSTOP_INIT;
 	mbox_memcpy(&aura_init_req->aura, aura, sizeof(*aura));
 
+	if (roc_model_is_cn20k()) {
+		pool_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		pool_init_req = (struct npa_aq_enq_req *)pool_init_req_cn20k;
+	} else {
+		pool_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
-	pool_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
 	if (pool_init_req == NULL)
 		goto exit;
@@ -121,13 +133,19 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 static int
 npa_aura_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura)
 {
+	struct npa_cn20k_aq_enq_req *aura_init_req_cn20k;
 	struct npa_aq_enq_req *aura_init_req;
 	struct npa_aq_enq_rsp *aura_init_rsp;
 	struct mbox *mbox;
 	int rc = -ENOSPC;
 
 	mbox = mbox_get(m_box);
-	aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_init_req = (struct npa_aq_enq_req *)aura_init_req_cn20k;
+	} else {
+		aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_init_req == NULL)
 		goto exit;
 	aura_init_req->aura_id = aura_id;
@@ -151,6 +169,7 @@ npa_aura_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura)
 static int
 npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k, *pool_req_cn20k;
 	struct npa_aq_enq_req *aura_req, *pool_req;
 	struct npa_aq_enq_rsp *aura_rsp, *pool_rsp;
 	struct mbox_dev *mdev = &m_box->dev[0];
@@ -168,7 +187,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	} while (ptr);
 
 	mbox = mbox_get(m_box);
-	pool_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		pool_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		pool_req = (struct npa_aq_enq_req *)pool_req_cn20k;
+	} else {
+		pool_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (pool_req == NULL)
 		goto exit;
 	pool_req->aura_id = aura_id;
@@ -177,7 +201,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	pool_req->pool.ena = 0;
 	pool_req->pool_mask.ena = ~pool_req->pool_mask.ena;
 
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL)
 		goto exit;
 	aura_req->aura_id = aura_id;
@@ -185,8 +214,18 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	aura_req->op = NPA_AQ_INSTOP_WRITE;
 	aura_req->aura.ena = 0;
 	aura_req->aura_mask.ena = ~aura_req->aura_mask.ena;
-	aura_req->aura.bp_ena = 0;
-	aura_req->aura_mask.bp_ena = ~aura_req->aura_mask.bp_ena;
+	if (roc_model_is_cn20k()) {
+		__io struct npa_cn20k_aura_s *aura_cn20k, *aura_mask_cn20k;
+
+		/* The bit positions/width of bp_ena has changed in cn20k */
+		aura_cn20k = (__io struct npa_cn20k_aura_s *)&aura_req->aura;
+		aura_cn20k->bp_ena = 0;
+		aura_mask_cn20k = (__io struct npa_cn20k_aura_s *)&aura_req->aura_mask;
+		aura_mask_cn20k->bp_ena = ~aura_mask_cn20k->bp_ena;
+	} else {
+		aura_req->aura.bp_ena = 0;
+		aura_req->aura_mask.bp_ena = ~aura_req->aura_mask.bp_ena;
+	}
 
 	rc = mbox_process(mbox);
 	if (rc < 0)
@@ -204,6 +243,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 		goto exit;
 	}
 
+	if (roc_model_is_cn20k()) {
+		/* In cn20k, NPA does not use NDC */
+		rc = 0;
+		goto exit;
+	}
+
 	/* Sync NDC-NPA for LF */
 	ndc_req = mbox_alloc_msg_ndc_sync_op(mbox);
 	if (ndc_req == NULL) {
@@ -226,6 +271,7 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 static int
 npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_aq_enq_rsp *aura_rsp;
 	struct ndc_sync_op *ndc_req;
@@ -236,7 +282,12 @@ npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 	plt_delay_us(10);
 
 	mbox = mbox_get(m_box);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL)
 		goto exit;
 	aura_req->aura_id = aura_id;
@@ -254,6 +305,12 @@ npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 		goto exit;
 	}
 
+	if (roc_model_is_cn20k()) {
+		/* In cn20k, NPA does not use NDC */
+		rc = 0;
+		goto exit;
+	}
+
 	/* Sync NDC-NPA for LF */
 	ndc_req = mbox_alloc_msg_ndc_sync_op(mbox);
 	if (ndc_req == NULL) {
@@ -335,6 +392,7 @@ roc_npa_pool_op_pc_reset(uint64_t aura_handle)
 int
 roc_npa_aura_drop_set(uint64_t aura_handle, uint64_t limit, bool ena)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_lf *lf;
 	struct mbox *mbox;
@@ -344,7 +402,12 @@ roc_npa_aura_drop_set(uint64_t aura_handle, uint64_t limit, bool ena)
 	if (lf == NULL)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
 	mbox = mbox_get(lf->mbox);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL) {
 		rc = -ENOMEM;
 		goto exit;
@@ -723,6 +786,7 @@ roc_npa_aura_create(uint64_t *aura_handle, uint32_t block_count,
 int
 roc_npa_aura_limit_modify(uint64_t aura_handle, uint16_t aura_limit)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_lf *lf;
 	struct mbox *mbox;
@@ -733,7 +797,12 @@ roc_npa_aura_limit_modify(uint64_t aura_handle, uint16_t aura_limit)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
 
 	mbox = mbox_get(lf->mbox);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL) {
 		rc = -ENOMEM;
 		goto exit;
@@ -834,12 +903,13 @@ int
 roc_npa_pool_range_update_check(uint64_t aura_handle)
 {
 	uint64_t aura_id = roc_npa_aura_handle_to_aura(aura_handle);
-	struct npa_lf *lf;
-	struct npa_aura_lim *lim;
+	struct npa_cn20k_aq_enq_req *req_cn20k;
 	__io struct npa_pool_s *pool;
 	struct npa_aq_enq_req *req;
 	struct npa_aq_enq_rsp *rsp;
+	struct npa_aura_lim *lim;
 	struct mbox *mbox;
+	struct npa_lf *lf;
 	int rc;
 
 	lf = idev_npa_obj_get();
@@ -849,7 +919,12 @@ roc_npa_pool_range_update_check(uint64_t aura_handle)
 	lim = lf->aura_lim;
 
 	mbox = mbox_get(lf->mbox);
-	req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		req = (struct npa_aq_enq_req *)req_cn20k;
+	} else {
+		req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (req == NULL) {
 		rc = -ENOSPC;
 		goto exit;
@@ -903,6 +978,7 @@ int
 roc_npa_aura_bp_configure(uint64_t aura_handle, uint16_t bpid, uint8_t bp_intf, uint8_t bp_thresh,
 			  bool enable)
 {
+	/* TODO: Add support for CN20K */
 	uint32_t aura_id = roc_npa_aura_handle_to_aura(aura_handle);
 	struct npa_lf *lf = idev_npa_obj_get();
 	struct npa_aq_enq_req *req;
diff --git a/drivers/common/cnxk/roc_npa_debug.c b/drivers/common/cnxk/roc_npa_debug.c
index 173d32cd9b..9a16f481a8 100644
--- a/drivers/common/cnxk/roc_npa_debug.c
+++ b/drivers/common/cnxk/roc_npa_debug.c
@@ -89,8 +89,9 @@ npa_aura_dump(__io struct npa_aura_s *aura)
 int
 roc_npa_ctx_dump(void)
 {
-	struct npa_aq_enq_req *aq;
+	struct npa_cn20k_aq_enq_req *aq_cn20k;
 	struct npa_aq_enq_rsp *rsp;
+	struct npa_aq_enq_req *aq;
 	struct mbox *mbox;
 	struct npa_lf *lf;
 	uint32_t q;
@@ -106,7 +107,12 @@ roc_npa_ctx_dump(void)
 		if (plt_bitmap_get(lf->npa_bmp, q))
 			continue;
 
-		aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			aq = (struct npa_aq_enq_req *)aq_cn20k;
+		} else {
+			aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (aq == NULL) {
 			rc = -ENOSPC;
 			goto exit;
@@ -129,7 +135,12 @@ roc_npa_ctx_dump(void)
 		if (plt_bitmap_get(lf->npa_bmp, q))
 			continue;
 
-		aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			aq = (struct npa_aq_enq_req *)aq_cn20k;
+		} else {
+			aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (aq == NULL) {
 			rc = -ENOSPC;
 			goto exit;
-- 
2.34.1



* [PATCH 04/33] mempool/cnxk: initialize mempool ops for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (2 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 03/33] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 05/33] net/cnxk: add telemetry support to dump SA information Nithin Dabilpuram
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Ashwin Sekhar T K, Pavan Nikhilesh; +Cc: dev

From: Ashwin Sekhar T K <asekhar@marvell.com>

Initialize mempool ops for cn20k.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/mempool/cnxk/cnxk_mempool_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mempool/cnxk/cnxk_mempool_ops.c b/drivers/mempool/cnxk/cnxk_mempool_ops.c
index a1aeaee746..bb35e2d1d2 100644
--- a/drivers/mempool/cnxk/cnxk_mempool_ops.c
+++ b/drivers/mempool/cnxk/cnxk_mempool_ops.c
@@ -192,7 +192,7 @@ cnxk_mempool_plt_init(void)
 
 	if (roc_model_is_cn9k()) {
 		rte_mbuf_set_platform_mempool_ops("cn9k_mempool_ops");
-	} else if (roc_model_is_cn10k()) {
+	} else if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		rte_mbuf_set_platform_mempool_ops("cn10k_mempool_ops");
 		rc = cn10k_mempool_plt_init();
 	}
-- 
2.34.1



* [PATCH 05/33] net/cnxk: add telemetry support to dump SA information
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (3 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 04/33] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 06/33] net/cnxk: handle timestamp correctly for VF Nithin Dabilpuram
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rakesh Kudurumalla

From: Rakesh Kudurumalla <rkudurumalla@marvell.com>

Add a new telemetry command to dump SA information, taking
port id and SA index as parameters.

Ex: /cnxk/ipsec/sa_info,0,3 dumps the inbound and outbound SA
information for SA index 3 on port 0

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
---
 drivers/net/cnxk/cnxk_ethdev_sec_telemetry.c | 145 ++++++++++++++++---
 1 file changed, 128 insertions(+), 17 deletions(-)

diff --git a/drivers/net/cnxk/cnxk_ethdev_sec_telemetry.c b/drivers/net/cnxk/cnxk_ethdev_sec_telemetry.c
index 386278cfc9..86c2453c09 100644
--- a/drivers/net/cnxk/cnxk_ethdev_sec_telemetry.c
+++ b/drivers/net/cnxk/cnxk_ethdev_sec_telemetry.c
@@ -207,17 +207,121 @@ copy_inb_sa_10k(struct rte_tel_data *d, uint32_t i, void *sa)
 	return 0;
 }
 
+/* n_vals is the number of params to be parsed. */
+static int
+parse_params(const char *params, uint32_t *vals, size_t n_vals)
+{
+	char dlim[2] = ",";
+	char *params_args;
+	size_t count = 0;
+	char *token;
+
+	if (vals == NULL || params == NULL || strlen(params) == 0)
+		return -1;
+
+	/* strtok expects char * and param is const char *. Hence on using
+	 * params as "const char *" compiler throws warning.
+	 */
+	params_args = strdup(params);
+	if (params_args == NULL)
+		return -1;
+
+	token = strtok(params_args, dlim);
+	while (token && isdigit(*token) && count < n_vals) {
+		vals[count++] = strtoul(token, NULL, 10);
+		token = strtok(NULL, dlim);
+	}
+
+	free(params_args);
+
+	if (count < n_vals)
+		return -1;
+
+	return 0;
+}
+
+static int
+ethdev_sec_tel_handle_sa_info(const char *cmd __rte_unused, const char *params,
+			      struct rte_tel_data *d)
+{
+	struct cnxk_eth_sec_sess *eth_sec, *tvar;
+	struct rte_eth_dev *eth_dev;
+	struct cnxk_eth_dev *dev;
+	uint32_t port_id, sa_idx;
+	uint32_t vals[2] = {0};
+	uint32_t i;
+	int ret;
+
+	if (params == NULL || strlen(params) == 0 || !isdigit(*params))
+		return -EINVAL;
+
+	if (parse_params(params, vals, RTE_DIM(vals)) < 0)
+		return -EINVAL;
+
+	port_id = vals[0];
+	sa_idx = vals[1];
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		plt_err("Invalid port id %u", port_id);
+		return -EINVAL;
+	}
+
+	eth_dev = &rte_eth_devices[port_id];
+	if (!eth_dev) {
+		plt_err("Ethdev not available");
+		return -EINVAL;
+	}
+	dev = cnxk_eth_pmd_priv(eth_dev);
+
+	rte_tel_data_start_dict(d);
+
+	i = 0;
+	if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY) {
+		tvar = NULL;
+		RTE_TAILQ_FOREACH_SAFE(eth_sec, &dev->outb.list, entry, tvar) {
+			if (eth_sec->sa_idx == sa_idx) {
+				rte_tel_data_add_dict_int(d, "outb_sa", 1);
+				if (roc_model_is_cn10k())
+					ret = copy_outb_sa_10k(d, i, eth_sec->sa);
+				else
+					ret = copy_outb_sa_9k(d, i, eth_sec->sa);
+				if (ret < 0)
+					return ret;
+				break;
+			}
+		}
+	}
+
+	i = 0;
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+		tvar = NULL;
+		RTE_TAILQ_FOREACH_SAFE(eth_sec, &dev->inb.list, entry, tvar) {
+			if (eth_sec->sa_idx == sa_idx) {
+				rte_tel_data_add_dict_int(d, "inb_sa", 1);
+				if (roc_model_is_cn10k())
+					ret = copy_inb_sa_10k(d, i, eth_sec->sa);
+				else
+					ret = copy_inb_sa_9k(d, i, eth_sec->sa);
+				if (ret < 0)
+					return ret;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
 static int
 ethdev_sec_tel_handle_info(const char *cmd __rte_unused, const char *params,
 			   struct rte_tel_data *d)
 {
+	uint32_t min_outb_sa = UINT32_MAX, max_outb_sa = 0;
+	uint32_t min_inb_sa = UINT32_MAX, max_inb_sa = 0;
 	struct cnxk_eth_sec_sess *eth_sec, *tvar;
 	struct rte_eth_dev *eth_dev;
 	struct cnxk_eth_dev *dev;
 	uint16_t port_id;
 	char *end_p;
-	uint32_t i;
-	int ret;
 
 	if (params == NULL || strlen(params) == 0 || !isdigit(*params))
 		return -EINVAL;
@@ -246,32 +350,36 @@ ethdev_sec_tel_handle_info(const char *cmd __rte_unused, const char *params,
 
 	rte_tel_data_add_dict_int(d, "nb_outb_sa", dev->outb.nb_sess);
 
-	i = 0;
+	if (!dev->outb.nb_sess)
+		min_outb_sa = 0;
+
 	if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY) {
 		tvar = NULL;
 		RTE_TAILQ_FOREACH_SAFE(eth_sec, &dev->outb.list, entry, tvar) {
-			if (roc_model_is_cn10k())
-				ret = copy_outb_sa_10k(d, i++, eth_sec->sa);
-			else
-				ret = copy_outb_sa_9k(d, i++, eth_sec->sa);
-			if (ret < 0)
-				return ret;
+			if (eth_sec->sa_idx < min_outb_sa)
+				min_outb_sa = eth_sec->sa_idx;
+			if (eth_sec->sa_idx > max_outb_sa)
+				max_outb_sa = eth_sec->sa_idx;
 		}
+		rte_tel_data_add_dict_int(d, "min_outb_sa", min_outb_sa);
+		rte_tel_data_add_dict_int(d, "max_outb_sa", max_outb_sa);
 	}
 
 	rte_tel_data_add_dict_int(d, "nb_inb_sa", dev->inb.nb_sess);
 
-	i = 0;
+	if (!dev->inb.nb_sess)
+		min_inb_sa = 0;
+
 	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
 		tvar = NULL;
 		RTE_TAILQ_FOREACH_SAFE(eth_sec, &dev->inb.list, entry, tvar) {
-			if (roc_model_is_cn10k())
-				ret = copy_inb_sa_10k(d, i++, eth_sec->sa);
-			else
-				ret = copy_inb_sa_9k(d, i++, eth_sec->sa);
-			if (ret < 0)
-				return ret;
+			if (eth_sec->sa_idx < min_inb_sa)
+				min_inb_sa = eth_sec->sa_idx;
+			if (eth_sec->sa_idx > max_inb_sa)
+				max_inb_sa = eth_sec->sa_idx;
 		}
+		rte_tel_data_add_dict_int(d, "min_inb_sa", min_inb_sa);
+		rte_tel_data_add_dict_int(d, "max_inb_sa", max_inb_sa);
 	}
 
 	return 0;
@@ -281,5 +389,8 @@ RTE_INIT(cnxk_ipsec_init_telemetry)
 {
 	rte_telemetry_register_cmd("/cnxk/ipsec/info",
 				   ethdev_sec_tel_handle_info,
-				   "Returns ipsec info. Parameters: port id");
+				   "Returns the number of SAs and the min/max SA index. Parameters: port id");
+	rte_telemetry_register_cmd("/cnxk/ipsec/sa_info",
+				   ethdev_sec_tel_handle_sa_info,
+				   "Returns IPsec SA info. Parameters: port id, sa_idx");
 }
-- 
2.34.1



* [PATCH 06/33] net/cnxk: handle timestamp correctly for VF
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (4 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 05/33] net/cnxk: add telemetry support to dump SA information Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 07/33] net/cnxk: update Rx offloads to handle timestamp Nithin Dabilpuram
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rakesh Kudurumalla

From: Rakesh Kudurumalla <rkudurumalla@marvell.com>

When timestamping is enabled on the PF in the kernel and the
corresponding VF is attached to a DPDK application, mbuf_addr
gets corrupted in cnxk_nix_timestamp_dynfield() because
"tstamp_dynfield_offset" is zero for the PTP-enabled PF.
Fix this by registering the Rx timestamp dynamic field when
the PTP state change is notified.
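
A simplified sketch of the failure mode (the helper below mirrors
the shape of the driver's accessor, but is not its exact code):
the Rx path stores the timestamp through a dynamic field offset,
so an unregistered offset of zero makes the store land at byte 0
of the mbuf, on top of buf_addr:

#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

/* With offset == 0, the returned pointer aliases the start of
 * struct rte_mbuf, so writing the timestamp clobbers buf_addr.
 */
static inline rte_mbuf_timestamp_t *
tstamp_dynfield(struct rte_mbuf *mbuf, int offset)
{
	return RTE_MBUF_DYNFIELD(mbuf, offset, rte_mbuf_timestamp_t *);
}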

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
---
 drivers/net/cnxk/cn10k_ethdev.c | 12 +++++++++++-
 drivers/net/cnxk/cn9k_ethdev.c  | 12 +++++++++++-
 drivers/net/cnxk/cnxk_ethdev.c  |  2 +-
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index 55ed54bb0f..099890e959 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -473,7 +473,7 @@ cn10k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
 	struct cnxk_eth_dev *dev = (struct cnxk_eth_dev *)nix;
 	struct rte_eth_dev *eth_dev;
 	struct cn10k_eth_rxq *rxq;
-	int i;
+	int i, rc;
 
 	if (!dev)
 		return -EINVAL;
@@ -496,7 +496,17 @@ cn10k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
 		 * and MTU setting also requires MBOX message to be
 		 * sent(VF->PF)
 		 */
+		if (dev->ptp_en) {
+			rc = rte_mbuf_dyn_rx_timestamp_register
+				(&dev->tstamp.tstamp_dynfield_offset,
+				 &dev->tstamp.rx_tstamp_dynflag);
+			if (rc != 0) {
+				plt_err("Failed to register Rx timestamp field/flag");
+				return -EINVAL;
+			}
+		}
 		eth_dev->rx_pkt_burst = nix_ptp_vf_burst;
+		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
 		rte_mb();
 	}
 
diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c
index ea92b1dcb6..4851e60f16 100644
--- a/drivers/net/cnxk/cn9k_ethdev.c
+++ b/drivers/net/cnxk/cn9k_ethdev.c
@@ -432,7 +432,7 @@ cn9k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
 	struct cnxk_eth_dev *dev = (struct cnxk_eth_dev *)nix;
 	struct rte_eth_dev *eth_dev;
 	struct cn9k_eth_rxq *rxq;
-	int i;
+	int i, rc;
 
 	if (!dev)
 		return -EINVAL;
@@ -455,7 +455,17 @@ cn9k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
 		 * and MTU setting also requires MBOX message to be
 		 * sent(VF->PF)
 		 */
+		if (dev->ptp_en) {
+			rc = rte_mbuf_dyn_rx_timestamp_register
+				(&dev->tstamp.tstamp_dynfield_offset,
+				 &dev->tstamp.rx_tstamp_dynflag);
+			if (rc != 0) {
+				plt_err("Failed to register Rx timestamp field/flag");
+				return -EINVAL;
+			}
+		}
 		eth_dev->rx_pkt_burst = nix_ptp_vf_burst;
+		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
 		rte_mb();
 	}
 
diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index 38746c81c5..dd065c8269 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -1751,7 +1751,7 @@ cnxk_nix_dev_start(struct rte_eth_dev *eth_dev)
 	else
 		cnxk_eth_dev_ops.timesync_disable(eth_dev);
 
-	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || dev->ptp_en) {
 		rc = rte_mbuf_dyn_rx_timestamp_register
 			(&dev->tstamp.tstamp_dynfield_offset,
 			 &dev->tstamp.rx_tstamp_dynflag);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
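For context, a minimal application-side sketch of the rte_mbuf dynamic
timestamp field this fix registers; the helper names are hypothetical,
while rte_mbuf_dyn_rx_timestamp_register() and RTE_MBUF_DYNFIELD() are
the standard DPDK APIs:

#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

static int tstamp_dynfield_offset = -1;
static uint64_t rx_tstamp_dynflag;

/* Register the Rx timestamp dynamic field/flag once, e.g. when PTP is
 * enabled; a repeated registration returns the same offset.
 */
static int
app_register_rx_timestamp(void)
{
	return rte_mbuf_dyn_rx_timestamp_register(&tstamp_dynfield_offset,
						  &rx_tstamp_dynflag);
}

/* Read the timestamp from a received mbuf; 0 if none was attached. */
static inline rte_mbuf_timestamp_t
app_mbuf_timestamp(const struct rte_mbuf *m)
{
	if (!(m->ol_flags & rx_tstamp_dynflag))
		return 0;
	return *RTE_MBUF_DYNFIELD(m, tstamp_dynfield_offset,
				  rte_mbuf_timestamp_t *);
}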

* [PATCH 07/33] net/cnxk: update Rx offloads to handle timestamp
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (5 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 06/33] net/cnxk: handle timestamp correctly for VF Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 08/33] event/cnxk: handle timestamp for event mode Nithin Dabilpuram
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rakesh Kudurumalla

From: Rakesh Kudurumalla <rkudurumalla@marvell.com>

Rx offload flags are updated to handle the timestamp in a VF when
PTP is enabled on the respective PF in the kernel.

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
---
 drivers/net/cnxk/cn10k_ethdev.c | 6 +++++-
 drivers/net/cnxk/cn9k_ethdev.c  | 5 ++++-
 drivers/net/cnxk/cnxk_ethdev.h  | 7 +++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index 099890e959..d335f3971b 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -30,7 +30,7 @@ nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
 	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
 		flags |= NIX_RX_MULTI_SEG_F;
 
-	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP))
+	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) || dev->ptp_en)
 		flags |= NIX_RX_OFFLOAD_TSTAMP_F;
 
 	if (!dev->ptype_disable)
@@ -508,6 +508,10 @@ cn10k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
 		eth_dev->rx_pkt_burst = nix_ptp_vf_burst;
 		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
 		rte_mb();
+		if (dev->cnxk_sso_ptp_tstamp_cb)
+			dev->cnxk_sso_ptp_tstamp_cb(eth_dev->data->port_id,
+						    NIX_RX_OFFLOAD_TSTAMP_F, dev->ptp_en);
+
 	}
 
 	return 0;
diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c
index 4851e60f16..d1810e8f4d 100644
--- a/drivers/net/cnxk/cn9k_ethdev.c
+++ b/drivers/net/cnxk/cn9k_ethdev.c
@@ -30,7 +30,7 @@ nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
 	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
 		flags |= NIX_RX_MULTI_SEG_F;
 
-	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP))
+	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) || dev->ptp_en)
 		flags |= NIX_RX_OFFLOAD_TSTAMP_F;
 
 	if (!dev->ptype_disable)
@@ -467,6 +467,9 @@ cn9k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
 		eth_dev->rx_pkt_burst = nix_ptp_vf_burst;
 		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
 		rte_mb();
+		if (dev->cnxk_sso_ptp_tstamp_cb)
+			dev->cnxk_sso_ptp_tstamp_cb(eth_dev->data->port_id,
+						    NIX_RX_OFFLOAD_TSTAMP_F, dev->ptp_en);
 	}
 
 	return 0;
diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index 687c60c27d..5920488e1a 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -433,6 +433,13 @@ struct cnxk_eth_dev {
 
 	/* Eswitch domain ID */
 	uint16_t switch_domain_id;
+
+	/* SSO event dev */
+	void *evdev_priv;
+
+	/* SSO event dev ptp  */
+	void (*cnxk_sso_ptp_tstamp_cb)
+	     (uint16_t port_id, uint16_t flags, bool ptp_en);
 };
 
 struct cnxk_eth_rxq_sp {
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
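The mechanism above is a late-bound hook: the ethdev private area owns
a function-pointer slot that the event driver fills in later. A minimal
sketch of the pattern with hypothetical names (the real slot is
cnxk_sso_ptp_tstamp_cb, as in the diff):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct eth_priv {
	void *evdev_priv;  /* back-pointer set by the event driver */
	/* Filled in by the event driver when an Rx adapter queue is added. */
	void (*ptp_tstamp_cb)(uint16_t port_id, uint16_t flags, bool ptp_en);
};

/* Called from the PTP info-update path; a no-op until the hook is set. */
static void
notify_ptp_change(struct eth_priv *dev, uint16_t port_id,
		  uint16_t tstamp_flag, bool ptp_en)
{
	if (dev->ptp_tstamp_cb != NULL)
		dev->ptp_tstamp_cb(port_id, tstamp_flag, ptp_en);
}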

* [PATCH 08/33] event/cnxk: handle timestamp for event mode
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (6 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 07/33] net/cnxk: update Rx offloads to handle timestamp Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 09/33] net/cnxk: update mbuf and rearm data for Rx inject packets Nithin Dabilpuram
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Pavan Nikhilesh, Shijith Thotton; +Cc: dev, Rakesh Kudurumalla

From: Rakesh Kudurumalla <rkudurumalla@marvell.com>

Handle the timestamp correctly for a VF when PTP is enabled before
the application runs in event mode, by updating the Rx offload flags
in the link-up notification.

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
---
 drivers/event/cnxk/cn10k_eventdev.c      | 32 ++++++++++++++++++++++++
 drivers/event/cnxk/cn9k_eventdev.c       | 31 +++++++++++++++++++++++
 drivers/event/cnxk/cnxk_eventdev_adptr.c |  2 +-
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/event/cnxk/cn10k_eventdev.c b/drivers/event/cnxk/cn10k_eventdev.c
index 2d7b169974..229d7a03fe 100644
--- a/drivers/event/cnxk/cn10k_eventdev.c
+++ b/drivers/event/cnxk/cn10k_eventdev.c
@@ -825,12 +825,40 @@ cn10k_sso_set_priv_mem(const struct rte_eventdev *event_dev, void *lookup_mem)
 	}
 }
 
+static void
+eventdev_fops_tstamp_update(struct rte_eventdev *event_dev)
+{
+	struct rte_event_fp_ops *fp_op =
+		rte_event_fp_ops + event_dev->data->dev_id;
+
+	fp_op->dequeue = event_dev->dequeue;
+	fp_op->dequeue_burst = event_dev->dequeue_burst;
+}
+
+static void
+cn10k_sso_tstamp_hdl_update(uint16_t port_id, uint16_t flags, bool ptp_en)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	struct cnxk_eth_dev *cnxk_eth_dev = dev->data->dev_private;
+	struct rte_eventdev *event_dev = cnxk_eth_dev->evdev_priv;
+	struct cnxk_sso_evdev *evdev = cnxk_sso_pmd_priv(event_dev);
+
+	evdev->rx_offloads |= flags;
+	if (ptp_en)
+		evdev->tstamp[port_id] = &cnxk_eth_dev->tstamp;
+	else
+		evdev->tstamp[port_id] = NULL;
+	cn10k_sso_fp_fns_set((struct rte_eventdev *)(uintptr_t)event_dev);
+	eventdev_fops_tstamp_update(event_dev);
+}
+
 static int
 cn10k_sso_rx_adapter_queue_add(
 	const struct rte_eventdev *event_dev, const struct rte_eth_dev *eth_dev,
 	int32_t rx_queue_id,
 	const struct rte_event_eth_rx_adapter_queue_conf *queue_conf)
 {
+	struct cnxk_eth_dev *cnxk_eth_dev = eth_dev->data->dev_private;
 	struct cnxk_sso_evdev *dev = cnxk_sso_pmd_priv(event_dev);
 	struct roc_sso_hwgrp_stash stash;
 	struct cn10k_eth_rxq *rxq;
@@ -845,6 +873,10 @@ cn10k_sso_rx_adapter_queue_add(
 					   queue_conf);
 	if (rc)
 		return -EINVAL;
+
+	cnxk_eth_dev->cnxk_sso_ptp_tstamp_cb = cn10k_sso_tstamp_hdl_update;
+	cnxk_eth_dev->evdev_priv = (struct rte_eventdev *)(uintptr_t)event_dev;
+
 	rxq = eth_dev->data->rx_queues[0];
 	lookup_mem = rxq->lookup_mem;
 	cn10k_sso_set_priv_mem(event_dev, lookup_mem);
diff --git a/drivers/event/cnxk/cn9k_eventdev.c b/drivers/event/cnxk/cn9k_eventdev.c
index 28350d1275..377e910837 100644
--- a/drivers/event/cnxk/cn9k_eventdev.c
+++ b/drivers/event/cnxk/cn9k_eventdev.c
@@ -911,12 +911,40 @@ cn9k_sso_set_priv_mem(const struct rte_eventdev *event_dev, void *lookup_mem)
 	}
 }
 
+static void
+eventdev_fops_tstamp_update(struct rte_eventdev *event_dev)
+{
+	struct rte_event_fp_ops *fp_op =
+		rte_event_fp_ops + event_dev->data->dev_id;
+
+	fp_op->dequeue = event_dev->dequeue;
+	fp_op->dequeue_burst = event_dev->dequeue_burst;
+}
+
+static void
+cn9k_sso_tstamp_hdl_update(uint16_t port_id, uint16_t flags, bool ptp_en)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	struct cnxk_eth_dev *cnxk_eth_dev = dev->data->dev_private;
+	struct rte_eventdev *event_dev = cnxk_eth_dev->evdev_priv;
+	struct cnxk_sso_evdev *evdev = cnxk_sso_pmd_priv(event_dev);
+
+	evdev->rx_offloads |= flags;
+	if (ptp_en)
+		evdev->tstamp[port_id] = &cnxk_eth_dev->tstamp;
+	else
+		evdev->tstamp[port_id] = NULL;
+	cn9k_sso_fp_fns_set((struct rte_eventdev *)(uintptr_t)event_dev);
+	eventdev_fops_tstamp_update(event_dev);
+}
+
 static int
 cn9k_sso_rx_adapter_queue_add(
 	const struct rte_eventdev *event_dev, const struct rte_eth_dev *eth_dev,
 	int32_t rx_queue_id,
 	const struct rte_event_eth_rx_adapter_queue_conf *queue_conf)
 {
+	struct cnxk_eth_dev *cnxk_eth_dev = eth_dev->data->dev_private;
 	struct cn9k_eth_rxq *rxq;
 	void *lookup_mem;
 	int rc;
@@ -930,6 +958,9 @@ cn9k_sso_rx_adapter_queue_add(
 	if (rc)
 		return -EINVAL;
 
+	cnxk_eth_dev->cnxk_sso_ptp_tstamp_cb = cn9k_sso_tstamp_hdl_update;
+	cnxk_eth_dev->evdev_priv = (struct rte_eventdev *)(uintptr_t)event_dev;
+
 	rxq = eth_dev->data->rx_queues[0];
 	lookup_mem = rxq->lookup_mem;
 	cn9k_sso_set_priv_mem(event_dev, lookup_mem);
diff --git a/drivers/event/cnxk/cnxk_eventdev_adptr.c b/drivers/event/cnxk/cnxk_eventdev_adptr.c
index 2c049e7041..3cac42111a 100644
--- a/drivers/event/cnxk/cnxk_eventdev_adptr.c
+++ b/drivers/event/cnxk/cnxk_eventdev_adptr.c
@@ -213,7 +213,7 @@ static void
 cnxk_sso_tstamp_cfg(uint16_t port_id, struct cnxk_eth_dev *cnxk_eth_dev,
 		    struct cnxk_sso_evdev *dev)
 {
-	if (cnxk_eth_dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+	if (cnxk_eth_dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || cnxk_eth_dev->ptp_en)
 		dev->tstamp[port_id] = &cnxk_eth_dev->tstamp;
 }
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
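From the application side, the expected sequence is simply to add the
Rx adapter queue (which installs the driver hook) and let a later PTP
enable propagate on its own. A minimal sketch using standard
ethdev/eventdev APIs; the function name and parameters are assumptions:

#include <rte_ethdev.h>
#include <rte_event_eth_rx_adapter.h>

static int
setup_event_rx_with_ptp(uint8_t rx_adapter_id, uint16_t port_id,
			const struct rte_event_eth_rx_adapter_queue_conf *qconf)
{
	int rc;

	/* Adding the queue lets the driver install its tstamp hook. */
	rc = rte_event_eth_rx_adapter_queue_add(rx_adapter_id, port_id,
						-1 /* all Rx queues */, qconf);
	if (rc != 0)
		return rc;

	/* A PTP enable now (or later, from the kernel PF side) is
	 * propagated to the event dequeue path by the driver.
	 */
	return rte_eth_timesync_enable(port_id);
}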

* [PATCH 09/33] net/cnxk: update mbuf and rearm data for Rx inject packets
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (7 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 08/33] event/cnxk: handle timestamp for event mode Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 10/33] common/cnxk: remove restriction to clear RPM stats Nithin Dabilpuram
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rakesh Kudurumalla

From: Rakesh Kudurumalla <rkudurumalla@marvell.com>

When NIX receives second-pass packets injected to CPT, the next
segments of the primary mbuf are accessed directly using the mbuf
next pointer, since we do not know at what offset the mbuf is
available. To achieve this, we do not update the mbuf next pointer
to NULL for Rx-injected packets.

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
---
 drivers/net/cnxk/cn10k_rx.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h
index 9dde2bea57..990dfbee3e 100644
--- a/drivers/net/cnxk/cn10k_rx.h
+++ b/drivers/net/cnxk/cn10k_rx.h
@@ -709,6 +709,7 @@ nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf,
 	uint16_t later_skip = 0;
 	struct rte_mbuf *head;
 	const rte_iova_t *eol;
+	bool rx_inj = false;
 	uint64_t cq_w5 = 0;
 	uint16_t ihl = 0;
 	uint64_t fsz = 0;
@@ -729,7 +730,9 @@ nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf,
 		/* Rx Inject packet must have Match ID 0xFFFF and for this
 		 * wqe will get from address stored at mbuf+1 location
 		 */
-		if ((flags & NIX_RX_REAS_F) && hdr->w0.match_id == 0xFFFFU)
+		rx_inj = ((flags & NIX_RX_REAS_F) && ((hdr->w0.match_id == 0xFFFFU) ||
+					       (hdr->w0.cookie == 0xFFFFFFFFU)));
+		if (rx_inj)
 			wqe = (const uint64_t *)*((uint64_t *)(mbuf + 1));
 		else
 			wqe = (const uint64_t *)(mbuf + 1);
@@ -786,7 +789,8 @@ nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf,
 	later_skip = (uintptr_t)mbuf->buf_addr - (uintptr_t)mbuf;
 
 	while (nb_segs) {
-		mbuf->next = (struct rte_mbuf *)(*iova_list - later_skip);
+		if (!(flags & NIX_RX_REAS_F) || !rx_inj)
+			mbuf->next = (struct rte_mbuf *)(*iova_list - later_skip);
 		mbuf = mbuf->next;
 
 		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
@@ -804,7 +808,8 @@ nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf,
 		mbuf->data_len = sg_len;
 		sg = sg >> 16;
 		p = (uintptr_t)&mbuf->rearm_data;
-		*(uint64_t *)p = rearm & ~0xFFFF;
+		if (!(flags & NIX_RX_REAS_F) || !rx_inj)
+			*(uint64_t *)p = rearm & ~0xFFFF;
 		nb_segs--;
 		iova_list++;
 
@@ -1259,7 +1264,6 @@ cn10k_nix_rx_inj_prepare_mseg(struct rte_mbuf *m, uint64_t *cmd)
 			slist++;
 		}
 		m_next = m->next;
-		m->next = NULL;
 		m = m_next;
 	} while (nb_segs);
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
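Since the pre-injection mbuf->next chain is now preserved for
Rx-injected packets, an ordinary segment walk remains valid on the
second pass; a trivial sketch (hypothetical helper):

#include <rte_mbuf.h>

/* Valid for Rx-injected packets precisely because the driver no longer
 * clears mbuf->next before injection.
 */
static inline uint32_t
chain_data_len(const struct rte_mbuf *m)
{
	uint32_t bytes = 0;

	while (m != NULL) {
		bytes += m->data_len;
		m = m->next;
	}
	return bytes;
}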

* [PATCH 10/33] common/cnxk: remove restriction to clear RPM stats
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (8 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 09/33] net/cnxk: update mbuf and rearm data for Rx inject packets Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 11/33] common/cnxk: allow MAC address set/add with active VFs Nithin Dabilpuram
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Sunil Kumar Kori <skori@marvell.com>

Linux did not support clearing RPM stats on the cn10k platform,
hence a restriction was added to silently discard a user's request
to clear xstats. That restriction is no longer needed, so remove it
for cn10k.

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
---
 drivers/common/cnxk/roc_nix_mac.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_mac.c b/drivers/common/cnxk/roc_nix_mac.c
index f79aaec4a5..0ffd05e4d4 100644
--- a/drivers/common/cnxk/roc_nix_mac.c
+++ b/drivers/common/cnxk/roc_nix_mac.c
@@ -363,11 +363,6 @@ roc_nix_mac_stats_reset(struct roc_nix *roc_nix)
 	struct msg_req *req;
 	int rc = -ENOSPC;
 
-	if (roc_model_is_cn10k()) {
-		rc = 0;
-		goto exit;
-	}
-
 	if (roc_nix_is_vf_or_sdp(roc_nix)) {
 		rc = 0;
 		goto exit;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
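With the restriction gone, a standard xstats reset on cn10k takes
effect instead of being silently discarded; a minimal usage sketch:

#include <rte_ethdev.h>

static int
clear_port_counters(uint16_t port_id)
{
	int rc;

	/* Extended stats include the RPM/MAC counters cleared via mbox. */
	rc = rte_eth_xstats_reset(port_id);
	if (rc != 0)
		return rc;
	return rte_eth_stats_reset(port_id);
}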

* [PATCH 11/33] common/cnxk: allow MAC address set/add with active VFs
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (9 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 10/33] common/cnxk: remove restriction to clear RPM stats Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 12/33] net/cnxk: move PMD function defines to common code Nithin Dabilpuram
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Sunil Kumar Kori <skori@marvell.com>

If the device is in the reconfigure state, changing the default MAC
or adding a new MAC to the LMAC filter table throws an error when
there are active VFs on a PF.

Allow MAC address set/add even when active VFs are present on the
PF.

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
---
 drivers/common/cnxk/roc_nix_mac.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_mac.c b/drivers/common/cnxk/roc_nix_mac.c
index 0ffd05e4d4..54db1adf17 100644
--- a/drivers/common/cnxk/roc_nix_mac.c
+++ b/drivers/common/cnxk/roc_nix_mac.c
@@ -91,11 +91,6 @@ roc_nix_mac_addr_set(struct roc_nix *roc_nix, const uint8_t addr[])
 		goto exit;
 	}
 
-	if (dev_active_vfs(&nix->dev)) {
-		rc = NIX_ERR_OP_NOTSUP;
-		goto exit;
-	}
-
 	req = mbox_alloc_msg_cgx_mac_addr_set(mbox);
 	if (req == NULL)
 		goto exit;
@@ -152,11 +147,6 @@ roc_nix_mac_addr_add(struct roc_nix *roc_nix, uint8_t addr[])
 		goto exit;
 	}
 
-	if (dev_active_vfs(&nix->dev)) {
-		rc = NIX_ERR_OP_NOTSUP;
-		goto exit;
-	}
-
 	req = mbox_alloc_msg_cgx_mac_addr_add(mbox);
 	mbox_memcpy(req->mac_addr, addr, PLT_ETHER_ADDR_LEN);
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
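After this change, the standard MAC calls are expected to succeed on a
PF with active VFs; a sketch with made-up locally administered
addresses:

#include <rte_ethdev.h>
#include <rte_ether.h>

static int
update_pf_mac(uint16_t port_id)
{
	struct rte_ether_addr primary = {
		.addr_bytes = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 } };
	struct rte_ether_addr filter = {
		.addr_bytes = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 } };
	int rc;

	rc = rte_eth_dev_default_mac_addr_set(port_id, &primary);
	if (rc != 0)
		return rc;
	/* Add a secondary entry to the LMAC filter table (pool 0). */
	return rte_eth_dev_mac_addr_add(port_id, &filter, 0);
}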

* [PATCH 12/33] net/cnxk: move PMD function defines to common code
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (10 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 11/33] common/cnxk: allow MAC address set/add with active VFs Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 13/33] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

Move PMD function definitions to common code for
cn9k/cn10k since they are declared in common code.

Also remove the reference to 'struct rte_security_session'
since it is now a driver internal structure and not
exported to application code.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/net/cnxk/cn10k_ethdev_sec.c | 61 ----------------------------
 drivers/net/cnxk/cnxk_ethdev_sec.c  | 63 +++++++++++++++++++++++++++++
 drivers/net/cnxk/rte_pmd_cnxk.h     |  8 ++--
 3 files changed, 67 insertions(+), 65 deletions(-)

diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c
index 5e509e97d4..074bb09822 100644
--- a/drivers/net/cnxk/cn10k_ethdev_sec.c
+++ b/drivers/net/cnxk/cn10k_ethdev_sec.c
@@ -1208,67 +1208,6 @@ cn10k_eth_sec_session_update(void *device, struct rte_security_session *sess,
 	return 0;
 }
 
-int
-rte_pmd_cnxk_hw_sa_read(void *device, struct rte_security_session *sess,
-			union rte_pmd_cnxk_ipsec_hw_sa *data, uint32_t len)
-{
-	struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
-	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
-	struct cnxk_eth_sec_sess *eth_sec;
-	int rc;
-
-	eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess);
-	if (eth_sec == NULL)
-		return -EINVAL;
-
-	rc = roc_nix_inl_sa_sync(&dev->nix, eth_sec->sa, eth_sec->inb,
-			    ROC_NIX_INL_SA_OP_FLUSH);
-	if (rc)
-		return -EINVAL;
-	rte_delay_ms(1);
-	memcpy(data, eth_sec->sa, len);
-
-	return 0;
-}
-
-int
-rte_pmd_cnxk_hw_sa_write(void *device, struct rte_security_session *sess,
-			 union rte_pmd_cnxk_ipsec_hw_sa *data, uint32_t len)
-{
-	struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
-	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
-	struct cnxk_eth_sec_sess *eth_sec;
-	int rc = -EINVAL;
-
-	eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess);
-	if (eth_sec == NULL)
-		return rc;
-	rc = roc_nix_inl_ctx_write(&dev->nix, data, eth_sec->sa, eth_sec->inb,
-				   len);
-	if (rc)
-		return rc;
-
-	return 0;
-}
-
-union rte_pmd_cnxk_cpt_res_s *
-rte_pmd_cnxk_inl_ipsec_res(struct rte_mbuf *mbuf)
-{
-	const union nix_rx_parse_u *rx;
-	uint16_t desc_size;
-	uintptr_t wqe;
-
-	if (!mbuf || !(mbuf->ol_flags & RTE_MBUF_F_RX_SEC_OFFLOAD))
-		return NULL;
-
-	wqe = (uintptr_t)(mbuf + 1);
-	rx = (const union nix_rx_parse_u *)(wqe + 8);
-	desc_size = (rx->desc_sizem1 + 1) * 16;
-
-	/* rte_pmd_cnxk_cpt_res_s sits after SG list at 16B aligned address */
-	return (void *)(wqe + 64 + desc_size);
-}
-
 static int
 cn10k_eth_sec_session_stats_get(void *device, struct rte_security_session *sess,
 			    struct rte_security_stats *stats)
diff --git a/drivers/net/cnxk/cnxk_ethdev_sec.c b/drivers/net/cnxk/cnxk_ethdev_sec.c
index 6f5319e534..cdd5656817 100644
--- a/drivers/net/cnxk/cnxk_ethdev_sec.c
+++ b/drivers/net/cnxk/cnxk_ethdev_sec.c
@@ -2,6 +2,8 @@
  * Copyright(C) 2021 Marvell.
  */
 
+#include <rte_pmd_cnxk.h>
+
 #include <cnxk_ethdev.h>
 #include <cnxk_mempool.h>
 
@@ -295,6 +297,67 @@ cnxk_eth_sec_sess_get_by_sess(struct cnxk_eth_dev *dev,
 	return NULL;
 }
 
+int
+rte_pmd_cnxk_hw_sa_read(void *device, void *__sess, union rte_pmd_cnxk_ipsec_hw_sa *data,
+			uint32_t len)
+{
+	struct rte_security_session *sess = __sess;
+	struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cnxk_eth_sec_sess *eth_sec;
+	int rc;
+
+	eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess);
+	if (eth_sec == NULL)
+		return -EINVAL;
+
+	rc = roc_nix_inl_sa_sync(&dev->nix, eth_sec->sa, eth_sec->inb, ROC_NIX_INL_SA_OP_FLUSH);
+	if (rc)
+		return -EINVAL;
+	rte_delay_ms(1);
+	memcpy(data, eth_sec->sa, len);
+
+	return 0;
+}
+
+int
+rte_pmd_cnxk_hw_sa_write(void *device, void *__sess, union rte_pmd_cnxk_ipsec_hw_sa *data,
+			 uint32_t len)
+{
+	struct rte_security_session *sess = __sess;
+	struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cnxk_eth_sec_sess *eth_sec;
+	int rc = -EINVAL;
+
+	eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess);
+	if (eth_sec == NULL)
+		return rc;
+	rc = roc_nix_inl_ctx_write(&dev->nix, data, eth_sec->sa, eth_sec->inb, len);
+	if (rc)
+		return rc;
+
+	return 0;
+}
+
+union rte_pmd_cnxk_cpt_res_s *
+rte_pmd_cnxk_inl_ipsec_res(struct rte_mbuf *mbuf)
+{
+	const union nix_rx_parse_u *rx;
+	uint16_t desc_size;
+	uintptr_t wqe;
+
+	if (!mbuf || !(mbuf->ol_flags & RTE_MBUF_F_RX_SEC_OFFLOAD))
+		return NULL;
+
+	wqe = (uintptr_t)(mbuf + 1);
+	rx = (const union nix_rx_parse_u *)(wqe + 8);
+	desc_size = (rx->desc_sizem1 + 1) * 16;
+
+	/* rte_pmd_cnxk_cpt_res_s sits after SG list at 16B aligned address */
+	return (void *)(wqe + 64 + desc_size);
+}
+
 static unsigned int
 cnxk_eth_sec_session_get_size(void *device __rte_unused)
 {
diff --git a/drivers/net/cnxk/rte_pmd_cnxk.h b/drivers/net/cnxk/rte_pmd_cnxk.h
index 88030046db..70f2f96fd4 100644
--- a/drivers/net/cnxk/rte_pmd_cnxk.h
+++ b/drivers/net/cnxk/rte_pmd_cnxk.h
@@ -495,7 +495,7 @@ union rte_pmd_cnxk_cpt_res_s {
  * @param device
  *   Port identifier of Ethernet device.
  * @param sess
- *   Handle of the security session.
+ *   Handle of the security session as void *.
  * @param[out] data
  *   Destination pointer to copy SA context for application.
  * @param len
@@ -505,7 +505,7 @@ union rte_pmd_cnxk_cpt_res_s {
  *   0 on success, a negative errno value otherwise.
  */
 __rte_experimental
-int rte_pmd_cnxk_hw_sa_read(void *device, struct rte_security_session *sess,
+int rte_pmd_cnxk_hw_sa_read(void *device, void *sess,
 			    union rte_pmd_cnxk_ipsec_hw_sa *data, uint32_t len);
 /**
  * Write HW SA context to session.
@@ -513,7 +513,7 @@ int rte_pmd_cnxk_hw_sa_read(void *device, struct rte_security_session *sess,
  * @param device
  *   Port identifier of Ethernet device.
  * @param sess
- *   Handle of the security session.
+ *   Handle of the security session as void *.
  * @param[in] data
  *   Source data pointer from application to copy SA context into session.
  * @param len
@@ -523,7 +523,7 @@ int rte_pmd_cnxk_hw_sa_read(void *device, struct rte_security_session *sess,
  *   0 on success, a negative errno value otherwise.
  */
 __rte_experimental
-int rte_pmd_cnxk_hw_sa_write(void *device, struct rte_security_session *sess,
+int rte_pmd_cnxk_hw_sa_write(void *device, void *sess,
 			     union rte_pmd_cnxk_ipsec_hw_sa *data, uint32_t len);
 
 /**
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
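A caller of the reworked API now passes the session handle as a plain
void *, so no rte_security header is required; a minimal sketch using
the signature from this patch (helper name is an assumption):

#include <rte_pmd_cnxk.h>

static int
dump_inline_sa(void *eth_device, void *sess_handle)
{
	union rte_pmd_cnxk_ipsec_hw_sa sa;

	/* Flushes the HW context and copies it out; len caps the copy. */
	return rte_pmd_cnxk_hw_sa_read(eth_device, sess_handle,
				       &sa, sizeof(sa));
}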

* [PATCH 13/33] common/cnxk: add cn20k NIX register definitions
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (11 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 12/33] net/cnxk: move PMD function defines to common code Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 14/33] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add cn20k NIX register definitions.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/common/cnxk/hw/nix.h   | 524 +++++++++++++++++++++++++++++----
 drivers/common/cnxk/hw/rvu.h   |   7 +-
 drivers/common/cnxk/roc_mbox.h |  52 ++++
 drivers/common/cnxk/roc_nix.c  |  15 +-
 4 files changed, 533 insertions(+), 65 deletions(-)

diff --git a/drivers/common/cnxk/hw/nix.h b/drivers/common/cnxk/hw/nix.h
index 1720eb3815..dd629a2080 100644
--- a/drivers/common/cnxk/hw/nix.h
+++ b/drivers/common/cnxk/hw/nix.h
@@ -32,7 +32,7 @@
 #define NIX_AF_RX_CFG			(0xd0ull)
 #define NIX_AF_AVG_DELAY		(0xe0ull)
 #define NIX_AF_CINT_DELAY		(0xf0ull)
-#define NIX_AF_VWQE_TIMER		(0xf8ull) /* [CN10K, .) */
+#define NIX_AF_VWQE_TIMER		(0xf8ull) /* [CN10K, CN20K) */
 #define NIX_AF_RX_MCAST_BASE		(0x100ull)
 #define NIX_AF_RX_MCAST_CFG		(0x110ull)
 #define NIX_AF_RX_MCAST_BUF_BASE	(0x120ull)
@@ -82,9 +82,11 @@
 #define NIX_AF_RX_DEF_IIP6_DSCP		(0x2f0ull) /* [CN10K, .) */
 #define NIX_AF_RX_DEF_OIP6_DSCP		(0x2f8ull) /* [CN10K, .) */
 #define NIX_AF_RX_IPSEC_GEN_CFG		(0x300ull)
-#define NIX_AF_RX_IPSEC_VWQE_GEN_CFG	(0x310ull) /* [CN10K, .) */
-#define NIX_AF_RX_CPTX_INST_QSEL(a)	(0x320ull | (uint64_t)(a) << 3)
-#define NIX_AF_RX_CPTX_CREDIT(a)	(0x360ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_IPSEC_VWQE_GEN_CFG	(0x310ull) /* [CN10K, CN20K) */
+#define NIX_AF_RX_CPTX_INST_QSEL(a)	(0x340ull | (uint64_t)(a) << 16) /* [CN20K, .) */
+#define NIX_AF_RX_CPTX_CREDIT(a)	(0x380ull | (uint64_t)(a) << 16) /* [CN20K, .) */
+#define NIX_AF_CN9K_RX_CPTX_INST_QSEL(a)(0x320ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
+#define NIX_AF_CN9K_RX_CPTX_CREDIT(a)	(0x360ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
 #define NIX_AF_NDC_RX_SYNC		(0x3e0ull)
 #define NIX_AF_NDC_TX_SYNC		(0x3f0ull)
 #define NIX_AF_AQ_CFG			(0x400ull)
@@ -100,12 +102,14 @@
 #define NIX_AF_RX_LINKX_CFG(a)		(0x540ull | (uint64_t)(a) << 16)
 #define NIX_AF_RX_SW_SYNC		(0x550ull)
 #define NIX_AF_RX_LINKX_WRR_CFG(a)	(0x560ull | (uint64_t)(a) << 16)
+#define NIX_AF_RQM_ECO                  (0x5a0ull)
 #define NIX_AF_SEB_CFG			(0x5f0ull) /* [CN10K, .) */
 #define NIX_AF_EXPR_TX_FIFO_STATUS	(0x640ull) /* [CN9K, CN10K) */
 #define NIX_AF_NORM_TX_FIFO_STATUS	(0x648ull)
 #define NIX_AF_SDP_TX_FIFO_STATUS	(0x650ull)
 #define NIX_AF_TX_NPC_CAPTURE_CONFIG	(0x660ull)
 #define NIX_AF_TX_NPC_CAPTURE_INFO	(0x668ull)
+#define NIX_AF_SEB_COALESCE_DBGX(a)             (0x670ull | (uint64_t)(a) << 3)
 #define NIX_AF_TX_NPC_CAPTURE_RESPX(a)	(0x680ull | (uint64_t)(a) << 3)
 #define NIX_AF_SEB_ACTIVE_CYCLES_PCX(a) (0x6c0ull | (uint64_t)(a) << 3)
 #define NIX_AF_SMQX_CFG(a)		(0x700ull | (uint64_t)(a) << 16)
@@ -115,6 +119,7 @@
 #define NIX_AF_SMQX_NXT_HEAD(a)		(0x740ull | (uint64_t)(a) << 16)
 #define NIX_AF_SQM_ACTIVE_CYCLES_PC	(0x770ull)
 #define NIX_AF_SQM_SCLK_CNT		(0x780ull) /* [CN10K, .) */
+#define NIX_AF_DWRR_MTUX(a)             (0x790ull | (uint64_t)(a) << 16)
 #define NIX_AF_DWRR_SDP_MTU		(0x790ull) /* [CN10K, .) */
 #define NIX_AF_DWRR_RPM_MTU		(0x7a0ull) /* [CN10K, .) */
 #define NIX_AF_PSE_CHANNEL_LEVEL	(0x800ull)
@@ -131,6 +136,7 @@
 #define NIX_AF_TX_LINKX_HW_XOFF(a)	(0xa30ull | (uint64_t)(a) << 16)
 #define NIX_AF_SDP_LINK_CREDIT		(0xa40ull)
 #define NIX_AF_SDP_LINK_CDT_ADJ		(0xa50ull) /* [CN10K, .) */
+#define NIX_AF_LINK_CDT_ADJ_ERR		(0xaa0ull) /* [CN10K, .) */
 /* [CN9K, CN10K) */
 #define NIX_AF_SDP_SW_XOFFX(a)	    (0xa60ull | (uint64_t)(a) << 3)
 #define NIX_AF_SDP_HW_XOFFX(a)	    (0xac0ull | (uint64_t)(a) << 3)
@@ -226,7 +232,7 @@
 #define NIX_AF_TL4X_CIR(a)		 (0x1220ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_PIR(a)		 (0x1230ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_SCHED_STATE(a)	 (0x1240ull | (uint64_t)(a) << 16)
-#define NIX_AF_TL4X_SHAPE_STATE(a)	 (0x1250ull | (uint64_t)(a) << 16)
+#define NIX_AF_TL4X_SHAPE_STATE_PIR(a)	 (0x1250ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_SW_XOFF(a)		 (0x1270ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_TOPOLOGY(a)		 (0x1280ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_PARENT(a)		 (0x1288ull | (uint64_t)(a) << 16)
@@ -272,6 +278,18 @@
 #define NIX_AF_CINT_TIMERX(a)	    (0x1a40ull | (uint64_t)(a) << 18)
 #define NIX_AF_LSO_FORMATX_FIELDX(a, b)                                        \
 	(0x1b00ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+/* [CN10K, .) */
+#define NIX_AF_SPI_TO_SA_KEYX_WAYX(a, b)    (0x1c00ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+#define NIX_AF_SPI_TO_SA_VALUEX_WAYX(a, b)  (0x1c40ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+#define NIX_AF_SPI_TO_SA_CFG		    (0x1c80ull)
+#define NIX_AF_SPI_TO_SA_CFG1		    (0x1c88ull)
+#define NIX_AF_SPI_TO_SA_HASH_KEY	    (0x1c90ull)
+#define NIX_AF_SPI_TO_SA_HASH_VALUE	    (0x1ca0ull)
+/* [CN20K, .) */
+#define NIX_AF_RX_IPSEC_VLAN_CFGX(a)	    (0x1d00ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_IPSEC_QMAPX_DSCPX(a, b)   (0x1e00ull | (uint64_t)(a) << 6 | (uint64_t)(b) << 3)
+#define NIX_AF_RX_SSO_GRPX_BP_CFG(a)	    (0x2000ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_SSO_GRPX_BP_LEVEL(a)	    (0x3000ull | (uint64_t)(a) << 3)
 #define NIX_AF_LFX_CFG(a) (0x4000ull | (uint64_t)(a) << 17)
 /* [CN10K, .) */
 #define NIX_AF_LINKX_CFG(a)		 (0x4010ull | (uint64_t)(a) << 17)
@@ -348,6 +366,7 @@
 #define NIX_LF_TX_STATX(a)	 (0x300ull | (uint64_t)(a) << 3)
 #define NIX_LF_RX_STATX(a)	 (0x400ull | (uint64_t)(a) << 3)
 #define NIX_LF_OP_SENDX(a)	 (0x800ull | (uint64_t)(a) << 3)
+#define NIX_LF_PTP_CLOCK	 (0x8f8ull) /* [CN20K, .) */
 #define NIX_LF_RQ_OP_INT	 (0x900ull)
 #define NIX_LF_RQ_OP_OCTS	 (0x910ull)
 #define NIX_LF_RQ_OP_PKTS	 (0x920ull)
@@ -355,7 +374,7 @@
 #define NIX_LF_RQ_OP_DROP_PKTS	 (0x940ull)
 #define NIX_LF_RQ_OP_RE_PKTS	 (0x950ull)
 #define NIX_LF_OP_IPSEC_DYNO_CNT (0x980ull)
-#define NIX_LF_OP_VWQE_FLUSH	 (0x9a0ull) /* [CN10K, .) */
+#define NIX_LF_OP_VWQE_FLUSH	 (0x9a0ull) /* [CN10K, CN20K) */
 #define NIX_LF_PL_OP_BAND_PROF	 (0x9c0ull) /* [CN10K, .) */
 #define NIX_LF_SQ_OP_INT	 (0xa00ull)
 #define NIX_LF_SQ_OP_OCTS	 (0xa10ull)
@@ -368,6 +387,9 @@
 #define NIX_LF_CQ_OP_INT	 (0xb00ull)
 #define NIX_LF_CQ_OP_DOOR	 (0xb30ull)
 #define NIX_LF_CQ_OP_STATUS	 (0xb40ull)
+#define NIX_LF_SSO_BP_OP_DOOR	 (0xb50ull) /* [CN20K, .) */
+#define NIX_LF_SSO_BP_OP_LEVEL	 (0xb58ull) /* [CN20K, .) */
+#define NIX_LF_SSO_BP_OP_INT	 (0xb60ull) /* [CN20K, .) */
 #define NIX_LF_QINTX_CNT(a)	 (0xc00ull | (uint64_t)(a) << 12)
 #define NIX_LF_QINTX_INT(a)	 (0xc10ull | (uint64_t)(a) << 12)
 #define NIX_LF_QINTX_ENA_W1S(a)	 (0xc20ull | (uint64_t)(a) << 12)
@@ -389,6 +411,8 @@
 
 /* Enum offsets */
 
+#define NIX_SSOERRINT_DOOR_ERR	(0x0ull) /*[CN20K, .) */
+
 #define NIX_STAT_LF_TX_TX_UCAST (0x0ull)
 #define NIX_STAT_LF_TX_TX_BCAST (0x1ull)
 #define NIX_STAT_LF_TX_TX_MCAST (0x2ull)
@@ -572,6 +596,7 @@
 #define NIX_SEND_STATUS_NPC_VTAG_SIZE_ERR  (0x26ull)
 #define NIX_SEND_STATUS_SEND_MEM_FAULT	   (0x27ull)
 #define NIX_SEND_STATUS_SEND_STATS_ERR	   (0x28ull)
+#define NIX_SEND_STATUS_SEND_HDR_DROP	   (0x29ull) /* [CN20K, .) */
 
 #define NIX_SENDSTATSALG_NOP			     (0x0ull)
 #define NIX_SENDSTATSALG_ADD_PKT_CNT		     (0x1ull)
@@ -606,6 +631,7 @@
 #define NIX_SUBDC_WORK		(0x7ull)
 #define NIX_SUBDC_SG2		(0x8ull) /* [CN10K, .) */
 #define NIX_SUBDC_AGE_AND_STATS (0x9ull) /* [CN10K, .) */
+#define NIX_SUBDC_COMPID	(0xaull) /* [CN20K, .) */
 #define NIX_SUBDC_SOD		(0xfull)
 
 #define NIX_STYPE_STF (0x0ull)
@@ -644,6 +670,18 @@
 #define NIX_LSOALG_ADD_PAYLEN (0x2ull)
 #define NIX_LSOALG_ADD_OFFSET (0x3ull)
 #define NIX_LSOALG_TCP_FLAGS  (0x4ull)
+#define NIX_LSOALG_ALT_FLAGS  (0x5ull) /* [CN20K, .) */
+
+#define NIX_METER_CFG_RFC_2698 (0x0ull) /* [CN20K, .) */
+#define NIX_METER_CFG_RFC_2697 (0x1ull) /* [CN20K, .) */
+#define NIX_METER_CFG_RFC_4115 (0x2ull) /* [CN20K, .) */
+
+#define NIX_NDC_RX_PORT_AQ	(0x0ull)
+#define NIX_NDC_RX_PORT_C	(0x1ull)
+#define NIX_NDC_RX_PORT_CINT	(0x2ull)
+#define NIX_NDC_RX_PORT_MC	(0x3ull)
+#define NIX_NDC_RX_PORT_PKT	(0x4ull)
+#define NIX_NDC_RX_PORT_RQ	(0x5ull)
 
 #define NIX_MNQERR_SQ_CTX_FAULT	    (0x0ull)
 #define NIX_MNQERR_SQ_CTX_POISON    (0x1ull)
@@ -732,12 +770,14 @@
 #define NIX_RX_PERRCODE_IL4_PORT       (0x23ull)
 
 #define NIX_SA_ALG_NON_MS     (0x0ull) /* [CN10K, .) */
-#define NIX_SA_ALG_MS_CISCO   (0x1ull) /* [CN10K, .) */
-#define NIX_SA_ALG_MS_VIPTELA (0x2ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_31_28   (0x1ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_27_25   (0x2ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_28_25   (0x3ull) /* [CN10K, .) */
 
 #define NIX_SENDCRCALG_CRC32  (0x0ull)
 #define NIX_SENDCRCALG_CRC32C (0x1ull)
 #define NIX_SENDCRCALG_ONES16 (0x2ull)
+#define NIX_SENDCRCALG_INVCRC (0x3ull) /* [CN10K, .) */
 
 #define NIX_SENDL3TYPE_NONE	 (0x0ull)
 #define NIX_SENDL3TYPE_IP4	 (0x2ull)
@@ -761,7 +801,7 @@
 #define NIX_XQE_TYPE_RX_IPSECS (0x2ull)
 #define NIX_XQE_TYPE_RX_IPSECH (0x3ull)
 #define NIX_XQE_TYPE_RX_IPSECD (0x4ull)
-#define NIX_XQE_TYPE_RX_VWQE   (0x5ull) /* [CN10K, .) */
+#define NIX_XQE_TYPE_RX_VWQE   (0x5ull) /* [CN10K, CN20K) */
 #define NIX_XQE_TYPE_RES_6     (0x6ull)
 #define NIX_XQE_TYPE_RES_7     (0x7ull)
 #define NIX_XQE_TYPE_SEND      (0x8ull)
@@ -825,6 +865,11 @@
 #define NIX_AQ_CTYPE_DYNO      (0x5ull)
 #define NIX_AQ_CTYPE_BAND_PROF (0x6ull) /* [CN10K, .) */
 
+#define NIX_CQERRINT_DOOR_ERR  (0x0ull)
+#define NIX_CQERRINT_WR_FULL   (0x1ull)
+#define NIX_CQERRINT_CQE_FAULT (0x2ull)
+#define NIX_CQERRINT_CPT_DROP  (0x3ull) /* [CN10KB, .) */
+
 #define NIX_COLORRESULT_GREEN	 (0x0ull)
 #define NIX_COLORRESULT_YELLOW	 (0x1ull)
 #define NIX_COLORRESULT_RED_SEND (0x2ull)
@@ -846,11 +891,6 @@
 #define NIX_CHAN_RPMX_LMACX_CHX(a, b, c)                                       \
 	(0x800ull | ((uint64_t)(a) << 8) | ((uint64_t)(b) << 4) | (uint64_t)(c))
 
-/* The mask is to extract lower 10-bits of channel number
- * which CPT will pass to X2P.
- */
-#define NIX_CHAN_CPT_X2P_MASK (0x3ffull)
-
 #define NIX_INTF_SDP  (0x4ull)
 #define NIX_INTF_CGX0 (0x0ull) /* [CN9K, CN10K) */
 #define NIX_INTF_CGX1 (0x1ull) /* [CN9K, CN10K) */
@@ -861,11 +901,6 @@
 #define NIX_INTF_LBK0 (0x3ull)
 #define NIX_INTF_CPT0 (0x5ull) /* [CN10K, .) */
 
-#define NIX_CQERRINT_DOOR_ERR  (0x0ull)
-#define NIX_CQERRINT_WR_FULL   (0x1ull)
-#define NIX_CQERRINT_CQE_FAULT (0x2ull)
-#define NIX_CQERRINT_CPT_DROP  (0x3ull) /* [CN10KB, .) */
-
 #define NIX_LINK_SDP (0xdull) /* [CN10K, .) */
 #define NIX_LINK_CPT (0xeull) /* [CN10K, .) */
 #define NIX_LINK_MC  (0xfull) /* [CN10K, .) */
@@ -894,7 +929,7 @@ struct nix_age_and_send_stats_s {
 	uint64_t threshold : 29;
 	uint64_t latency_drop : 1;
 	uint64_t aging : 1;
-	uint64_t wmem : 1;
+	uint64_t coas_en : 1;
 	uint64_t ooffset : 12;
 	uint64_t ioffset : 12;
 	uint64_t sel : 1;
@@ -907,8 +942,8 @@ struct nix_age_and_send_stats_s {
 struct nix_aq_inst_s {
 	uint64_t op : 4;
 	uint64_t ctype : 4;
-	uint64_t lf : 7;
-	uint64_t rsvd_23_15 : 9;
+	uint64_t lf : 9;
+	uint64_t rsvd_23_17 : 7;
 	uint64_t cindex : 20;
 	uint64_t rsvd_62_44 : 19;
 	uint64_t doneint : 1;
@@ -927,7 +962,7 @@ struct nix_aq_res_s {
 
 /* NIX bandwidth profile structure */
 struct nix_band_prof_s {
-	uint64_t pc_mode : 2;
+	uint64_t pc_mode : 2; /* W0 */
 	uint64_t icolor : 2;
 	uint64_t tnl_ena : 1;
 	uint64_t rsvd_7_5 : 3;
@@ -942,7 +977,7 @@ struct nix_band_prof_s {
 	uint64_t peir_mantissa : 8;
 	uint64_t pebs_mantissa : 8;
 	uint64_t cir_mantissa : 8;
-	uint64_t cbs_mantissa : 8;
+	uint64_t cbs_mantissa : 8; /* W1 */
 	uint64_t lmode : 1;
 	uint64_t l_sellect : 3;
 	uint64_t rdiv : 4;
@@ -953,37 +988,37 @@ struct nix_band_prof_s {
 	uint64_t yc_action : 2;
 	uint64_t rc_action : 2;
 	uint64_t meter_algo : 2;
-	uint64_t band_prof_id : 7;
-	uint64_t rsvd_118_111 : 8;
+	uint64_t band_prof_id : 11;
+	uint64_t rsvd_118_115 : 4;
 	uint64_t hl_en : 1;
 	uint64_t rsvd_127_120 : 8;
-	uint64_t ts : 48;
+	uint64_t ts : 48; /* W2 */
 	uint64_t rsvd_191_176 : 16;
-	uint64_t pe_accum : 32;
+	uint64_t pe_accum : 32; /* W3 */
 	uint64_t c_accum : 32;
-	uint64_t green_pkt_pass : 48;
+	uint64_t green_pkt_pass : 48; /* W4 */
 	uint64_t rsvd_319_304 : 16;
-	uint64_t yellow_pkt_pass : 48;
+	uint64_t yellow_pkt_pass : 48; /* W5 */
 	uint64_t rsvd_383_368 : 16;
-	uint64_t red_pkt_pass : 48;
+	uint64_t red_pkt_pass : 48; /* W6 */
 	uint64_t rsvd_447_432 : 16;
-	uint64_t green_octs_pass : 48;
+	uint64_t green_octs_pass : 48; /* W7 */
 	uint64_t rsvd_511_496 : 16;
-	uint64_t yellow_octs_pass : 48;
+	uint64_t yellow_octs_pass : 48; /* W8 */
 	uint64_t rsvd_575_560 : 16;
-	uint64_t red_octs_pass : 48;
+	uint64_t red_octs_pass : 48; /* W9 */
 	uint64_t rsvd_639_624 : 16;
-	uint64_t green_pkt_drop : 48;
+	uint64_t green_pkt_drop : 48; /* W10 */
 	uint64_t rsvd_703_688 : 16;
-	uint64_t yellow_pkt_drop : 48;
+	uint64_t yellow_pkt_drop : 48; /* W11 */
 	uint64_t rsvd_767_752 : 16;
-	uint64_t red_pkt_drop : 48;
+	uint64_t red_pkt_drop : 48; /* W12 */
 	uint64_t rsvd_831_816 : 16;
-	uint64_t green_octs_drop : 48;
+	uint64_t green_octs_drop : 48; /* W13 */
 	uint64_t rsvd_895_880 : 16;
-	uint64_t yellow_octs_drop : 48;
+	uint64_t yellow_octs_drop : 48; /* W14 */
 	uint64_t rsvd_959_944 : 16;
-	uint64_t red_octs_drop : 48;
+	uint64_t red_octs_drop : 48; /* W15 */
 	uint64_t rsvd_1023_1008 : 16;
 };
 
@@ -1005,11 +1040,55 @@ struct nix_cint_hw_s {
 struct nix_cqe_hdr_s {
 	uint64_t tag : 32;
 	uint64_t q : 20;
-	uint64_t rsvd_57_52 : 6;
+	uint64_t long_send_comp : 1;
+	uint64_t rsvd_57_53 : 5;
 	uint64_t node : 2;
 	uint64_t cqe_type : 4;
 };
 
+/* [CN20K, .) NIX Completion queue context structure */
+struct nix_cn20k_cq_ctx_s {
+	uint64_t base : 64; /* W0 */
+	uint64_t lbp_ena : 1; /* W1 */
+	uint64_t lbpid_low : 3;
+	uint64_t bp_ena : 1;
+	uint64_t lbpid_med : 3;
+	uint64_t bpid : 9;
+	uint64_t lbpid_high : 3;
+	uint64_t qint_idx : 7;
+	uint64_t cq_err : 1;
+	uint64_t cint_idx : 7;
+	uint64_t avg_con : 9;
+	uint64_t wrptr : 20;
+	uint64_t tail : 20; /* W2 */
+	uint64_t head : 20;
+	uint64_t avg_level : 8;
+	uint64_t update_time : 16;
+	uint64_t bp : 8; /* W3 */
+	uint64_t drop : 8;
+	uint64_t drop_ena : 1;
+	uint64_t ena : 1;
+	uint64_t cpt_drop_err_en  : 1;
+	uint64_t reserved_211_211 : 1;
+	uint64_t msh_dst : 11;
+	uint64_t msh_valid : 1;
+	uint64_t stash_thresh : 4;
+	uint64_t lbp_frac : 4;
+	uint64_t caching : 1;
+	uint64_t stashing : 1;
+	uint64_t reserved_234_235 : 2;
+	uint64_t qsize : 4;
+	uint64_t cq_err_int : 8;
+	uint64_t cq_err_int_ena   : 8;
+	uint64_t bpid_ext : 2; /* W4 */
+	uint64_t reserved_258_259 : 2;
+	uint64_t lbpid_ext : 2;
+	uint64_t reserved_262_319 : 58;
+	uint64_t reserved_320_383 : 64; /* W5 */
+	uint64_t reserved_384_447 : 64; /* W6 */
+	uint64_t reserved_448_511 : 64; /* W7 */
+};
+
 /* NIX completion queue context structure */
 struct nix_cq_ctx_s {
 	uint64_t base : 64; /* W0 */
@@ -1083,6 +1162,184 @@ struct nix_qint_hw_s {
 	uint32_t ena : 1;
 };
 
+/* [CN20K, .) NIX receive queue context structure */
+struct nix_cn20k_rq_ctx_hw_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t sso_ena : 1;
+	uint64_t ipsech_ena : 1;
+	uint64_t ena_wqwd : 1;
+	uint64_t cq : 20;
+	uint64_t rsvd_34_24 : 11;
+	uint64_t port_il4_dis : 1;
+	uint64_t port_ol4_dis : 1;
+	uint64_t lenerr_dis : 1;
+	uint64_t csum_il4_dis : 1;
+	uint64_t csum_ol4_dis : 1;
+	uint64_t len_il4_dis : 1;
+	uint64_t len_il3_dis : 1;
+	uint64_t len_ol4_dis : 1;
+	uint64_t len_ol3_dis : 1;
+	uint64_t wqe_aura : 20;
+	uint64_t spb_aura : 20; /* W1 */
+	uint64_t lpb_aura : 20;
+	uint64_t sso_grp : 10;
+	uint64_t sso_tt : 2;
+	uint64_t pb_caching : 2;
+	uint64_t wqe_caching : 1;
+	uint64_t xqe_drop_ena : 1;
+	uint64_t spb_drop_ena : 1;
+	uint64_t lpb_drop_ena : 1;
+	uint64_t pb_stashing : 1;
+	uint64_t ipsecd_drop_en : 1;
+	uint64_t chi_ena : 1;
+	uint64_t rsvd_127_125 : 3;
+	uint64_t band_prof_id_l : 10; /* W2 */
+	uint64_t sso_drop_ena : 1;
+	uint64_t policer_ena : 1;
+	uint64_t spb_sizem1 : 6;
+	uint64_t wqe_skip : 2;
+	uint64_t spb_high_sizem1 : 3;
+	uint64_t spb_ena : 1;
+	uint64_t lpb_sizem1 : 12;
+	uint64_t first_skip : 7;
+	uint64_t sso_bp_ena : 1;
+	uint64_t later_skip : 6;
+	uint64_t xqe_imm_size : 6;
+	uint64_t band_prof_id_h : 4;
+	uint64_t rsvd_189_188 : 2;
+	uint64_t xqe_imm_copy : 1;
+	uint64_t xqe_hdr_split : 1;
+	uint64_t xqe_drop : 8; /* W3 */
+	uint64_t xqe_pass : 8;
+	uint64_t wqe_pool_drop : 8;
+	uint64_t wqe_pool_pass : 8;
+	uint64_t spb_aura_drop : 8;
+	uint64_t spb_aura_pass : 8;
+	uint64_t spb_pool_drop : 8;
+	uint64_t spb_pool_pass : 8;
+	uint64_t lpb_aura_drop : 8; /* W4 */
+	uint64_t lpb_aura_pass : 8;
+	uint64_t lpb_pool_drop : 8;
+	uint64_t lpb_pool_pass : 8;
+	uint64_t rsvd_319_288 : 32;
+	uint64_t ltag : 24; /* W5 */
+	uint64_t good_utag : 8;
+	uint64_t bad_utag : 8;
+	uint64_t flow_tagw : 6;
+	uint64_t rsvd_366  : 1;
+	uint64_t rsvd_367  : 1;
+	uint64_t rsvd_375_368 : 8;
+	uint64_t rsvd_379_376 : 4;
+	uint64_t rsvd_381_380 : 2;
+	uint64_t rsvd_383_382 : 2;
+	uint64_t octs : 48; /* W6 */
+	uint64_t rsvd_447_432 : 16;
+	uint64_t pkts : 48; /* W7 */
+	uint64_t rsvd_511_496 : 16;
+	uint64_t drop_octs : 48; /* W8 */
+	uint64_t rsvd_575_560 : 16;
+	uint64_t drop_pkts : 48; /* W9 */
+	uint64_t rsvd_639_624 : 16;
+	uint64_t re_pkts : 48; /* W10 */
+	uint64_t rsvd_702_688 : 15;
+	uint64_t ena_copy : 1;
+	uint64_t rsvd_739_704 : 36; /* W11 */
+	uint64_t rq_int : 8;
+	uint64_t rq_int_ena : 8;
+	uint64_t qint_idx : 7;
+	uint64_t rsvd_767_763 : 5;
+	uint64_t rsvd_831_768 : 64;  /* W12 */
+	uint64_t rsvd_895_832 : 64;  /* W13 */
+	uint64_t rsvd_959_896 : 64;  /* W14 */
+	uint64_t rsvd_1023_960 : 64; /* W15 */
+};
+
+/* [CN20K, .) NIX Receive queue context structure */
+struct nix_cn20k_rq_ctx_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t sso_ena : 1;
+	uint64_t ipsech_ena : 1;
+	uint64_t ena_wqwd : 1;
+	uint64_t cq : 20;
+	uint64_t reserved_24_34 : 11;
+	uint64_t port_il4_dis : 1;
+	uint64_t port_ol4_dis : 1;
+	uint64_t lenerr_dis : 1;
+	uint64_t csum_il4_dis : 1;
+	uint64_t csum_ol4_dis : 1;
+	uint64_t len_il4_dis : 1;
+	uint64_t len_il3_dis : 1;
+	uint64_t len_ol4_dis : 1;
+	uint64_t len_ol3_dis : 1;
+	uint64_t wqe_aura : 20;
+	uint64_t spb_aura : 20; /* W1 */
+	uint64_t lpb_aura : 20;
+	uint64_t sso_grp : 10;
+	uint64_t sso_tt : 2;
+	uint64_t pb_caching : 2;
+	uint64_t wqe_caching : 1;
+	uint64_t xqe_drop_ena : 1;
+	uint64_t spb_drop_ena : 1;
+	uint64_t lpb_drop_ena : 1;
+	uint64_t pb_stashing : 1;
+	uint64_t ipsecd_drop_en : 1;
+	uint64_t chi_ena : 1;
+	uint64_t reserved_125_127 : 3;
+	uint64_t band_prof_id_l : 10; /* W2 */
+	uint64_t sso_fc_ena : 1;
+	uint64_t policer_ena : 1;
+	uint64_t spb_sizem1 : 6;
+	uint64_t wqe_skip : 2;
+	uint64_t spb_high_sizem1 : 3;
+	uint64_t spb_ena : 1;
+	uint64_t lpb_sizem1 : 12;
+	uint64_t first_skip : 7;
+	uint64_t sso_bp_ena : 1;
+	uint64_t later_skip : 6;
+	uint64_t xqe_imm_size : 6;
+	uint64_t band_prof_id_h : 4;
+	uint64_t reserved_188_189 : 2;
+	uint64_t xqe_imm_copy : 1;
+	uint64_t xqe_hdr_split : 1;
+	uint64_t xqe_drop : 8; /* W3 */
+	uint64_t xqe_pass : 8;
+	uint64_t wqe_pool_drop : 8;
+	uint64_t wqe_pool_pass : 8;
+	uint64_t spb_aura_drop : 8;
+	uint64_t spb_aura_pass : 8;
+	uint64_t spb_pool_drop : 8;
+	uint64_t spb_pool_pass : 8;
+	uint64_t lpb_aura_drop : 8; /* W4 */
+	uint64_t lpb_aura_pass : 8;
+	uint64_t lpb_pool_drop : 8;
+	uint64_t lpb_pool_pass : 8;
+	uint64_t reserved_288_291 : 4;
+	uint64_t rq_int : 8;
+	uint64_t rq_int_ena : 8;
+	uint64_t qint_idx : 7;
+	uint64_t reserved_315_319 : 5;
+	uint64_t ltag : 24; /* W5 */
+	uint64_t good_utag : 8;
+	uint64_t bad_utag : 8;
+	uint64_t flow_tagw : 6;
+	uint64_t reserved_366_383 : 18;
+	uint64_t octs : 48; /* W6 */
+	uint64_t reserved_432_447 : 16;
+	uint64_t pkts : 48; /* W7 */
+	uint64_t reserved_496_511 : 16;
+	uint64_t drop_octs : 48; /* W8 */
+	uint64_t reserved_560_575 : 16;
+	uint64_t drop_pkts : 48; /* W9 */
+	uint64_t reserved_624_639 : 16;
+	uint64_t re_pkts : 48; /* W10 */
+	uint64_t reserved_688_703 : 16;
+	uint64_t reserved_704_767 : 64; /* W11 */
+	uint64_t reserved_768_831 : 64; /* W12 */
+	uint64_t reserved_832_895 : 64; /* W13 */
+	uint64_t reserved_896_959 : 64; /* W14 */
+	uint64_t reserved_960_1023 : 64; /* W15 */
+};
+
 /* [CN10K, .) NIX receive queue context structure */
 struct nix_cn10k_rq_ctx_hw_s {
 	uint64_t ena : 1;
@@ -1493,13 +1750,13 @@ union nix_rx_parse_u {
 		uint64_t lhptr : 8;
 		uint64_t vtag0_ptr : 8;
 		uint64_t vtag1_ptr : 8;
-		uint64_t flow_key_alg : 5;
-		uint64_t rsvd_341 : 1;
+		uint64_t flow_key_alg : 6;
 		uint64_t rsvd_349_342 : 8;
 		uint64_t rsvd_353_350 : 4;
 		uint64_t rsvd_359_354 : 6;
 		uint64_t color : 2;
-		uint64_t rsvd_381_362 : 20;
+		uint64_t mcs_mdata    : 14;
+		uint64_t rsvd_381_376 : 6;
 		uint64_t rsvd_382 : 1;
 		uint64_t rsvd_383 : 1;
 		uint64_t rsvd_447_384 : 64; /* W6 */
@@ -1652,7 +1909,9 @@ union nix_send_ext_w1_u {
 		uint64_t vlan0_ins_ena : 1;
 		uint64_t vlan1_ins_ena : 1;
 		uint64_t init_color : 2;
-		uint64_t rsvd_127_116 : 12;
+		uint64_t flow_id       : 7;
+		uint64_t flow_override : 1;
+		uint64_t rsvd_127_124 : 4;
 	};
 	struct {
 		uint64_t vlan0_ins_ptr : 8;
@@ -1675,7 +1934,7 @@ union nix_send_hdr_w0_u {
 	uint64_t u;
 	struct {
 		uint64_t total : 18;
-		uint64_t rsvd_18 : 1;
+		uint64_t cpt_error : 1;
 		uint64_t df : 1;
 		uint64_t aura : 20;
 		uint64_t sizem1 : 3;
@@ -1718,7 +1977,8 @@ struct nix_send_jump_s {
 	uint64_t rsvd_13_7 : 7;
 	uint64_t ld_type : 2;
 	uint64_t aura : 20;
-	uint64_t rsvd_58_36 : 23;
+	uint64_t refcnt_en  : 1;
+	uint64_t rsvd_58_37 : 22;
 	uint64_t f : 1;
 	uint64_t subdc : 4;
 	uint64_t addr : 64; /* W1 */
@@ -1729,7 +1989,10 @@ union nix_send_mem_w0_u {
 	uint64_t u;
 	struct {
 		uint64_t offset : 16;
-		uint64_t rsvd_51_16 : 36;
+		uint64_t base_ns     : 32;
+		uint64_t step_type   : 1;
+		uint64_t rsvd_50_49  : 2;
+		uint64_t coas_en     : 1;
 		uint64_t per_lso_seg : 1;
 		uint64_t wmem : 1;
 		uint64_t dsz : 2;
@@ -1760,7 +2023,8 @@ union nix_send_sg2_s {
 		uint64_t i1 : 1;
 		uint64_t fabs : 1;
 		uint64_t foff : 8;
-		uint64_t rsvd_57_46 : 12;
+		uint64_t refcnt_en1 : 1;
+		uint64_t rsvd_57_47 : 11;
 		uint64_t ld_type : 2;
 		uint64_t subdc : 4;
 	};
@@ -1773,7 +2037,10 @@ union nix_send_sg_s {
 		uint64_t seg2_size : 16;
 		uint64_t seg3_size : 16;
 		uint64_t segs : 2;
-		uint64_t rsvd_54_50 : 5;
+		uint64_t rsvd_51_50 : 2;
+		uint64_t refcnt_en1 : 1;
+		uint64_t refcnt_en2 : 1;
+		uint64_t refcnt_en3 : 1;
 		uint64_t i1 : 1;
 		uint64_t i2 : 1;
 		uint64_t i3 : 1;
@@ -1792,6 +2059,133 @@ struct nix_send_work_s {
 	uint64_t addr : 64; /* W1 */
 };
 
+/* [CN20K, .) NIX sq context hardware structure */
+struct nix_cn20k_sq_ctx_hw_s {
+	uint64_t ena : 1;
+	uint64_t substream : 20;
+	uint64_t max_sqe_size : 2;
+	uint64_t sqe_way_mask : 16;
+	uint64_t sqb_aura : 20;
+	uint64_t gbl_rsvd1 : 5;
+	uint64_t cq_id : 20; /* W1 */
+	uint64_t cq_ena : 1;
+	uint64_t qint_idx : 6;
+	uint64_t gbl_rsvd2 : 1;
+	uint64_t sq_int : 8;
+	uint64_t sq_int_ena : 8;
+	uint64_t xoff : 1;
+	uint64_t sqe_stype : 2;
+	uint64_t gbl_rsvd : 17;
+	uint64_t head_sqb : 64; /* W2 */
+	uint64_t head_offset : 6; /* W3 */
+	uint64_t sqb_dequeue_count : 16;
+	uint64_t default_chan : 12;
+	uint64_t sdp_mcast : 1;
+	uint64_t sso_ena : 1;
+	uint64_t dse_rsvd1 : 28;
+	uint64_t sqb_enqueue_count : 16; /* W4 */
+	uint64_t tail_offset : 6;
+	uint64_t lmt_dis : 1;
+	uint64_t smq_rr_weight : 14;
+	uint64_t dnq_rsvd1 : 27;
+	uint64_t tail_sqb : 64; /* W5 */
+	uint64_t next_sqb : 64; /* W6 */
+	uint64_t smq : 11; /* W7 */
+	uint64_t smq_pend : 1;
+	uint64_t smq_next_sq : 20;
+	uint64_t smq_next_sq_vld : 1;
+	uint64_t mnq_dis : 1;
+	uint64_t scm1_rsvd2 : 30;
+	uint64_t smenq_sqb : 64; /* W8 */
+	uint64_t smenq_offset : 6; /* W9 */
+	uint64_t cq_limit : 8;
+	uint64_t smq_rr_count : 32;
+	uint64_t scm_lso_rem : 18;
+	uint64_t smq_lso_segnum : 8; /* W10 */
+	uint64_t vfi_lso_total : 18;
+	uint64_t vfi_lso_sizem1 : 3;
+	uint64_t vfi_lso_sb : 8;
+	uint64_t vfi_lso_mps : 14;
+	uint64_t vfi_lso_vlan0_ins_ena : 1;
+	uint64_t vfi_lso_vlan1_ins_ena : 1;
+	uint64_t vfi_lso_vld : 1;
+	uint64_t smenq_next_sqb_vld : 1;
+	uint64_t scm_dq_rsvd1 : 9;
+	uint64_t smenq_next_sqb : 64; /* W11 */
+	uint64_t age_drop_octs : 32; /* W12 */
+	uint64_t age_drop_pkts : 32;
+	uint64_t drop_pkts : 48; /* W13 */
+	uint64_t drop_octs_lsw : 16;
+	uint64_t drop_octs_msw : 32; /* W14 */
+	uint64_t pkts_lsw : 32;
+	uint64_t pkts_msw : 16; /* W15 */
+	uint64_t octs : 48;
+};
+
+/* [CN20K, .) NIX Send queue context structure */
+struct nix_cn20k_sq_ctx_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t qint_idx : 6;
+	uint64_t substream : 20;
+	uint64_t sdp_mcast :  1;
+	uint64_t cq : 20;
+	uint64_t sqe_way_mask : 16;
+	uint64_t smq : 11; /* W1 */
+	uint64_t cq_ena : 1;
+	uint64_t xoff : 1;
+	uint64_t sso_ena : 1;
+	uint64_t smq_rr_weight : 14;
+	uint64_t default_chan : 12;
+	uint64_t sqb_count : 16;
+	uint64_t reserved_120_120 : 1;
+	uint64_t smq_rr_count_lb : 7;
+	uint64_t smq_rr_count_ub : 25; /* W2 */
+	uint64_t sqb_aura : 20;
+	uint64_t sq_int : 8;
+	uint64_t sq_int_ena : 8;
+	uint64_t sqe_stype : 2;
+	uint64_t reserved_191_191 : 1;
+	uint64_t max_sqe_size : 2; /* W3 */
+	uint64_t cq_limit : 8;
+	uint64_t lmt_dis : 1;
+	uint64_t mnq_dis : 1;
+	uint64_t smq_next_sq : 20;
+	uint64_t smq_lso_segnum :  8;
+	uint64_t tail_offset :  6;
+	uint64_t smenq_offset :  6;
+	uint64_t head_offset :  6;
+	uint64_t smenq_next_sqb_vld :  1;
+	uint64_t smq_pend :  1;
+	uint64_t smq_next_sq_vld :  1;
+	uint64_t reserved_253_255 :  3;
+	uint64_t next_sqb : 64; /* W4 */
+	uint64_t tail_sqb : 64; /* W5 */
+	uint64_t smenq_sqb : 64; /* W6 */
+	uint64_t smenq_next_sqb : 64; /* W7 */
+	uint64_t head_sqb : 64; /* W8 */
+	uint64_t reserved_576_583 : 8; /* W9 */
+	uint64_t vfi_lso_total : 18;
+	uint64_t vfi_lso_sizem1 : 3;
+	uint64_t vfi_lso_sb : 8;
+	uint64_t vfi_lso_mps : 14;
+	uint64_t vfi_lso_vlan0_ins_ena : 1;
+	uint64_t vfi_lso_vlan1_ins_ena : 1;
+	uint64_t vfi_lso_vld : 1;
+	uint64_t reserved_630_639 : 10;
+	uint64_t scm_lso_rem : 18; /* W10 */
+	uint64_t reserved_658_703 : 46;
+	uint64_t octs : 48; /* W11 */
+	uint64_t reserved_752_767 : 16;
+	uint64_t pkts : 48; /* W12 */
+	uint64_t reserved_816_831 : 16;
+	uint64_t aged_drop_octs : 32; /* W13 */
+	uint64_t aged_drop_pkts : 32;
+	uint64_t drop_octs : 48; /* W14 */
+	uint64_t reserved_944_959 : 16;
+	uint64_t drop_pkts : 48; /* W15 */
+	uint64_t reserved_1008_1023 : 16;
+};
+
 /* [CN10K, .) NIX sq context hardware structure */
 struct nix_cn10k_sq_ctx_hw_s {
 	uint64_t ena : 1;
@@ -2234,17 +2628,24 @@ struct nix_lso_format {
 #define NIX_CN9K_TM_RR_QUANTUM_MAX (BIT_ULL(24) - 1)
 #define NIX_TM_RR_WEIGHT_MAX	   (BIT_ULL(14) - 1)
 
-/* [CN9K, CN10K) */
-#define NIX_CN9K_TXSCH_LVL_SMQ_MAX 512
-
-/* [CN10K, .) */
-#define NIX_TXSCH_LVL_SMQ_MAX 832
-
 /* [CN9K, .) */
-#define NIX_TXSCH_LVL_TL4_MAX 512
-#define NIX_TXSCH_LVL_TL3_MAX 256
-#define NIX_TXSCH_LVL_TL2_MAX 256
 #define NIX_TXSCH_LVL_TL1_MAX 28
+#define NIX_TXSCH_LVL_TL2_MAX 256
+
+/* CN9K */
+#define NIX_CN9K_TXSCH_LVL_TL3_MAX 256
+#define NIX_CN9K_TXSCH_LVL_TL4_MAX 512
+#define NIX_CN9K_TXSCH_LVL_SMQ_MAX 512
+
+/* CN10K */
+#define NIX_CN10K_TXSCH_LVL_TL3_MAX 256
+#define NIX_CN10K_TXSCH_LVL_TL4_MAX 512
+#define NIX_CN10K_TXSCH_LVL_SMQ_MAX 832
+
+/* [CN20K, .) */
+#define NIX_TXSCH_LVL_TL3_MAX 512
+#define NIX_TXSCH_LVL_TL4_MAX 1280
+#define NIX_TXSCH_LVL_SMQ_MAX 2048
 
 #define NIX_CQ_OP_STAT_OP_ERR 63
 #define NIX_CQ_OP_STAT_CQ_ERR 46
@@ -2265,4 +2666,9 @@ struct nix_lso_format {
 #define NIX_SENDSTAT_IOFFSET_MASK 0xFFF
 #define NIX_SENDSTAT_OOFFSET_MASK 0xFFF
 
+/* The mask is to extract lower 10-bits of channel number
+ * which CPT will pass to X2P.
+ */
+#define NIX_CHAN_CPT_X2P_MASK (0x3ffull)
+
 #endif /* __NIX_HW_H__ */
diff --git a/drivers/common/cnxk/hw/rvu.h b/drivers/common/cnxk/hw/rvu.h
index ee6cf30c5d..ed2ba996e0 100644
--- a/drivers/common/cnxk/hw/rvu.h
+++ b/drivers/common/cnxk/hw/rvu.h
@@ -67,7 +67,9 @@
 #define RVU_PF_VFX_PFVF_MBOXX(a, b)                                            \
 	(0x0ull | (uint64_t)(a) << 12 | (uint64_t)(b) << 3)
 #define RVU_PF_VF_BAR4_ADDR		 (0x10ull)
-#define RVU_PF_BLOCK_ADDRX_DISC(a)	 (0x200ull | (uint64_t)(a) << 3)
+
+#define RVU_PF_DISC			 (0x0ull)  /* [CN20K, .) */
+#define RVU_PF_BLOCK_ADDRX_DISC(a)	 (0x200ull | (uint64_t)(a) << 3)  /* [CN9K, CN20K) */
 #define RVU_PF_VFME_STATUSX(a)		 (0x800ull | (uint64_t)(a) << 3)
 #define RVU_PF_VFTRPENDX(a)		 (0x820ull | (uint64_t)(a) << 3)
 #define RVU_PF_VFTRPEND_W1SX(a)		 (0x840ull | (uint64_t)(a) << 3)
@@ -91,7 +93,8 @@
 #define RVU_PF_MSIX_VECX_ADDR(a)	 (0x80000ull | (uint64_t)(a) << 4)
 #define RVU_PF_MSIX_VECX_CTL(a)		 (0x80008ull | (uint64_t)(a) << 4)
 #define RVU_PF_MSIX_PBAX(a)		 (0xf0000ull | (uint64_t)(a) << 3)
-#define RVU_VF_VFPF_MBOXX(a)		 (0x0ull | (uint64_t)(a) << 3)
+#define RVU_VF_DISC			 (0x0ull)  /* [CN20K, .) */
+#define RVU_VF_VFPF_MBOXX(a)		 (0x0ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
 #define RVU_VF_INT			 (0x20ull)
 #define RVU_VF_INT_W1S			 (0x28ull)
 #define RVU_VF_INT_ENA_W1S		 (0x30ull)
diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index 9a9dcbdbda..dd65946e9e 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -309,6 +309,7 @@ struct mbox_msghdr {
 	M(NIX_MCAST_GRP_UPDATE, 0x802d, nix_mcast_grp_update, nix_mcast_grp_update_req,            \
 	  nix_mcast_grp_update_rsp)                                                                \
 	M(NIX_GET_LF_STATS,    0x802e, nix_get_lf_stats, nix_get_lf_stats_req, nix_lf_stats_rsp)   \
+	M(NIX_CN20K_AQ_ENQ, 0x802f, nix_cn20k_aq_enq, nix_cn20k_aq_enq_req, nix_cn20k_aq_enq_rsp)  \
 	/* MCS mbox IDs (range 0xa000 - 0xbFFF) */                                                 \
 	M(MCS_ALLOC_RESOURCES, 0xa000, mcs_alloc_resources, mcs_alloc_rsrc_req,                    \
 	  mcs_alloc_rsrc_rsp)                                                                      \
@@ -1442,6 +1443,57 @@ struct nix_lf_free_req {
 	uint64_t __io flags;
 };
 
+/* CN20x NIX AQ enqueue msg */
+struct nix_cn20k_aq_enq_req {
+	struct mbox_msghdr hdr;
+	uint32_t __io qidx;
+	uint8_t __io ctype;
+	uint8_t __io op;
+	union {
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_RQ */
+		__io struct nix_cn20k_rq_ctx_s rq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_SQ */
+		__io struct nix_cn20k_sq_ctx_s sq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_CQ */
+		__io struct nix_cn20k_cq_ctx_s cq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_RSS */
+		__io struct nix_rsse_s rss;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_MCE */
+		__io struct nix_rx_mce_s mce;
+		/* Valid when op == WRITE/INIT and
+		 * ctype == NIX_AQ_CTYPE_BAND_PROF
+		 */
+		__io struct nix_band_prof_s prof;
+	};
+	/* Mask data when op == WRITE (1=write, 0=don't write) */
+	union {
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_RQ */
+		__io struct nix_cn20k_rq_ctx_s rq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_SQ */
+		__io struct nix_cn20k_sq_ctx_s sq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_CQ */
+		__io struct nix_cn20k_cq_ctx_s cq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_RSS */
+		__io struct nix_rsse_s rss_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_MCE */
+		__io struct nix_rx_mce_s mce_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_BAND_PROF */
+		__io struct nix_band_prof_s prof_mask;
+	};
+};
+
+struct nix_cn20k_aq_enq_rsp {
+	struct mbox_msghdr hdr;
+	union {
+		__io struct nix_cn20k_rq_ctx_s rq;
+		__io struct nix_cn20k_sq_ctx_s sq;
+		__io struct nix_cn20k_cq_ctx_s cq;
+		__io struct nix_rsse_s rss;
+		__io struct nix_rx_mce_s mce;
+		__io struct nix_band_prof_s prof;
+	};
+};
+
 /* CN10x NIX AQ enqueue msg */
 struct nix_cn10k_aq_enq_req {
 	struct mbox_msghdr hdr;
diff --git a/drivers/common/cnxk/roc_nix.c b/drivers/common/cnxk/roc_nix.c
index 041621dfaa..e4d7e11121 100644
--- a/drivers/common/cnxk/roc_nix.c
+++ b/drivers/common/cnxk/roc_nix.c
@@ -398,15 +398,22 @@ sdp_lbk_id_update(struct plt_pci_device *pci_dev, struct nix *nix)
 uint64_t
 nix_get_blkaddr(struct dev *dev)
 {
+	uint64_t blkaddr;
 	uint64_t reg;
 
 	/* Reading the discovery register to know which NIX is the LF
 	 * attached to.
 	 */
-	reg = plt_read64(dev->bar2 +
-			 RVU_PF_BLOCK_ADDRX_DISC(RVU_BLOCK_ADDR_NIX0));
-
-	return reg & 0x1FFULL ? RVU_BLOCK_ADDR_NIX0 : RVU_BLOCK_ADDR_NIX1;
+	if (roc_model_is_cn9k() || roc_model_is_cn10k()) {
+		reg = plt_read64(dev->bar2 + RVU_PF_BLOCK_ADDRX_DISC(RVU_BLOCK_ADDR_NIX0));
+		blkaddr = reg & 0x1FFULL ? RVU_BLOCK_ADDR_NIX0 : RVU_BLOCK_ADDR_NIX1;
+	} else {
+		reg = plt_read64(dev->bar2 + RVU_PF_DISC);
+		blkaddr = reg & BIT_ULL(RVU_BLOCK_ADDR_NIX0) ? RVU_BLOCK_ADDR_NIX0 :
+			RVU_BLOCK_ADDR_NIX1;
+		blkaddr = RVU_BLOCK_ADDR_NIX0;
+	}
+	return blkaddr;
 }
 
 int
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 14/33] common/cnxk: support NIX queue config for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (12 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 13/33] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 15/33] common/cnxk: support bandwidth profile " Nithin Dabilpuram
                   ` (21 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to set up NIX RQ, SQ and CQ contexts for cn20k.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
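Note for reviewers: every context update in this patch follows the same
three-way model dispatch and the AQ write-with-mask convention: for
NIX_AQ_INSTOP_WRITE, only fields whose mask bits are set are applied by
the AF. A minimal sketch of the pattern, assuming the cn9k/cn10k arms are
filled in as in the diff below; the function name is illustrative only:

	static int
	rq_ena_write_sketch(struct mbox *mbox, uint16_t qid, bool enable)
	{
		if (roc_model_is_cn9k()) {
			/* cn9k request layout, elided */
		} else if (roc_model_is_cn10k()) {
			/* cn10k request layout, elided */
		} else {
			struct nix_cn20k_aq_enq_req *aq;

			aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
			if (!aq)
				return -ENOSPC;

			aq->qidx = qid;
			aq->ctype = NIX_AQ_CTYPE_RQ;
			aq->op = NIX_AQ_INSTOP_WRITE;

			/* Value to write, plus all-ones mask to select the field */
			aq->rq.ena = enable;
			aq->rq_mask.ena = ~(aq->rq_mask.ena);
		}
		return mbox_process(mbox);
	}
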
 drivers/common/cnxk/roc_nix_fc.c     |  52 ++-
 drivers/common/cnxk/roc_nix_inl.c    |   2 +
 drivers/common/cnxk/roc_nix_priv.h   |   1 +
 drivers/common/cnxk/roc_nix_queue.c  | 532 ++++++++++++++++++++++++++-
 drivers/common/cnxk/roc_nix_stats.c  |  55 ++-
 drivers/common/cnxk/roc_nix_tm.c     |  22 +-
 drivers/common/cnxk/roc_nix_tm_ops.c |  14 +-
 7 files changed, 650 insertions(+), 28 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_fc.c b/drivers/common/cnxk/roc_nix_fc.c
index 2f72e67993..0676363c58 100644
--- a/drivers/common/cnxk/roc_nix_fc.c
+++ b/drivers/common/cnxk/roc_nix_fc.c
@@ -127,7 +127,7 @@ nix_fc_cq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 		aq->qidx = fc_cfg->cq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_READ;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -136,6 +136,18 @@ nix_fc_cq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			goto exit;
 		}
 
+		aq->qidx = fc_cfg->cq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
 		aq->qidx = fc_cfg->cq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_READ;
@@ -179,7 +191,7 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 		aq->qidx = fc_cfg->rq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -188,6 +200,18 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			goto exit;
 		}
 
+		aq->qidx = fc_cfg->rq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
 		aq->qidx = fc_cfg->rq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
@@ -270,7 +294,7 @@ nix_fc_cq_config_set(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 
 		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
 		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -290,6 +314,28 @@ nix_fc_cq_config_set(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			aq->cq_mask.bp = ~(aq->cq_mask.bp);
 		}
 
+		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
+		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = fc_cfg->cq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		if (fc_cfg->cq_cfg.enable) {
+			aq->cq.bpid = nix->bpid[fc_cfg->cq_cfg.tc];
+			aq->cq_mask.bpid = ~(aq->cq_mask.bpid);
+			aq->cq.bp = fc_cfg->cq_cfg.cq_drop;
+			aq->cq_mask.bp = ~(aq->cq_mask.bp);
+		}
+
 		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
 		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
 	}
diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c
index a984ac56d9..a759052973 100644
--- a/drivers/common/cnxk/roc_nix_inl.c
+++ b/drivers/common/cnxk/roc_nix_inl.c
@@ -1385,6 +1385,8 @@ roc_nix_inl_dev_rq_get(struct roc_nix_rq *rq, bool enable)
 	mbox = mbox_get(dev->mbox);
 	if (roc_model_is_cn9k())
 		rc = nix_rq_cn9k_cfg(dev, inl_rq, inl_dev->qints, false, enable);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, inl_rq, inl_dev->qints, false, enable);
 	else
 		rc = nix_rq_cfg(dev, inl_rq, inl_dev->qints, false, enable);
 	if (rc) {
diff --git a/drivers/common/cnxk/roc_nix_priv.h b/drivers/common/cnxk/roc_nix_priv.h
index 275ffc8ea3..ade42c1878 100644
--- a/drivers/common/cnxk/roc_nix_priv.h
+++ b/drivers/common/cnxk/roc_nix_priv.h
@@ -409,6 +409,7 @@ int nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 
 int nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 		    bool cfg, bool ena);
+int nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena);
 int nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg,
 	       bool ena);
 int nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable);
diff --git a/drivers/common/cnxk/roc_nix_queue.c b/drivers/common/cnxk/roc_nix_queue.c
index f5441e0e6b..bb1b70424f 100644
--- a/drivers/common/cnxk/roc_nix_queue.c
+++ b/drivers/common/cnxk/roc_nix_queue.c
@@ -69,7 +69,7 @@ nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable)
 
 		aq->rq.ena = enable;
 		aq->rq_mask.ena = ~(aq->rq_mask.ena);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -82,6 +82,21 @@ nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable)
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
 
+		aq->rq.ena = enable;
+		aq->rq_mask.ena = ~(aq->rq_mask.ena);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
 		aq->rq.ena = enable;
 		aq->rq_mask.ena = ~(aq->rq_mask.ena);
 	}
@@ -150,7 +165,7 @@ roc_nix_rq_is_sso_enable(struct roc_nix *roc_nix, uint32_t qid)
 			goto exit;
 
 		sso_enable = rsp->rq.sso_ena;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -164,6 +179,25 @@ roc_nix_rq_is_sso_enable(struct roc_nix *roc_nix, uint32_t qid)
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
 
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+
+		sso_enable = rsp->rq.sso_ena;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+
 		rc = mbox_process_msg(mbox, (void *)&rsp);
 		if (rc)
 			goto exit;
@@ -222,7 +256,7 @@ nix_rq_aura_buf_type_update(struct roc_nix_rq *rq, bool set)
 		if (rsp->rq.spb_ena)
 			spb_aura = roc_npa_aura_handle_gen(rsp->rq.spb_aura, aura_base);
 		mbox_put(mbox);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -249,6 +283,32 @@ nix_rq_aura_buf_type_update(struct roc_nix_rq *rq, bool set)
 		if (rsp->rq.vwqe_ena)
 			vwqe_aura = roc_npa_aura_handle_gen(rsp->rq.wqe_aura, aura_base);
 
+		mbox_put(mbox);
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (!aq) {
+			mbox_put(mbox);
+			return -ENOSPC;
+		}
+
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc) {
+			mbox_put(mbox);
+			return rc;
+		}
+
+		/* Get aura handle from aura */
+		lpb_aura = roc_npa_aura_handle_gen(rsp->rq.lpb_aura, aura_base);
+		if (rsp->rq.spb_ena)
+			spb_aura = roc_npa_aura_handle_gen(rsp->rq.spb_aura, aura_base);
+
 		mbox_put(mbox);
 	}
 
@@ -443,8 +503,7 @@ nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 }
 
 int
-nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg,
-	   bool ena)
+nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
 {
 	struct nix_cn10k_aq_enq_req *aq;
 	struct mbox *mbox = dev->mbox;
@@ -667,6 +726,171 @@ nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
 	return rc;
 }
 
+int
+nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
+{
+	struct nix_cn20k_aq_enq_req *aq;
+	struct mbox *mbox = dev->mbox;
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+	if (!aq)
+		return -ENOSPC;
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = cfg ? NIX_AQ_INSTOP_WRITE : NIX_AQ_INSTOP_INIT;
+
+	if (rq->sso_ena) {
+		/* SSO mode */
+		aq->rq.sso_ena = 1;
+		aq->rq.sso_tt = rq->tt;
+		aq->rq.sso_grp = rq->hwgrp;
+		aq->rq.ena_wqwd = 1;
+		aq->rq.wqe_skip = rq->wqe_skip;
+		aq->rq.wqe_caching = 1;
+
+		aq->rq.good_utag = rq->tag_mask >> 24;
+		aq->rq.bad_utag = rq->tag_mask >> 24;
+		aq->rq.ltag = rq->tag_mask & BITMASK_ULL(24, 0);
+
+		if (rq->vwqe_ena)
+			aq->rq.wqe_aura = roc_npa_aura_handle_to_aura(rq->vwqe_aura_handle);
+	} else {
+		/* CQ mode */
+		aq->rq.sso_ena = 0;
+		aq->rq.good_utag = rq->tag_mask >> 24;
+		aq->rq.bad_utag = rq->tag_mask >> 24;
+		aq->rq.ltag = rq->tag_mask & BITMASK_ULL(24, 0);
+		aq->rq.cq = rq->cqid;
+	}
+
+	if (rq->ipsech_ena) {
+		aq->rq.ipsech_ena = 1;
+		aq->rq.ipsecd_drop_en = 1;
+		aq->rq.ena_wqwd = 1;
+		aq->rq.wqe_skip = rq->wqe_skip;
+		aq->rq.wqe_caching = 1;
+	}
+
+	aq->rq.lpb_aura = roc_npa_aura_handle_to_aura(rq->aura_handle);
+
+	/* Sizes must be aligned to 8 bytes */
+	if (rq->first_skip & 0x7 || rq->later_skip & 0x7 || rq->lpb_size & 0x7)
+		return -EINVAL;
+
+	/* Expressed in number of dwords */
+	aq->rq.first_skip = rq->first_skip / 8;
+	aq->rq.later_skip = rq->later_skip / 8;
+	aq->rq.flow_tagw = rq->flow_tag_width; /* 32-bits */
+	aq->rq.lpb_sizem1 = rq->lpb_size / 8;
+	aq->rq.lpb_sizem1 -= 1; /* Expressed in size minus one */
+	aq->rq.ena = ena;
+
+	if (rq->spb_ena) {
+		uint32_t spb_sizem1;
+
+		aq->rq.spb_ena = 1;
+		aq->rq.spb_aura =
+			roc_npa_aura_handle_to_aura(rq->spb_aura_handle);
+
+		if (rq->spb_size & 0x7 ||
+		    rq->spb_size > NIX_RQ_CN10K_SPB_MAX_SIZE)
+			return -EINVAL;
+
+		spb_sizem1 = rq->spb_size / 8; /* Expressed in no. of dwords */
+		spb_sizem1 -= 1;	       /* Expressed in size minus one */
+		aq->rq.spb_sizem1 = spb_sizem1 & 0x3F;
+		aq->rq.spb_high_sizem1 = (spb_sizem1 >> 6) & 0x7;
+	} else {
+		aq->rq.spb_ena = 0;
+	}
+
+	aq->rq.pb_caching = 0x2; /* First cache aligned block to LLC */
+	aq->rq.xqe_imm_size = 0; /* No pkt data copy to CQE */
+	aq->rq.rq_int_ena = 0;
+	/* Many to one reduction */
+	aq->rq.qint_idx = rq->qid % qints;
+	aq->rq.xqe_drop_ena = 0;
+	aq->rq.lpb_drop_ena = rq->lpb_drop_ena;
+	aq->rq.spb_drop_ena = rq->spb_drop_ena;
+
+	/* If RED enabled, then fill enable for all cases */
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.wqe_pool_pass = rq->red_pass;
+		aq->rq.xqe_pass = rq->red_pass;
+
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq.wqe_pool_drop = rq->red_drop;
+		aq->rq.xqe_drop = rq->red_drop;
+	}
+
+	if (cfg) {
+		if (rq->sso_ena) {
+			/* SSO mode */
+			aq->rq_mask.sso_ena = ~aq->rq_mask.sso_ena;
+			aq->rq_mask.sso_tt = ~aq->rq_mask.sso_tt;
+			aq->rq_mask.sso_grp = ~aq->rq_mask.sso_grp;
+			aq->rq_mask.ena_wqwd = ~aq->rq_mask.ena_wqwd;
+			aq->rq_mask.wqe_skip = ~aq->rq_mask.wqe_skip;
+			aq->rq_mask.wqe_caching = ~aq->rq_mask.wqe_caching;
+			aq->rq_mask.good_utag = ~aq->rq_mask.good_utag;
+			aq->rq_mask.bad_utag = ~aq->rq_mask.bad_utag;
+			aq->rq_mask.ltag = ~aq->rq_mask.ltag;
+			if (rq->vwqe_ena)
+				aq->rq_mask.wqe_aura = ~aq->rq_mask.wqe_aura;
+		} else {
+			/* CQ mode */
+			aq->rq_mask.sso_ena = ~aq->rq_mask.sso_ena;
+			aq->rq_mask.good_utag = ~aq->rq_mask.good_utag;
+			aq->rq_mask.bad_utag = ~aq->rq_mask.bad_utag;
+			aq->rq_mask.ltag = ~aq->rq_mask.ltag;
+			aq->rq_mask.cq = ~aq->rq_mask.cq;
+		}
+
+		if (rq->ipsech_ena)
+			aq->rq_mask.ipsech_ena = ~aq->rq_mask.ipsech_ena;
+
+		if (rq->spb_ena) {
+			aq->rq_mask.spb_aura = ~aq->rq_mask.spb_aura;
+			aq->rq_mask.spb_sizem1 = ~aq->rq_mask.spb_sizem1;
+			aq->rq_mask.spb_high_sizem1 =
+				~aq->rq_mask.spb_high_sizem1;
+		}
+
+		aq->rq_mask.spb_ena = ~aq->rq_mask.spb_ena;
+		aq->rq_mask.lpb_aura = ~aq->rq_mask.lpb_aura;
+		aq->rq_mask.first_skip = ~aq->rq_mask.first_skip;
+		aq->rq_mask.later_skip = ~aq->rq_mask.later_skip;
+		aq->rq_mask.flow_tagw = ~aq->rq_mask.flow_tagw;
+		aq->rq_mask.lpb_sizem1 = ~aq->rq_mask.lpb_sizem1;
+		aq->rq_mask.ena = ~aq->rq_mask.ena;
+		aq->rq_mask.pb_caching = ~aq->rq_mask.pb_caching;
+		aq->rq_mask.xqe_imm_size = ~aq->rq_mask.xqe_imm_size;
+		aq->rq_mask.rq_int_ena = ~aq->rq_mask.rq_int_ena;
+		aq->rq_mask.qint_idx = ~aq->rq_mask.qint_idx;
+		aq->rq_mask.xqe_drop_ena = ~aq->rq_mask.xqe_drop_ena;
+		aq->rq_mask.lpb_drop_ena = ~aq->rq_mask.lpb_drop_ena;
+		aq->rq_mask.spb_drop_ena = ~aq->rq_mask.spb_drop_ena;
+
+		if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+			aq->rq_mask.spb_pool_pass = ~aq->rq_mask.spb_pool_pass;
+			aq->rq_mask.lpb_pool_pass = ~aq->rq_mask.lpb_pool_pass;
+			aq->rq_mask.wqe_pool_pass = ~aq->rq_mask.wqe_pool_pass;
+			aq->rq_mask.xqe_pass = ~aq->rq_mask.xqe_pass;
+
+			aq->rq_mask.spb_pool_drop = ~aq->rq_mask.spb_pool_drop;
+			aq->rq_mask.lpb_pool_drop = ~aq->rq_mask.lpb_pool_drop;
+			aq->rq_mask.wqe_pool_drop = ~aq->rq_mask.wqe_pool_drop;
+			aq->rq_mask.xqe_drop = ~aq->rq_mask.xqe_drop;
+		}
+	}
+
+	return 0;
+}
+
 int
 roc_nix_rq_init(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 {
@@ -691,6 +915,8 @@ roc_nix_rq_init(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cfg(dev, rq, nix->qints, false, ena);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, rq, nix->qints, false, ena);
 	else
 		rc = nix_rq_cfg(dev, rq, nix->qints, false, ena);
 
@@ -745,6 +971,8 @@ roc_nix_rq_modify(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 	mbox = mbox_get(m_box);
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cfg(dev, rq, nix->qints, true, ena);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, rq, nix->qints, true, ena);
 	else
 		rc = nix_rq_cfg(dev, rq, nix->qints, true, ena);
 
@@ -817,12 +1045,121 @@ roc_nix_rq_fini(struct roc_nix_rq *rq)
 	return 0;
 }
 
+static inline int
+roc_nix_cn20k_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
+{
+	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct mbox *mbox = (&nix->dev)->mbox;
+	volatile struct nix_cn20k_cq_ctx_s *cq_ctx;
+	uint16_t drop_thresh = NIX_CQ_THRESH_LEVEL;
+	uint16_t cpt_lbpid = nix->cpt_lbpid;
+	struct nix_cn20k_aq_enq_req *aq;
+	enum nix_q_size qsize;
+	size_t desc_sz;
+	int rc;
+
+	if (cq == NULL)
+		return NIX_ERR_PARAM;
+
+	qsize = nix_qsize_clampup(cq->nb_desc);
+	cq->nb_desc = nix_qsize_to_val(qsize);
+	cq->qmask = cq->nb_desc - 1;
+	cq->door = nix->base + NIX_LF_CQ_OP_DOOR;
+	cq->status = (int64_t *)(nix->base + NIX_LF_CQ_OP_STATUS);
+	cq->wdata = (uint64_t)cq->qid << 32;
+	cq->roc_nix = roc_nix;
+
+	/* CQE of W16 */
+	desc_sz = cq->nb_desc * NIX_CQ_ENTRY_SZ;
+	cq->desc_base = plt_zmalloc(desc_sz, NIX_CQ_ALIGN);
+	if (cq->desc_base == NULL) {
+		rc = NIX_ERR_NO_MEM;
+		goto fail;
+	}
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = cq->qid;
+	aq->ctype = NIX_AQ_CTYPE_CQ;
+	aq->op = NIX_AQ_INSTOP_INIT;
+	cq_ctx = &aq->cq;
+
+	cq_ctx->ena = 1;
+	cq_ctx->caching = 1;
+	cq_ctx->qsize = qsize;
+	cq_ctx->base = (uint64_t)cq->desc_base;
+	cq_ctx->avg_level = 0xff;
+	cq_ctx->cq_err_int_ena = BIT(NIX_CQERRINT_CQE_FAULT);
+	cq_ctx->cq_err_int_ena |= BIT(NIX_CQERRINT_DOOR_ERR);
+	if (roc_feature_nix_has_late_bp() && roc_nix_inl_inb_is_enabled(roc_nix)) {
+		cq_ctx->cq_err_int_ena |= BIT(NIX_CQERRINT_CPT_DROP);
+		cq_ctx->cpt_drop_err_en = 1;
+		/* Enable Late BP only when non zero CPT BPID */
+		if (cpt_lbpid) {
+			cq_ctx->lbp_ena = 1;
+			cq_ctx->lbpid_low = cpt_lbpid & 0x7;
+			cq_ctx->lbpid_med = (cpt_lbpid >> 3) & 0x7;
+			cq_ctx->lbpid_high = (cpt_lbpid >> 6) & 0x7;
+			cq_ctx->lbp_frac = NIX_CQ_LPB_THRESH_FRAC;
+		}
+		drop_thresh = NIX_CQ_SEC_THRESH_LEVEL;
+	}
+
+	/* Many to one reduction */
+	cq_ctx->qint_idx = cq->qid % nix->qints;
+	/* Map CQ0 [RQ0] to CINT0 and so on till max 64 irqs */
+	cq_ctx->cint_idx = cq->qid;
+
+	if (roc_errata_nix_has_cq_min_size_4k()) {
+		const float rx_cq_skid = NIX_CQ_FULL_ERRATA_SKID;
+		uint16_t min_rx_drop;
+
+		min_rx_drop = ceil(rx_cq_skid / (float)cq->nb_desc);
+		cq_ctx->drop = min_rx_drop;
+		cq_ctx->drop_ena = 1;
+		cq->drop_thresh = min_rx_drop;
+	} else {
+		cq->drop_thresh = drop_thresh;
+		/* Drop processing or red drop cannot be enabled due to
+		 * packets coming for second pass from CPT.
+		 */
+		if (!roc_nix_inl_inb_is_enabled(roc_nix)) {
+			cq_ctx->drop = cq->drop_thresh;
+			cq_ctx->drop_ena = 1;
+		}
+	}
+	cq_ctx->bp = cq->drop_thresh;
+
+	if (roc_feature_nix_has_cqe_stash()) {
+		if (cq_ctx->caching) {
+			cq_ctx->stashing = 1;
+			cq_ctx->stash_thresh = cq->stash_thresh;
+		}
+	}
+
+	rc = mbox_process(mbox);
+	mbox_put(mbox);
+	if (rc)
+		goto free_mem;
+
+	return nix_tel_node_add_cq(cq);
+
+free_mem:
+	plt_free(cq->desc_base);
+fail:
+	return rc;
+}
+
 int
 roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct mbox *mbox = (&nix->dev)->mbox;
-	volatile struct nix_cq_ctx_s *cq_ctx;
+	volatile struct nix_cq_ctx_s *cq_ctx = NULL;
 	uint16_t drop_thresh = NIX_CQ_THRESH_LEVEL;
 	uint16_t cpt_lbpid = nix->cpt_lbpid;
 	enum nix_q_size qsize;
@@ -832,6 +1169,9 @@ roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 	if (cq == NULL)
 		return NIX_ERR_PARAM;
 
+	if (roc_model_is_cn20k())
+		return roc_nix_cn20k_cq_init(roc_nix, cq);
+
 	qsize = nix_qsize_clampup(cq->nb_desc);
 	cq->nb_desc = nix_qsize_to_val(qsize);
 	cq->qmask = cq->nb_desc - 1;
@@ -861,7 +1201,7 @@ roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_INIT;
 		cq_ctx = &aq->cq;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
@@ -972,7 +1312,7 @@ roc_nix_cq_fini(struct roc_nix_cq *cq)
 		aq->cq.bp_ena = 0;
 		aq->cq_mask.ena = ~aq->cq_mask.ena;
 		aq->cq_mask.bp_ena = ~aq->cq_mask.bp_ena;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -981,6 +1321,26 @@ roc_nix_cq_fini(struct roc_nix_cq *cq)
 			return -ENOSPC;
 		}
 
+		aq->qidx = cq->qid;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->cq.ena = 0;
+		aq->cq.bp_ena = 0;
+		aq->cq_mask.ena = ~aq->cq_mask.ena;
+		aq->cq_mask.bp_ena = ~aq->cq_mask.bp_ena;
+		if (roc_feature_nix_has_late_bp() && roc_nix_inl_inb_is_enabled(cq->roc_nix)) {
+			aq->cq.lbp_ena = 0;
+			aq->cq_mask.lbp_ena = ~aq->cq_mask.lbp_ena;
+		}
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			mbox_put(mbox);
+			return -ENOSPC;
+		}
+
 		aq->qidx = cq->qid;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
@@ -1227,14 +1587,150 @@ sq_cn9k_fini(struct nix *nix, struct roc_nix_sq *sq)
 	return 0;
 }
 
+static int
+sq_cn10k_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t rr_quantum, uint16_t smq)
+{
+	struct roc_nix *roc_nix = nix_priv_to_roc_nix(nix);
+	struct mbox *mbox = (&nix->dev)->mbox;
+	struct nix_cn10k_aq_enq_req *aq;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq)
+		return -ENOSPC;
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_INIT;
+	aq->sq.max_sqe_size = sq->max_sqe_sz;
+	aq->sq.smq = smq;
+	aq->sq.smq_rr_weight = rr_quantum;
+	if (roc_nix_is_sdp(roc_nix))
+		aq->sq.default_chan = nix->tx_chan_base + (sq->qid % nix->tx_chan_cnt);
+	else
+		aq->sq.default_chan = nix->tx_chan_base;
+	aq->sq.sqe_stype = NIX_STYPE_STF;
+	aq->sq.ena = 1;
+	aq->sq.sso_ena = !!sq->sso_ena;
+	aq->sq.cq_ena = !!sq->cq_ena;
+	aq->sq.cq = sq->cqid;
+	aq->sq.cq_limit = sq->cq_drop_thresh;
+	if (aq->sq.max_sqe_size == NIX_MAXSQESZ_W8)
+		aq->sq.sqe_stype = NIX_STYPE_STP;
+	aq->sq.sqb_aura = roc_npa_aura_handle_to_aura(sq->aura_handle);
+	aq->sq.sq_int_ena = BIT(NIX_SQINT_LMT_ERR);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_SQB_ALLOC_FAIL);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_SEND_ERR);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_MNQ_ERR);
+
+	/* Many to one reduction */
+	aq->sq.qint_idx = sq->qid % nix->qints;
+	if (roc_errata_nix_assign_incorrect_qint()) {
+		/* Assign QINT 0 to all the SQs; an errata exists where NIXTX
+		 * can send an incorrect QINT_IDX when reporting a queue
+		 * interrupt (QINT), causing software to miss the interrupt.
+		 */
+		aq->sq.qint_idx = 0;
+	}
+	return 0;
+}
+
+static int
+sq_cn10k_fini(struct nix *nix, struct roc_nix_sq *sq)
+{
+	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
+	struct nix_cn10k_aq_enq_rsp *rsp;
+	struct nix_cn10k_aq_enq_req *aq;
+	uint16_t sqes_per_sqb;
+	void *sqb_buf;
+	int rc, count;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_READ;
+	rc = mbox_process_msg(mbox, (void *)&rsp);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	/* Check if sq is already cleaned up */
+	if (!rsp->sq.ena) {
+		mbox_put(mbox);
+		return 0;
+	}
+
+	/* Disable sq */
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+	aq->sq_mask.ena = ~aq->sq_mask.ena;
+	aq->sq.ena = 0;
+	rc = mbox_process(mbox);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	/* Read SQ and free sqb's */
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_READ;
+	rc = mbox_process_msg(mbox, (void *)&rsp);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	if (rsp->sq.smq_pend)
+		plt_err("SQ has pending SQEs");
+
+	count = rsp->sq.sqb_count;
+	sqes_per_sqb = 1 << sq->sqes_per_sqb_log2;
+	/* Free SQBs that are used */
+	sqb_buf = (void *)rsp->sq.head_sqb;
+	while (count) {
+		void *next_sqb;
+
+		next_sqb = *(void **)((uint64_t *)sqb_buf +
+				      (uint32_t)((sqes_per_sqb - 1) * (0x2 >> sq->max_sqe_sz) * 8));
+		roc_npa_aura_op_free(sq->aura_handle, 1, (uint64_t)sqb_buf);
+		sqb_buf = next_sqb;
+		count--;
+	}
+
+	/* Free next to use sqb */
+	if (rsp->sq.next_sqb)
+		roc_npa_aura_op_free(sq->aura_handle, 1, rsp->sq.next_sqb);
+	mbox_put(mbox);
+	return 0;
+}
+
 static int
 sq_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t rr_quantum, uint16_t smq)
 {
 	struct roc_nix *roc_nix = nix_priv_to_roc_nix(nix);
 	struct mbox *mbox = (&nix->dev)->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
+	struct nix_cn20k_aq_enq_req *aq;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq)
 		return -ENOSPC;
 
@@ -1280,13 +1778,13 @@ static int
 sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 {
 	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	struct nix_cn20k_aq_enq_rsp *rsp;
+	struct nix_cn20k_aq_enq_req *aq;
 	uint16_t sqes_per_sqb;
 	void *sqb_buf;
 	int rc, count;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1308,7 +1806,7 @@ sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 	}
 
 	/* Disable sq */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1326,7 +1824,7 @@ sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 	}
 
 	/* Read SQ and free sqb's */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1408,6 +1906,8 @@ roc_nix_sq_init(struct roc_nix *roc_nix, struct roc_nix_sq *sq)
 	/* Init SQ context */
 	if (roc_model_is_cn9k())
 		rc = sq_cn9k_init(nix, sq, rr_quantum, smq);
+	else if (roc_model_is_cn10k())
+		rc = sq_cn10k_init(nix, sq, rr_quantum, smq);
 	else
 		rc = sq_init(nix, sq, rr_quantum, smq);
 
@@ -1464,6 +1964,8 @@ roc_nix_sq_fini(struct roc_nix_sq *sq)
 	/* Release SQ context */
 	if (roc_model_is_cn9k())
 		rc |= sq_cn9k_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
+	else if (roc_model_is_cn10k())
+		rc |= sq_cn10k_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
 	else
 		rc |= sq_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
 
diff --git a/drivers/common/cnxk/roc_nix_stats.c b/drivers/common/cnxk/roc_nix_stats.c
index 7a9619b39d..6f241c72de 100644
--- a/drivers/common/cnxk/roc_nix_stats.c
+++ b/drivers/common/cnxk/roc_nix_stats.c
@@ -173,7 +173,7 @@ nix_stat_rx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
 		aq->rq_mask.drop_pkts = ~(aq->rq_mask.drop_pkts);
 		aq->rq_mask.re_pkts = ~(aq->rq_mask.re_pkts);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -192,6 +192,30 @@ nix_stat_rx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->rq.drop_pkts = 0;
 		aq->rq.re_pkts = 0;
 
+		aq->rq_mask.octs = ~(aq->rq_mask.octs);
+		aq->rq_mask.pkts = ~(aq->rq_mask.pkts);
+		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
+		aq->rq_mask.drop_pkts = ~(aq->rq_mask.drop_pkts);
+		aq->rq_mask.re_pkts = ~(aq->rq_mask.re_pkts);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.octs = 0;
+		aq->rq.pkts = 0;
+		aq->rq.drop_octs = 0;
+		aq->rq.drop_pkts = 0;
+		aq->rq.re_pkts = 0;
+
 		aq->rq_mask.octs = ~(aq->rq_mask.octs);
 		aq->rq_mask.pkts = ~(aq->rq_mask.pkts);
 		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
@@ -233,7 +257,7 @@ nix_stat_tx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
 		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
 		aq->sq_mask.drop_pkts = ~(aq->sq_mask.drop_pkts);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -250,6 +274,29 @@ nix_stat_tx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->sq.drop_octs = 0;
 		aq->sq.drop_pkts = 0;
 
+		aq->sq_mask.octs = ~(aq->sq_mask.octs);
+		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
+		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
+		aq->sq_mask.drop_pkts = ~(aq->sq_mask.drop_pkts);
+		aq->sq_mask.aged_drop_octs = ~(aq->sq_mask.aged_drop_octs);
+		aq->sq_mask.aged_drop_pkts = ~(aq->sq_mask.aged_drop_pkts);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_SQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->sq.octs = 0;
+		aq->sq.pkts = 0;
+		aq->sq.drop_octs = 0;
+		aq->sq.drop_pkts = 0;
+
 		aq->sq_mask.octs = ~(aq->sq_mask.octs);
 		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
 		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
@@ -375,7 +422,7 @@ roc_nix_xstats_get(struct roc_nix *roc_nix, struct roc_nix_xstat *xstats,
 	xstats[count].id = count;
 	count++;
 
-	if (roc_model_is_cn10k()) {
+	if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		for (i = 0; i < CNXK_NIX_NUM_CN10K_RX_XSTATS; i++) {
 			xstats[count].value =
 				NIX_RX_STATS(nix_cn10k_rx_xstats[i].offset);
@@ -492,7 +539,7 @@ roc_nix_xstats_names_get(struct roc_nix *roc_nix,
 		count++;
 	}
 
-	if (roc_model_is_cn10k()) {
+	if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		for (i = 0; i < CNXK_NIX_NUM_CN10K_RX_XSTATS; i++) {
 			NIX_XSTATS_NAME_PRINT(xstats_names, count,
 					      nix_cn10k_rx_xstats, i);
diff --git a/drivers/common/cnxk/roc_nix_tm.c b/drivers/common/cnxk/roc_nix_tm.c
index ac522f8235..5725ef568a 100644
--- a/drivers/common/cnxk/roc_nix_tm.c
+++ b/drivers/common/cnxk/roc_nix_tm.c
@@ -1058,7 +1058,7 @@ nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 		}
 		aq->sq.smq_rr_quantum = rr_quantum;
 		aq->sq_mask.smq_rr_quantum = ~aq->sq_mask.smq_rr_quantum;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -1071,6 +1071,26 @@ nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 		aq->ctype = NIX_AQ_CTYPE_SQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
 
+		/* smq update only when needed */
+		if (!rr_quantum_only) {
+			aq->sq.smq = smq;
+			aq->sq_mask.smq = ~aq->sq_mask.smq;
+		}
+		aq->sq.smq_rr_weight = rr_quantum;
+		aq->sq_mask.smq_rr_weight = ~aq->sq_mask.smq_rr_weight;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_SQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
 		/* smq update only when needed */
 		if (!rr_quantum_only) {
 			aq->sq.smq = smq;
diff --git a/drivers/common/cnxk/roc_nix_tm_ops.c b/drivers/common/cnxk/roc_nix_tm_ops.c
index 8144675f89..a9cfadd1b0 100644
--- a/drivers/common/cnxk/roc_nix_tm_ops.c
+++ b/drivers/common/cnxk/roc_nix_tm_ops.c
@@ -1294,15 +1294,19 @@ roc_nix_tm_rsrc_max(bool pf, uint16_t schq[ROC_TM_LVL_MAX])
 
 		switch (hw_lvl) {
 		case NIX_TXSCH_LVL_SMQ:
-			max = (roc_model_is_cn9k() ?
-					     NIX_CN9K_TXSCH_LVL_SMQ_MAX :
-					     NIX_TXSCH_LVL_SMQ_MAX);
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_SMQ_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_SMQ_MAX :
+				 NIX_TXSCH_LVL_SMQ_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL4:
-			max = NIX_TXSCH_LVL_TL4_MAX;
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_TL4_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_TL4_MAX :
+							NIX_TXSCH_LVL_TL4_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL3:
-			max = NIX_TXSCH_LVL_TL3_MAX;
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_TL3_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_TL3_MAX :
+							NIX_TXSCH_LVL_TL3_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL2:
 			max = pf ? NIX_TXSCH_LVL_TL2_MAX : 1;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 15/33] common/cnxk: support bandwidth profile for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (13 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 14/33] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 16/33] common/cnxk: support NIX debug " Nithin Dabilpuram
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to set up bandwidth profile config for the Rx
policer on cn20k.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
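Note for reviewers: unlike cn10k, where the RQ context carries a single
band_prof_id field, the cn20k RQ context splits the 14-bit profile id
into band_prof_id_l (bits [9:0]) and band_prof_id_h (bits [13:10]), as
done in roc_nix_bpf_ena_dis() below. A minimal sketch of the encoding;
the helper name is hypothetical and not part of this patch:

	/* Hypothetical helper: program a bandwidth profile id into a
	 * cn20k RQ context, selecting both halves via the write mask.
	 */
	static inline void
	rq_set_band_prof_id(struct nix_cn20k_aq_enq_req *aq, uint16_t id)
	{
		aq->rq.band_prof_id_l = id & 0x3FF;       /* bits [9:0] */
		aq->rq.band_prof_id_h = (id >> 10) & 0xF; /* bits [13:10] */
		aq->rq_mask.band_prof_id_l = ~(aq->rq_mask.band_prof_id_l);
		aq->rq_mask.band_prof_id_h = ~(aq->rq_mask.band_prof_id_h);
	}
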
 drivers/common/cnxk/roc_nix_bpf.c   | 528 ++++++++++++++++++----------
 drivers/common/cnxk/roc_nix_queue.c | 136 ++++---
 2 files changed, 425 insertions(+), 239 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_bpf.c b/drivers/common/cnxk/roc_nix_bpf.c
index d60396289b..98c9855a5b 100644
--- a/drivers/common/cnxk/roc_nix_bpf.c
+++ b/drivers/common/cnxk/roc_nix_bpf.c
@@ -547,9 +547,9 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 {
 	uint64_t exponent_p = 0, mantissa_p = 0, div_exp_p = 0;
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = dev->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
 	uint32_t policer_timeunit;
 	uint8_t level_idx;
 	int rc;
@@ -568,103 +568,122 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 	if (level_idx == ROC_NIX_BPF_LEVEL_IDX_INVALID)
 		return NIX_ERR_PARAM;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
-	aq->prof.adjust_exponent = NIX_BPF_DEFAULT_ADJUST_EXPONENT;
-	aq->prof.adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA;
+	prof->adjust_exponent = NIX_BPF_DEFAULT_ADJUST_EXPONENT;
+	prof->adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA;
 	if (cfg->lmode == ROC_NIX_BPF_LMODE_BYTE)
-		aq->prof.adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA / 2;
+		prof->adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA / 2;
 
-	aq->prof_mask.adjust_exponent = ~(aq->prof_mask.adjust_exponent);
-	aq->prof_mask.adjust_mantissa = ~(aq->prof_mask.adjust_mantissa);
+	prof_mask->adjust_exponent = ~(prof_mask->adjust_exponent);
+	prof_mask->adjust_mantissa = ~(prof_mask->adjust_mantissa);
 
 	switch (cfg->alg) {
 	case ROC_NIX_BPF_ALGO_2697:
 		meter_rate_to_nix(cfg->algo2697.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2697.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2697.ebs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	case ROC_NIX_BPF_ALGO_2698:
 		meter_rate_to_nix(cfg->algo2698.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_rate_to_nix(cfg->algo2698.pir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.peir_mantissa = mantissa_p;
-		aq->prof.peir_exponent = exponent_p;
+		prof->peir_mantissa = mantissa_p;
+		prof->peir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2698.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2698.pbs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.peir_mantissa = ~(aq->prof_mask.peir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.peir_exponent = ~(aq->prof_mask.peir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->peir_mantissa = ~(prof_mask->peir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->peir_exponent = ~(prof_mask->peir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	case ROC_NIX_BPF_ALGO_4115:
 		meter_rate_to_nix(cfg->algo4115.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_rate_to_nix(cfg->algo4115.eir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.peir_mantissa = mantissa_p;
-		aq->prof.peir_exponent = exponent_p;
+		prof->peir_mantissa = mantissa_p;
+		prof->peir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo4115.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo4115.ebs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.peir_mantissa = ~(aq->prof_mask.peir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->peir_mantissa = ~(prof_mask->peir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
 
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.peir_exponent = ~(aq->prof_mask.peir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->peir_exponent = ~(prof_mask->peir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	default:
@@ -672,23 +691,23 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 		goto exit;
 	}
 
-	aq->prof.lmode = cfg->lmode;
-	aq->prof.icolor = cfg->icolor;
-	aq->prof.meter_algo = cfg->alg;
-	aq->prof.pc_mode = cfg->pc_mode;
-	aq->prof.tnl_ena = cfg->tnl_ena;
-	aq->prof.gc_action = cfg->action[ROC_NIX_BPF_COLOR_GREEN];
-	aq->prof.yc_action = cfg->action[ROC_NIX_BPF_COLOR_YELLOW];
-	aq->prof.rc_action = cfg->action[ROC_NIX_BPF_COLOR_RED];
+	prof->lmode = cfg->lmode;
+	prof->icolor = cfg->icolor;
+	prof->meter_algo = cfg->alg;
+	prof->pc_mode = cfg->pc_mode;
+	prof->tnl_ena = cfg->tnl_ena;
+	prof->gc_action = cfg->action[ROC_NIX_BPF_COLOR_GREEN];
+	prof->yc_action = cfg->action[ROC_NIX_BPF_COLOR_YELLOW];
+	prof->rc_action = cfg->action[ROC_NIX_BPF_COLOR_RED];
 
-	aq->prof_mask.lmode = ~(aq->prof_mask.lmode);
-	aq->prof_mask.icolor = ~(aq->prof_mask.icolor);
-	aq->prof_mask.meter_algo = ~(aq->prof_mask.meter_algo);
-	aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
-	aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
-	aq->prof_mask.gc_action = ~(aq->prof_mask.gc_action);
-	aq->prof_mask.yc_action = ~(aq->prof_mask.yc_action);
-	aq->prof_mask.rc_action = ~(aq->prof_mask.rc_action);
+	prof_mask->lmode = ~(prof_mask->lmode);
+	prof_mask->icolor = ~(prof_mask->icolor);
+	prof_mask->meter_algo = ~(prof_mask->meter_algo);
+	prof_mask->pc_mode = ~(prof_mask->pc_mode);
+	prof_mask->tnl_ena = ~(prof_mask->tnl_ena);
+	prof_mask->gc_action = ~(prof_mask->gc_action);
+	prof_mask->yc_action = ~(prof_mask->yc_action);
+	prof_mask->rc_action = ~(prof_mask->rc_action);
 
 	rc = mbox_process(mbox);
 exit:
@@ -703,7 +722,6 @@ roc_nix_bpf_ena_dis(struct roc_nix *roc_nix, uint16_t id, struct roc_nix_rq *rq,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	int rc;
 
 	if (roc_model_is_cn9k()) {
@@ -716,25 +734,53 @@ roc_nix_bpf_ena_dis(struct roc_nix *roc_nix, uint16_t id, struct roc_nix_rq *rq,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
-	}
-	aq->qidx = rq->qid;
-	aq->ctype = NIX_AQ_CTYPE_RQ;
-	aq->op = NIX_AQ_INSTOP_WRITE;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
 
-	aq->rq.policer_ena = enable;
-	aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
-	if (enable) {
-		aq->rq.band_prof_id = id;
-		aq->rq_mask.band_prof_id = ~(aq->rq_mask.band_prof_id);
-	}
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.policer_ena = enable;
+		aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
+		if (enable) {
+			aq->rq.band_prof_id = id;
+			aq->rq_mask.band_prof_id = ~(aq->rq_mask.band_prof_id);
+		}
+
+		rc = mbox_process(mbox);
+		if (rc)
+			goto exit;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
 
-	rc = mbox_process(mbox);
-	if (rc)
-		goto exit;
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.policer_ena = enable;
+		aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
+		if (enable) {
+			aq->rq.band_prof_id_l = id & 0x3FF;
+			aq->rq.band_prof_id_h = (id >> 10) & 0xF;
+			aq->rq_mask.band_prof_id_l = ~(aq->rq_mask.band_prof_id_l);
+			aq->rq_mask.band_prof_id_h = ~(aq->rq_mask.band_prof_id_h);
+		}
+
+		rc = mbox_process(mbox);
+		if (rc)
+			goto exit;
+	}
 
 	rq->bpf_id = id;
 
@@ -750,8 +796,7 @@ roc_nix_bpf_dump(struct roc_nix *roc_nix, uint16_t id,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	volatile struct nix_band_prof_s *prof;
 	uint8_t level_idx;
 	int rc;
 
@@ -765,19 +810,42 @@ roc_nix_bpf_dump(struct roc_nix *roc_nix, uint16_t id,
 		rc = NIX_ERR_PARAM;
 		goto exit;
 	}
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_rsp *rsp;
+		struct nix_cn10k_aq_enq_req *aq;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_READ;
-	rc = mbox_process_msg(mbox, (void *)&rsp);
 	if (!rc) {
 		plt_dump("============= band prof id =%d ===============", id);
-		nix_lf_bpf_dump(&rsp->prof);
+		nix_lf_bpf_dump(prof);
 	}
 exit:
 	mbox_put(mbox);
@@ -792,7 +860,6 @@ roc_nix_bpf_pre_color_tbl_setup(struct roc_nix *roc_nix, uint16_t id,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = dev->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t pc_mode, tn_ena;
 	uint8_t level_idx;
 	int rc;
@@ -856,21 +923,43 @@ roc_nix_bpf_pre_color_tbl_setup(struct roc_nix *roc_nix, uint16_t id,
 		goto exit;
 	}
 
-	/* Update corresponding bandwidth profile too */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		/* Update corresponding bandwidth profile too */
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->prof.pc_mode = pc_mode;
+		aq->prof.tnl_ena = tn_ena;
+		aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
+		aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
+
+		rc = mbox_process(mbox);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		/* Update corresponding bandwidth profile too */
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->prof.pc_mode = pc_mode;
+		aq->prof.tnl_ena = tn_ena;
+		aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
+		aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
+
+		rc = mbox_process(mbox);
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
-	aq->prof.pc_mode = pc_mode;
-	aq->prof.tnl_ena = tn_ena;
-	aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
-	aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
-
-	rc = mbox_process(mbox);
 
 exit:
 	mbox_put(mbox);
@@ -883,9 +972,9 @@ roc_nix_bpf_connect(struct roc_nix *roc_nix,
 		    uint16_t dst_id)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t level_idx;
 	int rc;
 
@@ -900,23 +989,42 @@ roc_nix_bpf_connect(struct roc_nix *roc_nix,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
 	if (dst_id == ROC_NIX_BPF_ID_INVALID) {
-		aq->prof.hl_en = false;
-		aq->prof_mask.hl_en = ~(aq->prof_mask.hl_en);
+		prof->hl_en = false;
+		prof_mask->hl_en = ~(prof_mask->hl_en);
 	} else {
-		aq->prof.hl_en = true;
-		aq->prof.band_prof_id = dst_id;
-		aq->prof_mask.hl_en = ~(aq->prof_mask.hl_en);
-		aq->prof_mask.band_prof_id = ~(aq->prof_mask.band_prof_id);
+		prof->hl_en = true;
+		prof->band_prof_id = dst_id;
+		prof_mask->hl_en = ~(prof_mask->hl_en);
+		prof_mask->band_prof_id = ~(prof_mask->band_prof_id);
 	}
 
 	rc = mbox_process(mbox);
@@ -937,8 +1045,7 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	volatile struct nix_band_prof_s *prof;
 	uint8_t level_idx;
 	int rc;
 
@@ -953,17 +1060,39 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_rsp *rsp;
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_READ;
-	rc = mbox_process_msg(mbox, (void *)&rsp);
-	if (rc)
-		goto exit;
 
 	green_pkt_pass =
 		roc_nix_bpf_stats_to_idx(mask & ROC_NIX_BPF_GREEN_PKT_F_PASS);
@@ -991,40 +1120,40 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		roc_nix_bpf_stats_to_idx(mask & ROC_NIX_BPF_RED_OCTS_F_DROP);
 
 	if (green_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[green_pkt_pass] = rsp->prof.green_pkt_pass;
+		stats[green_pkt_pass] = prof->green_pkt_pass;
 
 	if (green_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[green_octs_pass] = rsp->prof.green_octs_pass;
+		stats[green_octs_pass] = prof->green_octs_pass;
 
 	if (green_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[green_pkt_drop] = rsp->prof.green_pkt_drop;
+		stats[green_pkt_drop] = prof->green_pkt_drop;
 
 	if (green_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[green_octs_drop] = rsp->prof.green_octs_pass;
+		stats[green_octs_drop] = prof->green_octs_drop;
 
 	if (yellow_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_pkt_pass] = rsp->prof.yellow_pkt_pass;
+		stats[yellow_pkt_pass] = prof->yellow_pkt_pass;
 
 	if (yellow_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_octs_pass] = rsp->prof.yellow_octs_pass;
+		stats[yellow_octs_pass] = prof->yellow_octs_pass;
 
 	if (yellow_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_pkt_drop] = rsp->prof.yellow_pkt_drop;
+		stats[yellow_pkt_drop] = prof->yellow_pkt_drop;
 
 	if (yellow_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_octs_drop] = rsp->prof.yellow_octs_drop;
+		stats[yellow_octs_drop] = prof->yellow_octs_drop;
 
 	if (red_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[red_pkt_pass] = rsp->prof.red_pkt_pass;
+		stats[red_pkt_pass] = prof->red_pkt_pass;
 
 	if (red_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[red_octs_pass] = rsp->prof.red_octs_pass;
+		stats[red_octs_pass] = prof->red_octs_pass;
 
 	if (red_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[red_pkt_drop] = rsp->prof.red_pkt_drop;
+		stats[red_pkt_drop] = prof->red_pkt_drop;
 
 	if (red_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[red_octs_drop] = rsp->prof.red_octs_drop;
+		stats[red_octs_drop] = prof->red_octs_drop;
 
 	rc = 0;
 exit:
@@ -1037,9 +1166,9 @@ roc_nix_bpf_stats_reset(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 			enum roc_nix_bpf_level_flag lvl_flag)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t level_idx;
 	int rc;
 
@@ -1054,68 +1183,81 @@ roc_nix_bpf_stats_reset(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
 	if (mask & ROC_NIX_BPF_GREEN_PKT_F_PASS) {
-		aq->prof.green_pkt_pass = 0;
-		aq->prof_mask.green_pkt_pass = ~(aq->prof_mask.green_pkt_pass);
+		prof->green_pkt_pass = 0;
+		prof_mask->green_pkt_pass = ~(prof_mask->green_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_OCTS_F_PASS) {
-		aq->prof.green_octs_pass = 0;
-		aq->prof_mask.green_octs_pass =
-			~(aq->prof_mask.green_octs_pass);
+		prof->green_octs_pass = 0;
+		prof_mask->green_octs_pass = ~(prof_mask->green_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_PKT_F_DROP) {
-		aq->prof.green_pkt_drop = 0;
-		aq->prof_mask.green_pkt_drop = ~(aq->prof_mask.green_pkt_drop);
+		prof->green_pkt_drop = 0;
+		prof_mask->green_pkt_drop = ~(prof_mask->green_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_OCTS_F_DROP) {
-		aq->prof.green_octs_drop = 0;
-		aq->prof_mask.green_octs_drop =
-			~(aq->prof_mask.green_octs_drop);
+		prof->green_octs_drop = 0;
+		prof_mask->green_octs_drop = ~(prof_mask->green_octs_drop);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_PKT_F_PASS) {
-		aq->prof.yellow_pkt_pass = 0;
-		aq->prof_mask.yellow_pkt_pass =
-			~(aq->prof_mask.yellow_pkt_pass);
+		prof->yellow_pkt_pass = 0;
+		prof_mask->yellow_pkt_pass = ~(prof_mask->yellow_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_OCTS_F_PASS) {
-		aq->prof.yellow_octs_pass = 0;
-		aq->prof_mask.yellow_octs_pass =
-			~(aq->prof_mask.yellow_octs_pass);
+		prof->yellow_octs_pass = 0;
+		prof_mask->yellow_octs_pass = ~(prof_mask->yellow_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_PKT_F_DROP) {
-		aq->prof.yellow_pkt_drop = 0;
-		aq->prof_mask.yellow_pkt_drop =
-			~(aq->prof_mask.yellow_pkt_drop);
+		prof->yellow_pkt_drop = 0;
+		prof_mask->yellow_pkt_drop = ~(prof_mask->yellow_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_OCTS_F_DROP) {
-		aq->prof.yellow_octs_drop = 0;
-		aq->prof_mask.yellow_octs_drop =
-			~(aq->prof_mask.yellow_octs_drop);
+		prof->yellow_octs_drop = 0;
+		prof_mask->yellow_octs_drop = ~(prof_mask->yellow_octs_drop);
 	}
 	if (mask & ROC_NIX_BPF_RED_PKT_F_PASS) {
-		aq->prof.red_pkt_pass = 0;
-		aq->prof_mask.red_pkt_pass = ~(aq->prof_mask.red_pkt_pass);
+		prof->red_pkt_pass = 0;
+		prof_mask->red_pkt_pass = ~(prof_mask->red_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_RED_OCTS_F_PASS) {
-		aq->prof.red_octs_pass = 0;
-		aq->prof_mask.red_octs_pass = ~(aq->prof_mask.red_octs_pass);
+		prof->red_octs_pass = 0;
+		prof_mask->red_octs_pass = ~(prof_mask->red_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_RED_PKT_F_DROP) {
-		aq->prof.red_pkt_drop = 0;
-		aq->prof_mask.red_pkt_drop = ~(aq->prof_mask.red_pkt_drop);
+		prof->red_pkt_drop = 0;
+		prof_mask->red_pkt_drop = ~(prof_mask->red_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_RED_OCTS_F_DROP) {
-		aq->prof.red_octs_drop = 0;
-		aq->prof_mask.red_octs_drop = ~(aq->prof_mask.red_octs_drop);
+		prof->red_octs_drop = 0;
+		prof_mask->red_octs_drop = ~(prof_mask->red_octs_drop);
 	}
 
 	rc = mbox_process(mbox);
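
The stats-reset hunk above repeats one idiom per counter: clear the field in
the context image and write all-ones into the matching mask field, since a
NIX AQ WRITE only updates the context bits whose mask bits are set. The
~(prof_mask->field) on a zero-initialized bitfield is simply a width-safe
way to produce all-ones. A standalone sketch of the idiom (hypothetical
48-bit field width, not the exact nix_band_prof_s layout):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical 48-bit counter field; the real layout differs. */
    struct prof_view {
            uint64_t green_pkt_pass : 48;
            uint64_t rsvd : 16;
    };

    int main(void)
    {
            struct prof_view val = {0}, mask = {0};

            /* Clear the counter and mark every bit of it for update */
            val.green_pkt_pass = 0;
            mask.green_pkt_pass = ~(mask.green_pkt_pass); /* all-ones, width-safe */

            printf("mask = 0x%llx\n", (unsigned long long)mask.green_pkt_pass);
            return 0;
    }
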
diff --git a/drivers/common/cnxk/roc_nix_queue.c b/drivers/common/cnxk/roc_nix_queue.c
index bb1b70424f..06029275af 100644
--- a/drivers/common/cnxk/roc_nix_queue.c
+++ b/drivers/common/cnxk/roc_nix_queue.c
@@ -384,6 +384,94 @@ nix_rq_cn9k_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
 	return rc;
 }
 
+static int
+nix_rq_cn10k_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
+{
+	struct nix_cn10k_aq_enq_req *aq;
+	struct mbox *mbox = mbox_get(dev->mbox);
+	int rc;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		rc = -ENOSPC;
+		goto exit;
+	}
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
+		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
+	}
+
+	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
+		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
+	}
+
+	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
+		aq->rq.xqe_pass = rq->xqe_red_pass;
+		aq->rq.xqe_drop = rq->xqe_red_drop;
+		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
+		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
+	}
+
+	rc = mbox_process(mbox);
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
+static int
+nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
+{
+	struct nix_cn20k_aq_enq_req *aq;
+	struct mbox *mbox = mbox_get(dev->mbox);
+	int rc;
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+	if (!aq) {
+		rc = -ENOSPC;
+		goto exit;
+	}
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
+		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
+	}
+
+	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
+		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
+	}
+
+	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
+		aq->rq.xqe_pass = rq->xqe_red_pass;
+		aq->rq.xqe_drop = rq->xqe_red_drop;
+		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
+		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
+	}
+
+	rc = mbox_process(mbox);
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
 int
 nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 		bool cfg, bool ena)
@@ -680,52 +768,6 @@ nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cf
 	return 0;
 }
 
-static int
-nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
-{
-	struct nix_cn10k_aq_enq_req *aq;
-	struct mbox *mbox = mbox_get(dev->mbox);
-	int rc;
-
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (!aq) {
-		rc = -ENOSPC;
-		goto exit;
-	}
-
-	aq->qidx = rq->qid;
-	aq->ctype = NIX_AQ_CTYPE_RQ;
-	aq->op = NIX_AQ_INSTOP_WRITE;
-
-	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
-		aq->rq.lpb_pool_pass = rq->red_pass;
-		aq->rq.lpb_pool_drop = rq->red_drop;
-		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
-		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
-
-	}
-
-	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
-		aq->rq.spb_pool_pass = rq->spb_red_pass;
-		aq->rq.spb_pool_drop = rq->spb_red_drop;
-		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
-		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
-
-	}
-
-	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
-		aq->rq.xqe_pass = rq->xqe_red_pass;
-		aq->rq.xqe_drop = rq->xqe_red_drop;
-		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
-		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
-	}
-
-	rc = mbox_process(mbox);
-exit:
-	mbox_put(mbox);
-	return rc;
-}
-
 int
 nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
 {
@@ -1021,6 +1063,8 @@ roc_nix_rq_cman_config(struct roc_nix *roc_nix, struct roc_nix_rq *rq)
 
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cman_cfg(dev, rq);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cman_cfg(dev, rq);
 	else
 		rc = nix_rq_cman_cfg(dev, rq);
 
-- 
2.34.1



* [PATCH 16/33] common/cnxk: support NIX debug for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (14 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 15/33] common/cnxk: support bandwidth profile " Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 17/33] common/cnxk: add RSS support " Nithin Dabilpuram
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to dump cn20k queue structs and also expose the same
information via telemetry.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/cnxk_telemetry_nix.c | 260 ++++++++++++++++++++++-
 drivers/common/cnxk/roc_nix_debug.c      | 234 +++++++++++++++++++-
 drivers/common/cnxk/roc_nix_priv.h       |   3 +-
 3 files changed, 488 insertions(+), 9 deletions(-)

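The dump functions below are long runs of CNXK_TEL_DICT_* lines. Assuming
CNXK_TEL_DICT_INT() expands to a plt_tel_data_add_dict_int() call whose key
is the word prefix glued to the field name (an assumption about the helper;
see cnxk_telemetry.h for the real definition), each line contributes one
key/value pair to the telemetry dictionary. A self-contained sketch of that
shape:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for plt_tel_data_add_dict_int(), which appends a key/value
     * pair to the telemetry dictionary being built.
     */
    static void add_dict_int(const char *key, int64_t val)
    {
            printf("\"%s\": %lld\n", key, (long long)val);
    }

    /* Hypothetical shape of CNXK_TEL_DICT_INT(): key = prefix + field name */
    #define TEL_DICT_INT(ctx, field, prefix) add_dict_int(prefix #field, (ctx)->field)

    struct sq_ctx { int sqb_count; int smq; };

    int main(void)
    {
            struct sq_ctx ctx = { .sqb_count = 512, .smq = 3 };

            TEL_DICT_INT(&ctx, sqb_count, "w1_"); /* -> "w1_sqb_count": 512 */
            TEL_DICT_INT(&ctx, smq, "w1_");       /* -> "w1_smq": 3 */
            return 0;
    }
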
diff --git a/drivers/common/cnxk/cnxk_telemetry_nix.c b/drivers/common/cnxk/cnxk_telemetry_nix.c
index ccae5d7853..abeefafe1e 100644
--- a/drivers/common/cnxk/cnxk_telemetry_nix.c
+++ b/drivers/common/cnxk/cnxk_telemetry_nix.c
@@ -346,7 +346,7 @@ nix_rq_ctx_cn9k(volatile void *qctx, struct plt_tel_data *d)
 }
 
 static void
-nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
+nix_rq_ctx_cn10k(volatile void *qctx, struct plt_tel_data *d)
 {
 	volatile struct nix_cn10k_rq_ctx_s *ctx;
 
@@ -438,6 +438,100 @@ nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_U64(d, ctx, re_pkts, w10_);
 }
 
+static void
+nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
+{
+	volatile struct nix_cn20k_rq_ctx_s *ctx;
+
+	ctx = (volatile struct nix_cn20k_rq_ctx_s *)qctx;
+
+	/* W0 */
+	CNXK_TEL_DICT_INT(d, ctx, wqe_aura, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_ol3_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_il3_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, csum_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, csum_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, lenerr_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, port_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, port_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, ena_wqwd, w0);
+	CNXK_TEL_DICT_INT(d, ctx, ipsech_ena, w0);
+	CNXK_TEL_DICT_INT(d, ctx, sso_ena, w0);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w0);
+
+	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, chi_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, ipsecd_drop_en, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, pb_stashing, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_caching, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, pb_caching, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_tt, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_grp, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, xqe_hdr_split, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_imm_copy, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, band_prof_id_h, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_imm_size, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, later_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_bp_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, first_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_high_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, policer_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, band_prof_id_l, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, spb_pool_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_pool_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_pool_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_pool_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_drop, w3_);
+
+	/* W4 */
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, rq_int_ena, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, rq_int, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_pool_pass, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_pool_drop, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura_pass, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura_drop, w4_);
+
+	/* W5 */
+	CNXK_TEL_DICT_INT(d, ctx, flow_tagw, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, bad_utag, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, good_utag, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, ltag, w5_);
+
+	/* W6 */
+	CNXK_TEL_DICT_U64(d, ctx, octs, w6_);
+
+	/* W7 */
+	CNXK_TEL_DICT_U64(d, ctx, pkts, w7_);
+
+	/* W8 */
+	CNXK_TEL_DICT_U64(d, ctx, drop_octs, w8_);
+
+	/* W9 */
+	CNXK_TEL_DICT_U64(d, ctx, drop_pkts, w9_);
+
+	/* W10 */
+	CNXK_TEL_DICT_U64(d, ctx, re_pkts, w10_);
+}
+
 static int
 cnxk_tel_nix_rq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 {
@@ -459,12 +553,77 @@ cnxk_tel_nix_rq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 
 	if (roc_model_is_cn9k())
 		nix_rq_ctx_cn9k(qctx, d);
+	else if (roc_model_is_cn10k())
+		nix_rq_ctx_cn10k(qctx, d);
 	else
 		nix_rq_ctx(qctx, d);
 
 	return 0;
 }
 
+static int
+cnxk_tel_nix_cq_ctx_cn20k(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
+{
+	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct dev *dev = &nix->dev;
+	struct npa_lf *npa_lf;
+	volatile struct nix_cn20k_cq_ctx_s *ctx;
+	int rc = -1;
+
+	npa_lf = idev_npa_obj_get();
+	if (npa_lf == NULL)
+		return NPA_ERR_DEVICE_NOT_BOUNDED;
+
+	rc = nix_q_ctx_get(dev, NIX_AQ_CTYPE_CQ, n, (void *)&ctx);
+	if (rc) {
+		plt_err("Failed to get cq context");
+		return rc;
+	}
+
+	/* W0 */
+	CNXK_TEL_DICT_PTR(d, ctx, base, w0_);
+
+	/* W1 */
+	CNXK_TEL_DICT_U64(d, ctx, wrptr, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, avg_con, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cint_idx, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_err, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_high, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, bpid, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_med, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, bp_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_low, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbp_ena, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, update_time, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, avg_level, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, head, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, tail, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, cq_err_int_ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_err_int, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, qsize, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, stashing, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, caching, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, lbp_frac, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, stash_thresh, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, msh_valid, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, msh_dst, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cpt_drop_err_en, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, drop_ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, bp, w3_);
+
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_ext, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, bpid_ext, w4_);
+
+	return 0;
+}
+
 static int
 cnxk_tel_nix_cq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 {
@@ -474,6 +633,9 @@ cnxk_tel_nix_cq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 	volatile struct nix_cq_ctx_s *ctx;
 	int rc = -1;
 
+	if (roc_model_is_cn20k())
+		return cnxk_tel_nix_cq_ctx_cn20k(roc_nix, n, d);
+
 	npa_lf = idev_npa_obj_get();
 	if (npa_lf == NULL)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
@@ -602,7 +764,7 @@ nix_sq_ctx_cn9k(volatile void *qctx, struct plt_tel_data *d)
 }
 
 static void
-nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
+nix_sq_ctx_cn10k(volatile void *qctx, struct plt_tel_data *d)
 {
 	volatile struct nix_cn10k_sq_ctx_s *ctx;
 
@@ -617,6 +779,97 @@ nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_INT(d, ctx, ena, w0_);
 
 	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sqb_count, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, default_chan, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_weight, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, xoff, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, smq, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, sqe_stype, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sq_int_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sq_int, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sqb_aura, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_ub, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq_vld, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_pend, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smenq_next_sqb_vld, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, head_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smenq_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, tail_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_lso_segnum, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, mnq_dis, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, lmt_dis, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_limit, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, max_sqe_size, w3_);
+
+	/* W4 */
+	CNXK_TEL_DICT_PTR(d, ctx, next_sqb, w4_);
+
+	/* W5 */
+	CNXK_TEL_DICT_PTR(d, ctx, tail_sqb, w5_);
+
+	/* W6 */
+	CNXK_TEL_DICT_PTR(d, ctx, smenq_sqb, w6_);
+
+	/* W7 */
+	CNXK_TEL_DICT_PTR(d, ctx, smenq_next_sqb, w7_);
+
+	/* W8 */
+	CNXK_TEL_DICT_PTR(d, ctx, head_sqb, w8_);
+
+	/* W9 */
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vld, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vlan1_ins_ena, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vlan0_ins_ena, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_mps, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_sb, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_sizem1, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_total, w9_);
+
+	/* W10 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, scm_lso_rem, w10_);
+
+	/* W11 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, octs, w11_);
+
+	/* W12 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, pkts, w12_);
+
+	/* W13 */
+	CNXK_TEL_DICT_INT(d, ctx, aged_drop_octs, w13_);
+	CNXK_TEL_DICT_INT(d, ctx, aged_drop_pkts, w13_);
+
+	/* W14 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, drop_octs, w14_);
+
+	/* W15 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, drop_pkts, w15_);
+}
+
+static void
+nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
+{
+	volatile struct nix_cn20k_sq_ctx_s *ctx;
+
+	ctx = (volatile struct nix_cn20k_sq_ctx_s *)qctx;
+
+	/* W0 */
+	CNXK_TEL_DICT_INT(d, ctx, sqe_way_mask, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, cq, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, sdp_mcast, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, substream, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w0_);
+
+	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, sqb_count, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, default_chan, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, smq_rr_weight, w1_);
@@ -631,7 +884,6 @@ nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_INT(d, ctx, sq_int, w2_);
 	CNXK_TEL_DICT_INT(d, ctx, sqb_aura, w2_);
 	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_ub, w2_);
-	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w2_);
 
 	/* W3 */
 	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq_vld, w3_);
@@ -712,6 +964,8 @@ cnxk_tel_nix_sq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 
 	if (roc_model_is_cn9k())
 		nix_sq_ctx_cn9k(qctx, d);
+	else if (roc_model_is_cn10k())
+		nix_sq_ctx_cn10k(qctx, d);
 	else
 		nix_sq_ctx(qctx, d);
 
diff --git a/drivers/common/cnxk/roc_nix_debug.c b/drivers/common/cnxk/roc_nix_debug.c
index 2e91470c09..0cc8d7cc1e 100644
--- a/drivers/common/cnxk/roc_nix_debug.c
+++ b/drivers/common/cnxk/roc_nix_debug.c
@@ -358,7 +358,7 @@ nix_q_ctx_get(struct dev *dev, uint8_t ctype, uint16_t qid, __io void **ctx_p)
 			*ctx_p = &rsp->sq;
 		else
 			*ctx_p = &rsp->cq;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -372,6 +372,30 @@ nix_q_ctx_get(struct dev *dev, uint8_t ctype, uint16_t qid, __io void **ctx_p)
 		aq->ctype = ctype;
 		aq->op = NIX_AQ_INSTOP_READ;
 
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+
+		if (ctype == NIX_AQ_CTYPE_RQ)
+			*ctx_p = &rsp->rq;
+		else if (ctype == NIX_AQ_CTYPE_SQ)
+			*ctx_p = &rsp->sq;
+		else
+			*ctx_p = &rsp->cq;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = ctype;
+		aq->op = NIX_AQ_INSTOP_READ;
+
 		rc = mbox_process_msg(mbox, (void *)&rsp);
 		if (rc)
 			goto exit;
@@ -452,7 +476,69 @@ nix_cn9k_lf_sq_dump(__io struct nix_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *f
 }
 
 static inline void
-nix_lf_sq_dump(__io struct nix_cn10k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
+nix_cn10k_lf_sq_dump(__io struct nix_cn10k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
+{
+	nix_dump(file, "W0: sqe_way_mask \t\t%d\nW0: cq \t\t\t\t%d",
+		 ctx->sqe_way_mask, ctx->cq);
+	nix_dump(file, "W0: sdp_mcast \t\t\t%d\nW0: substream \t\t\t0x%03x",
+		 ctx->sdp_mcast, ctx->substream);
+	nix_dump(file, "W0: qint_idx \t\t\t%d\nW0: ena \t\t\t%d\n", ctx->qint_idx,
+		 ctx->ena);
+
+	nix_dump(file, "W1: sqb_count \t\t\t%d\nW1: default_chan \t\t%d",
+		 ctx->sqb_count, ctx->default_chan);
+	nix_dump(file, "W1: smq_rr_weight \t\t%d\nW1: sso_ena \t\t\t%d",
+		 ctx->smq_rr_weight, ctx->sso_ena);
+	nix_dump(file, "W1: xoff \t\t\t%d\nW1: cq_ena \t\t\t%d\nW1: smq\t\t\t\t%d\n",
+		 ctx->xoff, ctx->cq_ena, ctx->smq);
+
+	nix_dump(file, "W2: sqe_stype \t\t\t%d\nW2: sq_int_ena \t\t\t%d",
+		 ctx->sqe_stype, ctx->sq_int_ena);
+	nix_dump(file, "W2: sq_int  \t\t\t%d\nW2: sqb_aura \t\t\t%d", ctx->sq_int,
+		 ctx->sqb_aura);
+	nix_dump(file, "W2: smq_rr_count[ub:lb] \t\t%x:%x\n", ctx->smq_rr_count_ub,
+		 ctx->smq_rr_count_lb);
+
+	nix_dump(file, "W3: smq_next_sq_vld\t\t%d\nW3: smq_pend\t\t\t%d",
+		 ctx->smq_next_sq_vld, ctx->smq_pend);
+	nix_dump(file, "W3: smenq_next_sqb_vld  \t%d\nW3: head_offset\t\t\t%d",
+		 ctx->smenq_next_sqb_vld, ctx->head_offset);
+	nix_dump(file, "W3: smenq_offset\t\t%d\nW3: tail_offset \t\t%d",
+		 ctx->smenq_offset, ctx->tail_offset);
+	nix_dump(file, "W3: smq_lso_segnum \t\t%d\nW3: smq_next_sq \t\t%d",
+		 ctx->smq_lso_segnum, ctx->smq_next_sq);
+	nix_dump(file, "W3: mnq_dis \t\t\t%d\nW3: lmt_dis \t\t\t%d", ctx->mnq_dis,
+		 ctx->lmt_dis);
+	nix_dump(file, "W3: cq_limit\t\t\t%d\nW3: max_sqe_size\t\t%d\n",
+		 ctx->cq_limit, ctx->max_sqe_size);
+
+	nix_dump(file, "W4: next_sqb \t\t\t0x%" PRIx64 "", ctx->next_sqb);
+	nix_dump(file, "W5: tail_sqb \t\t\t0x%" PRIx64 "", ctx->tail_sqb);
+	nix_dump(file, "W6: smenq_sqb \t\t\t0x%" PRIx64 "", ctx->smenq_sqb);
+	nix_dump(file, "W7: smenq_next_sqb \t\t0x%" PRIx64 "", ctx->smenq_next_sqb);
+	nix_dump(file, "W8: head_sqb \t\t\t0x%" PRIx64 "", ctx->head_sqb);
+
+	nix_dump(file, "W9: vfi_lso_vld \t\t%d\nW9: vfi_lso_vlan1_ins_ena\t%d", ctx->vfi_lso_vld,
+		 ctx->vfi_lso_vlan1_ins_ena);
+	nix_dump(file, "W9: vfi_lso_vlan0_ins_ena\t%d\nW9: vfi_lso_mps\t\t\t%d",
+		 ctx->vfi_lso_vlan0_ins_ena, ctx->vfi_lso_mps);
+	nix_dump(file, "W9: vfi_lso_sb \t\t\t%d\nW9: vfi_lso_sizem1\t\t%d", ctx->vfi_lso_sb,
+		 ctx->vfi_lso_sizem1);
+	nix_dump(file, "W9: vfi_lso_total\t\t%d", ctx->vfi_lso_total);
+
+	nix_dump(file, "W10: scm_lso_rem \t\t0x%" PRIx64 "", (uint64_t)ctx->scm_lso_rem);
+	nix_dump(file, "W11: octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->octs);
+	nix_dump(file, "W12: pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->pkts);
+	nix_dump(file, "W13: aged_drop_pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->aged_drop_pkts);
+	nix_dump(file, "W13: aged_drop_octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->aged_drop_octs);
+	nix_dump(file, "W14: dropped_octs \t\t0x%" PRIx64 "", (uint64_t)ctx->drop_octs);
+	nix_dump(file, "W15: dropped_pkts \t\t0x%" PRIx64 "", (uint64_t)ctx->drop_pkts);
+
+	*sqb_aura_p = ctx->sqb_aura;
+}
+
+static inline void
+nix_lf_sq_dump(__io struct nix_cn20k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
 {
 	nix_dump(file, "W0: sqe_way_mask \t\t%d\nW0: cq \t\t\t\t%d",
 		 ctx->sqe_way_mask, ctx->cq);
@@ -574,7 +660,7 @@ nix_cn9k_lf_rq_dump(__io struct nix_rq_ctx_s *ctx, FILE *file)
 }
 
 void
-nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
+nix_cn10k_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
 {
 	nix_dump(file, "W0: wqe_aura \t\t\t%d\nW0: len_ol3_dis \t\t\t%d",
 		 ctx->wqe_aura, ctx->len_ol3_dis);
@@ -649,6 +735,124 @@ nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
 	nix_dump(file, "W10: re_pkts \t\t\t0x%" PRIx64 "\n", (uint64_t)ctx->re_pkts);
 }
 
+void
+nix_lf_rq_dump(__io struct nix_cn20k_rq_ctx_s *ctx, FILE *file)
+{
+	nix_dump(file, "W0: wqe_aura \t\t\t%d\nW0: len_ol3_dis \t\t\t%d",
+		 ctx->wqe_aura, ctx->len_ol3_dis);
+	nix_dump(file, "W0: len_ol4_dis \t\t\t%d\nW0: len_il3_dis \t\t\t%d",
+		 ctx->len_ol4_dis, ctx->len_il3_dis);
+	nix_dump(file, "W0: len_il4_dis \t\t\t%d\nW0: csum_ol4_dis \t\t\t%d",
+		 ctx->len_il4_dis, ctx->csum_ol4_dis);
+	nix_dump(file, "W0: csum_il4_dis \t\t\t%d\nW0: lenerr_dis \t\t\t%d",
+		 ctx->csum_il4_dis, ctx->lenerr_dis);
+	nix_dump(file, "W0: port_ol4_dis \t\t\t%d\nW0: port_il4_dis\t\t\t%d",
+		 ctx->port_ol4_dis, ctx->port_il4_dis);
+	nix_dump(file, "W0: cq \t\t\t\t%d\nW0: ena_wqwd \t\t\t%d", ctx->cq,
+		 ctx->ena_wqwd);
+	nix_dump(file, "W0: ipsech_ena \t\t\t%d\nW0: sso_ena \t\t\t%d",
+		 ctx->ipsech_ena, ctx->sso_ena);
+	nix_dump(file, "W0: ena \t\t\t%d\n", ctx->ena);
+
+	nix_dump(file, "W1: chi_ena \t\t%d\nW1: ipsecd_drop_en \t\t%d", ctx->chi_ena,
+		 ctx->ipsecd_drop_en);
+	nix_dump(file, "W1: pb_stashing \t\t\t%d", ctx->pb_stashing);
+	nix_dump(file, "W1: lpb_drop_ena \t\t%d\nW1: spb_drop_ena \t\t%d",
+		 ctx->lpb_drop_ena, ctx->spb_drop_ena);
+	nix_dump(file, "W1: xqe_drop_ena \t\t%d\nW1: wqe_caching \t\t%d",
+		 ctx->xqe_drop_ena, ctx->wqe_caching);
+	nix_dump(file, "W1: pb_caching \t\t\t%d\nW1: sso_tt \t\t\t%d",
+		 ctx->pb_caching, ctx->sso_tt);
+	nix_dump(file, "W1: sso_grp \t\t\t%d\nW1: lpb_aura \t\t\t%d", ctx->sso_grp,
+		 ctx->lpb_aura);
+	nix_dump(file, "W1: spb_aura \t\t\t%d\n", ctx->spb_aura);
+
+	nix_dump(file, "W2: xqe_hdr_split \t\t%d\nW2: xqe_imm_copy \t\t%d",
+		 ctx->xqe_hdr_split, ctx->xqe_imm_copy);
+	nix_dump(file, "W2: band_prof_id\t\t%d\n",
+		 ((ctx->band_prof_id_h << 10) | ctx->band_prof_id_l));
+	nix_dump(file, "W2: xqe_imm_size \t\t%d\nW2: later_skip \t\t\t%d",
+		 ctx->xqe_imm_size, ctx->later_skip);
+	nix_dump(file, "W2: sso_bp_ena\t\t%d\n", ctx->sso_bp_ena);
+	nix_dump(file, "W2: first_skip \t\t\t%d\nW2: lpb_sizem1 \t\t\t%d",
+		 ctx->first_skip, ctx->lpb_sizem1);
+	nix_dump(file, "W2: spb_ena \t\t\t%d\nW2: spb_high_sizem1 \t\t\t%d", ctx->spb_ena,
+		 ctx->spb_high_sizem1);
+	nix_dump(file, "W2: wqe_skip \t\t\t%d", ctx->wqe_skip);
+	nix_dump(file, "W2: spb_sizem1 \t\t\t%d\nW2: policer_ena \t\t\t%d",
+		 ctx->spb_sizem1, ctx->policer_ena);
+	nix_dump(file, "W2: sso_fc_ena \t\t\t%d\n", ctx->sso_fc_ena);
+
+	nix_dump(file, "W3: spb_pool_pass \t\t%d\nW3: spb_pool_drop \t\t%d",
+		 ctx->spb_pool_pass, ctx->spb_pool_drop);
+	nix_dump(file, "W3: spb_aura_pass \t\t%d\nW3: spb_aura_drop \t\t%d",
+		 ctx->spb_aura_pass, ctx->spb_aura_drop);
+	nix_dump(file, "W3: wqe_pool_pass \t\t%d\nW3: wqe_pool_drop \t\t%d",
+		 ctx->wqe_pool_pass, ctx->wqe_pool_drop);
+	nix_dump(file, "W3: xqe_pass \t\t\t%d\nW3: xqe_drop \t\t\t%d\n",
+		 ctx->xqe_pass, ctx->xqe_drop);
+
+	nix_dump(file, "W4: qint_idx \t\t\t%d\nW4: rq_int_ena \t\t\t%d",
+		 ctx->qint_idx, ctx->rq_int_ena);
+	nix_dump(file, "W4: rq_int \t\t\t%d\nW4: lpb_pool_pass \t\t%d", ctx->rq_int,
+		 ctx->lpb_pool_pass);
+	nix_dump(file, "W4: lpb_pool_drop \t\t%d\nW4: lpb_aura_pass \t\t%d",
+		 ctx->lpb_pool_drop, ctx->lpb_aura_pass);
+	nix_dump(file, "W4: lpb_aura_drop \t\t%d\n", ctx->lpb_aura_drop);
+
+	nix_dump(file, "W5: flow_tagw \t\t\t%d\nW5: bad_utag \t\t\t%d",
+		 ctx->flow_tagw, ctx->bad_utag);
+	nix_dump(file, "W5: good_utag \t\t\t%d\nW5: ltag \t\t\t%d\n", ctx->good_utag,
+		 ctx->ltag);
+
+	nix_dump(file, "W6: octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->octs);
+	nix_dump(file, "W7: pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->pkts);
+	nix_dump(file, "W8: drop_octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->drop_octs);
+	nix_dump(file, "W9: drop_pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->drop_pkts);
+	nix_dump(file, "W10: re_pkts \t\t\t0x%" PRIx64 "\n", (uint64_t)ctx->re_pkts);
+}
+
+static inline void
+nix_cn20k_lf_cq_dump(__io struct nix_cn20k_cq_ctx_s *ctx, FILE *file)
+{
+	nix_dump(file, "W0: base \t\t\t0x%" PRIx64 "\n", ctx->base);
+
+	nix_dump(file, "W1: wrptr \t\t\t%" PRIx64 "", (uint64_t)ctx->wrptr);
+	nix_dump(file, "W1: avg_con \t\t\t%d\nW1: cint_idx \t\t\t%d", ctx->avg_con,
+		 ctx->cint_idx);
+	nix_dump(file, "W1: cq_err \t\t\t%d\nW1: qint_idx \t\t\t%d", ctx->cq_err,
+		 ctx->qint_idx);
+	nix_dump(file, "W1: bpid  \t\t\t%d\nW1: bp_ena \t\t\t%d\n", ctx->bpid,
+		 ctx->bp_ena);
+	nix_dump(file,
+		 "W1: lbpid_high \t\t\t0x%03x\nW1: lbpid_med \t\t\t0x%03x\n"
+		 "W1: lbpid_low \t\t\t0x%03x\n(W1: lbpid) \t\t\t0x%03x\n",
+		 ctx->lbpid_high, ctx->lbpid_med, ctx->lbpid_low, (unsigned int)
+		 (ctx->lbpid_high << 6 | ctx->lbpid_med << 3 | ctx->lbpid_low));
+	nix_dump(file, "W1: lbp_ena \t\t\t\t%d\n", ctx->lbp_ena);
+
+	nix_dump(file, "W2: update_time \t\t%d\nW2: avg_level \t\t\t%d",
+		 ctx->update_time, ctx->avg_level);
+	nix_dump(file, "W2: head \t\t\t%d\nW2: tail \t\t\t%d\n", ctx->head,
+		 ctx->tail);
+
+	nix_dump(file, "W3: cq_err_int_ena \t\t%d\nW3: cq_err_int \t\t\t%d",
+		 ctx->cq_err_int_ena, ctx->cq_err_int);
+	nix_dump(file, "W3: qsize \t\t\t%d\nW3: stashing \t\t\t%d", ctx->qsize,
+		 ctx->stashing);
+	nix_dump(file, "W3: caching \t\t\t%d\nW3: lbp_frac \t\t\t%d", ctx->caching, ctx->lbp_frac);
+	nix_dump(file, "W3: stash_thresh \t\t\t%d\nW3: msh_valid\t\t\t%d", ctx->stash_thresh,
+		 ctx->msh_valid);
+	nix_dump(file, "W3: msh_dst \t\t\t0x%03x\nW3: cpt_drop_err_en \t\t\t%d\n",
+		 ctx->msh_dst, ctx->cpt_drop_err_en);
+	nix_dump(file, "W3: ena \t\t\t%d\n", ctx->ena);
+	nix_dump(file, "W3: drop_ena \t\t\t%d\nW3: drop \t\t\t%d", ctx->drop_ena,
+		 ctx->drop);
+	nix_dump(file, "W3: bp \t\t\t\t%d\n", ctx->bp);
+	nix_dump(file, "W4: lbpid_ext \t\t\t%d\nW4: bpid_ext \t\t\t%d", ctx->lbpid_ext,
+		 ctx->bpid_ext);
+}
+
 static inline void
 nix_lf_cq_dump(__io struct nix_cq_ctx_s *ctx, FILE *file)
 {
@@ -713,7 +917,10 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 		}
 		nix_dump(file, "============== port=%d cq=%d ===============",
 			 roc_nix->port_id, q);
-		nix_lf_cq_dump(ctx, file);
+		if (roc_model_is_cn20k())
+			nix_cn20k_lf_cq_dump(ctx, file);
+		else
+			nix_lf_cq_dump(ctx, file);
 	}
 
 	for (q = 0; q < rq; q++) {
@@ -726,6 +933,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 roc_nix->port_id, q);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_rq_dump(ctx, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_rq_dump(ctx, file);
 		else
 			nix_lf_rq_dump(ctx, file);
 	}
@@ -751,6 +960,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 inl_rq->qid);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_rq_dump(ctx, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_rq_dump(ctx, file);
 		else
 			nix_lf_rq_dump(ctx, file);
 	}
@@ -765,6 +976,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 roc_nix->port_id, q);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_sq_dump(ctx, &sqb_aura, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_sq_dump(ctx, &sqb_aura, file);
 		else
 			nix_lf_sq_dump(ctx, &sqb_aura, file);
 
@@ -1480,9 +1693,20 @@ roc_nix_sq_desc_dump(struct roc_nix *roc_nix, uint16_t q, uint16_t offset, uint1
 		tail_sqb = (void *)ctx->tail_sqb;
 		head_off = ctx->head_offset;
 		tail_off = ctx->tail_offset;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		volatile struct nix_cn10k_sq_ctx_s *ctx = (struct nix_cn10k_sq_ctx_s *)dat;
 
+		if (ctx->mnq_dis || ctx->lmt_dis)
+			full = 1;
+
+		count = ctx->sqb_count;
+		sqb_buf = (void *)ctx->head_sqb;
+		tail_sqb = (void *)ctx->tail_sqb;
+		head_off = ctx->head_offset;
+		tail_off = ctx->tail_offset;
+	} else {
+		volatile struct nix_cn20k_sq_ctx_s *ctx = (struct nix_cn20k_sq_ctx_s *)dat;
+
 		if (ctx->mnq_dis || ctx->lmt_dis)
 			full = 1;
 
diff --git a/drivers/common/cnxk/roc_nix_priv.h b/drivers/common/cnxk/roc_nix_priv.h
index ade42c1878..3fd6fcbe9f 100644
--- a/drivers/common/cnxk/roc_nix_priv.h
+++ b/drivers/common/cnxk/roc_nix_priv.h
@@ -469,7 +469,8 @@ struct nix_tm_shaper_profile *nix_tm_shaper_profile_alloc(void);
 void nix_tm_shaper_profile_free(struct nix_tm_shaper_profile *profile);
 
 uint64_t nix_get_blkaddr(struct dev *dev);
-void nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file);
+void nix_cn10k_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file);
+void nix_lf_rq_dump(__io struct nix_cn20k_rq_ctx_s *ctx, FILE *file);
 int nix_lf_gen_reg_dump(uintptr_t nix_lf_base, uint64_t *data);
 int nix_lf_stat_reg_dump(uintptr_t nix_lf_base, uint64_t *data, uint8_t lf_tx_stats,
 			 uint8_t lf_rx_stats);
-- 
2.34.1



* [PATCH 17/33] common/cnxk: add RSS support for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (15 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 16/33] common/cnxk: support NIX debug " Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 18/33] net/cnxk: add cn20k base control path support Nithin Dabilpuram
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add RSS configuration support for cn20k.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/roc_nix_rss.c | 74 +++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 4 deletions(-)

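nix_rss_reta_set() below queues one AQ INIT message per RETA entry (plus an
optional LOCK message when lock_rx_ctx is set), so the shared mbox region can
fill up mid-loop; an allocation failure is handled by flushing the queued
messages with mbox_process() and retrying once. A condensed, self-contained
sketch of that alloc-or-flush-and-retry idiom (stub types and names; the
real calls are mbox_alloc_msg_nix_cn20k_aq_enq() and mbox_process()):

    #include <stdio.h>
    #include <stddef.h>

    #define NIX_ERR_NO_MEM (-1)

    /* Stub mbox with a fixed-size shared region */
    struct mbox { int free_slots; };
    struct req { int qidx; };

    static struct req pool[4];

    static struct req *mbox_alloc_msg(struct mbox *mbox)
    {
            return mbox->free_slots ? &pool[--mbox->free_slots] : NULL;
    }

    static int mbox_process(struct mbox *mbox)
    {
            mbox->free_slots = 4; /* flushing to the AF frees the region */
            return 0;
    }

    /* The alloc-or-flush-and-retry idiom used in the reta loop */
    static struct req *alloc_with_retry(struct mbox *mbox, int *rc)
    {
            struct req *req = mbox_alloc_msg(mbox);

            if (!req) {
                    *rc = mbox_process(mbox);
                    if (*rc < 0)
                            return NULL;
                    req = mbox_alloc_msg(mbox);
                    if (!req)
                            *rc = NIX_ERR_NO_MEM;
            }
            return req;
    }

    int main(void)
    {
            struct mbox mbox = { .free_slots = 0 };
            int rc = 0;

            printf("req=%p rc=%d\n", (void *)alloc_with_retry(&mbox, &rc), rc);
            return 0;
    }
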
diff --git a/drivers/common/cnxk/roc_nix_rss.c b/drivers/common/cnxk/roc_nix_rss.c
index 2b88e1360d..fd1472e9b9 100644
--- a/drivers/common/cnxk/roc_nix_rss.c
+++ b/drivers/common/cnxk/roc_nix_rss.c
@@ -70,7 +70,7 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 				goto exit;
 			req = mbox_alloc_msg_nix_aq_enq(mbox);
 			if (!req) {
-				rc =  NIX_ERR_NO_MEM;
+				rc = NIX_ERR_NO_MEM;
 				goto exit;
 			}
 		}
@@ -93,7 +93,7 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 				goto exit;
 			req = mbox_alloc_msg_nix_aq_enq(mbox);
 			if (!req) {
-				rc =  NIX_ERR_NO_MEM;
+				rc = NIX_ERR_NO_MEM;
 				goto exit;
 			}
 		}
@@ -115,8 +115,8 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 }
 
 static int
-nix_rss_reta_set(struct nix *nix, uint8_t group,
-		 uint16_t reta[ROC_NIX_RSS_RETA_MAX], uint8_t lock_rx_ctx)
+nix_cn10k_rss_reta_set(struct nix *nix, uint8_t group, uint16_t reta[ROC_NIX_RSS_RETA_MAX],
+		       uint8_t lock_rx_ctx)
 {
 	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
 	struct nix_cn10k_aq_enq_req *req;
@@ -178,6 +178,70 @@ nix_rss_reta_set(struct nix *nix, uint8_t group,
 	return rc;
 }
 
+static int
+nix_rss_reta_set(struct nix *nix, uint8_t group, uint16_t reta[ROC_NIX_RSS_RETA_MAX],
+		 uint8_t lock_rx_ctx)
+{
+	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
+	struct nix_cn20k_aq_enq_req *req;
+	uint16_t idx;
+	int rc;
+
+	for (idx = 0; idx < nix->reta_sz; idx++) {
+		req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!req) {
+			/* The shared memory buffer can be full.
+			 * Flush it and retry
+			 */
+			rc = mbox_process(mbox);
+			if (rc < 0)
+				goto exit;
+			req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+			if (!req) {
+				rc = NIX_ERR_NO_MEM;
+				goto exit;
+			}
+		}
+		req->rss.rq = reta[idx];
+		/* Fill AQ info */
+		req->qidx = (group * nix->reta_sz) + idx;
+		req->ctype = NIX_AQ_CTYPE_RSS;
+		req->op = NIX_AQ_INSTOP_INIT;
+
+		if (!lock_rx_ctx)
+			continue;
+
+		req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!req) {
+			/* The shared memory buffer can be full.
+			 * Flush it and retry
+			 */
+			rc = mbox_process(mbox);
+			if (rc < 0)
+				goto exit;
+			req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+			if (!req) {
+				rc = NIX_ERR_NO_MEM;
+				goto exit;
+			}
+		}
+		req->rss.rq = reta[idx];
+		/* Fill AQ info */
+		req->qidx = (group * nix->reta_sz) + idx;
+		req->ctype = NIX_AQ_CTYPE_RSS;
+		req->op = NIX_AQ_INSTOP_LOCK;
+	}
+
+	rc = mbox_process(mbox);
+	if (rc < 0)
+		goto exit;
+
+	rc = 0;
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
 int
 roc_nix_rss_reta_set(struct roc_nix *roc_nix, uint8_t group,
 		     uint16_t reta[ROC_NIX_RSS_RETA_MAX])
@@ -191,6 +255,8 @@ roc_nix_rss_reta_set(struct roc_nix *roc_nix, uint8_t group,
 	if (roc_model_is_cn9k())
 		rc = nix_cn9k_rss_reta_set(nix, group, reta,
 					   roc_nix->lock_rx_ctx);
+	else if (roc_model_is_cn10k())
+		rc = nix_cn10k_rss_reta_set(nix, group, reta, roc_nix->lock_rx_ctx);
 	else
 		rc = nix_rss_reta_set(nix, group, reta, roc_nix->lock_rx_ctx);
 	if (rc)
-- 
2.34.1



* [PATCH 18/33] net/cnxk: add cn20k base control path support
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (16 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 17/33] common/cnxk: add RSS support " Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 19/33] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, Anatoly Burakov
  Cc: dev, Rakesh Kudurumalla, Rahul Bhansali

Add cn20k base control path support for ethdev.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c   | 553 ++++++++++++++++++++++++++++++
 drivers/net/cnxk/cn20k_ethdev.h   |  11 +
 drivers/net/cnxk/cn20k_rx.h       |  33 ++
 drivers/net/cnxk/cn20k_rxtx.h     |  89 +++++
 drivers/net/cnxk/cn20k_tx.h       |  35 ++
 drivers/net/cnxk/cnxk_ethdev_dp.h |   3 +
 drivers/net/cnxk/meson.build      |  11 +-
 7 files changed, 734 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.c
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.h
 create mode 100644 drivers/net/cnxk/cn20k_rx.h
 create mode 100644 drivers/net/cnxk/cn20k_rxtx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx.h

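One detail worth noting in cn20k_ethdev.c below: nix_form_default_desc()
encodes the default SQE size as sizem1, i.e. the number of 16-byte chunks
(pairs of 64-bit words) minus one. A small check of the arithmetic from the
in-code comments:

    #include <stdio.h>

    /* sizem1 = (64-bit words) / 2 - 1, per the comments in
     * nix_form_default_desc().
     */
    static int sizem1(int dwords) { return dwords / 2 - 1; }

    int main(void)
    {
            printf("plain:        %d\n", sizem1(2 + 1 + 1));         /* HDR+SG+IOVA     -> 1 */
            printf("ext hdr:      %d\n", sizem1(2 + 2 + 1 + 1));     /* + EXT_HDR       -> 2 */
            printf("ext + tstamp: %d\n", sizem1(2 + 2 + 1 + 1 + 2)); /* + SEND_MEM pair -> 3 */
            return 0;
    }
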
diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
new file mode 100644
index 0000000000..b4d21fe4be
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -0,0 +1,553 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#include "cn20k_ethdev.h"
+#include "cn20k_rx.h"
+#include "cn20k_tx.h"
+
+static int
+cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (ptype_mask) {
+		dev->rx_offload_flags |= NIX_RX_OFFLOAD_PTYPE_F;
+		dev->ptype_disable = 0;
+	} else {
+		dev->rx_offload_flags &= ~NIX_RX_OFFLOAD_PTYPE_F;
+		dev->ptype_disable = 1;
+	}
+
+	return 0;
+}
+
+static void
+nix_form_default_desc(struct cnxk_eth_dev *dev, struct cn20k_eth_txq *txq, uint16_t qid)
+{
+	union nix_send_hdr_w0_u send_hdr_w0;
+
+	/* Initialize the fields based on basic single segment packet */
+	send_hdr_w0.u = 0;
+	if (dev->tx_offload_flags & NIX_TX_NEED_EXT_HDR) {
+		/* 2(HDR) + 2(EXT_HDR) + 1(SG) + 1(IOVA) = 6/2 - 1 = 2 */
+		send_hdr_w0.sizem1 = 2;
+		if (dev->tx_offload_flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Default: one seg packet would have:
+			 * 2(HDR) + 2(EXT) + 1(SG) + 1(IOVA) + 2(MEM)
+			 * => 8/2 - 1 = 3
+			 */
+			send_hdr_w0.sizem1 = 3;
+
+			/* To calculate the offset for send_mem,
+			 * send_hdr->w0.sizem1 * 2
+			 */
+			txq->ts_mem = dev->tstamp.tx_tstamp_iova;
+		}
+	} else {
+		/* 2(HDR) + 1(SG) + 1(IOVA) = 4/2 - 1 = 1 */
+		send_hdr_w0.sizem1 = 1;
+	}
+	send_hdr_w0.sq = qid;
+	txq->send_hdr_w0 = send_hdr_w0.u;
+	rte_wmb();
+}
+
+static int
+cn20k_nix_tx_compl_setup(struct cnxk_eth_dev *dev, struct cn20k_eth_txq *txq, struct roc_nix_sq *sq,
+			 uint16_t nb_desc)
+{
+	struct roc_nix_cq *cq;
+
+	cq = &dev->cqs[sq->cqid];
+	txq->tx_compl.desc_base = (uintptr_t)cq->desc_base;
+	txq->tx_compl.cq_door = cq->door;
+	txq->tx_compl.cq_status = cq->status;
+	txq->tx_compl.wdata = cq->wdata;
+	txq->tx_compl.head = cq->head;
+	txq->tx_compl.qmask = cq->qmask;
+	/* Total array size holding buffers is equal to
+	 * number of entries in cq and sq
+	 * max buffer in array = desc in cq + desc in sq
+	 */
+	txq->tx_compl.nb_desc_mask = (2 * rte_align32pow2(nb_desc)) - 1;
+	txq->tx_compl.ena = true;
+
+	txq->tx_compl.ptr = (struct rte_mbuf **)plt_zmalloc(
+		txq->tx_compl.nb_desc_mask * sizeof(struct rte_mbuf *), 0);
+	if (!txq->tx_compl.ptr)
+		return -1;
+
+	return 0;
+}
+
+static void
+cn20k_nix_tx_queue_release(struct rte_eth_dev *eth_dev, uint16_t qid)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	struct cn20k_eth_txq *txq;
+
+	cnxk_nix_tx_queue_release(eth_dev, qid);
+	txq = eth_dev->data->tx_queues[qid];
+
+	if (nix->tx_compl_ena)
+		plt_free(txq->tx_compl.ptr);
+}
+
+static int
+cn20k_nix_tx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_desc,
+			 unsigned int socket, const struct rte_eth_txconf *tx_conf)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	struct roc_cpt_lf *inl_lf;
+	struct cn20k_eth_txq *txq;
+	struct roc_nix_sq *sq;
+	uint16_t crypto_qid;
+	int rc;
+
+	RTE_SET_USED(socket);
+
+	/* Common Tx queue setup */
+	rc = cnxk_nix_tx_queue_setup(eth_dev, qid, nb_desc, sizeof(struct cn20k_eth_txq), tx_conf);
+	if (rc)
+		return rc;
+
+	sq = &dev->sqs[qid];
+	/* Update fast path queue */
+	txq = eth_dev->data->tx_queues[qid];
+	txq->fc_mem = sq->fc;
+	if (nix->tx_compl_ena) {
+		rc = cn20k_nix_tx_compl_setup(dev, txq, sq, nb_desc);
+		if (rc)
+			return rc;
+	}
+
+	/* Set Txq flag for MT_LOCKFREE */
+	txq->flag = !!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MT_LOCKFREE);
+
+	/* Store lmt base in tx queue for easy access */
+	txq->lmt_base = nix->lmt_base;
+	txq->io_addr = sq->io_addr;
+	txq->nb_sqb_bufs_adj = sq->nb_sqb_bufs_adj;
+	txq->sqes_per_sqb_log2 = sq->sqes_per_sqb_log2;
+
+	/* Fetch CPT LF info for outbound if present */
+	if (dev->outb.lf_base) {
+		crypto_qid = qid % dev->outb.nb_crypto_qs;
+		inl_lf = dev->outb.lf_base + crypto_qid;
+
+		txq->cpt_io_addr = inl_lf->io_addr;
+		txq->cpt_fc = inl_lf->fc_addr;
+		txq->cpt_fc_sw = (int32_t *)((uintptr_t)dev->outb.fc_sw_mem +
+					     crypto_qid * RTE_CACHE_LINE_SIZE);
+
+		txq->cpt_desc = inl_lf->nb_desc * 0.7;
+		txq->sa_base = (uint64_t)dev->outb.sa_base;
+		txq->sa_base |= (uint64_t)eth_dev->data->port_id;
+		PLT_STATIC_ASSERT(ROC_NIX_INL_SA_BASE_ALIGN == BIT_ULL(16));
+	}
+
+	/* Restore marking flag from roc */
+	mark_fmt = roc_nix_tm_mark_format_get(nix, &mark_flag);
+	txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+	txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+
+	nix_form_default_desc(dev, txq, qid);
+	txq->lso_tun_fmt = dev->lso_tun_fmt;
+	return 0;
+}
+
+static int
+cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_desc,
+			 unsigned int socket, const struct rte_eth_rxconf *rx_conf,
+			 struct rte_mempool *mp)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cn20k_eth_rxq *rxq;
+	struct roc_nix_rq *rq;
+	struct roc_nix_cq *cq;
+	int rc;
+
+	RTE_SET_USED(socket);
+
+	/* CQ Errata needs min 4K ring */
+	if (dev->cq_min_4k && nb_desc < 4096)
+		nb_desc = 4096;
+
+	/* Common Rx queue setup */
+	rc = cnxk_nix_rx_queue_setup(eth_dev, qid, nb_desc, sizeof(struct cn20k_eth_rxq), rx_conf,
+				     mp);
+	if (rc)
+		return rc;
+
+	/* Do initial mtu setup for RQ0 before device start */
+	if (!qid) {
+		rc = nix_recalc_mtu(eth_dev);
+		if (rc)
+			return rc;
+	}
+
+	rq = &dev->rqs[qid];
+	cq = &dev->cqs[qid];
+
+	/* Update fast path queue */
+	rxq = eth_dev->data->rx_queues[qid];
+	rxq->rq = qid;
+	rxq->desc = (uintptr_t)cq->desc_base;
+	rxq->cq_door = cq->door;
+	rxq->cq_status = cq->status;
+	rxq->wdata = cq->wdata;
+	rxq->head = cq->head;
+	rxq->qmask = cq->qmask;
+	rxq->tstamp = &dev->tstamp;
+
+	/* Data offset from data to start of mbuf is first_skip */
+	rxq->data_off = rq->first_skip;
+	rxq->mbuf_initializer = cnxk_nix_rxq_mbuf_setup(dev);
+
+	/* Setup security related info */
+	if (dev->rx_offload_flags & NIX_RX_OFFLOAD_SECURITY_F) {
+		rxq->lmt_base = dev->nix.lmt_base;
+		rxq->sa_base = roc_nix_inl_inb_sa_base_get(&dev->nix, dev->inb.inl_dev);
+	}
+
+	/* Lookup mem */
+	rxq->lookup_mem = cnxk_nix_fastpath_lookup_mem_get();
+	return 0;
+}
+
+static int
+cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
+{
+	struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[qidx];
+	int rc;
+
+	rc = cnxk_nix_tx_queue_stop(eth_dev, qidx);
+	if (rc)
+		return rc;
+
+	/* Clear fc cache pkts to trigger worker stop */
+	txq->fc_cache_pkts = 0;
+
+	return 0;
+}
+
+static int
+cn20k_nix_configure(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc;
+
+	/* Common nix configure */
+	rc = cnxk_nix_configure(eth_dev);
+	if (rc)
+		return rc;
+
+	/* reset reassembly dynfield/flag offset */
+	dev->reass_dynfield_off = -1;
+	dev->reass_dynflag_bit = -1;
+
+	plt_nix_dbg("Configured port%d platform specific rx_offload_flags=%x"
+		    " tx_offload_flags=0x%x",
+		    eth_dev->data->port_id, dev->rx_offload_flags, dev->tx_offload_flags);
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int i, rc;
+
+	rc = cnxk_nix_timesync_enable(eth_dev);
+	if (rc)
+		return rc;
+
+	dev->rx_offload_flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+	dev->tx_offload_flags |= NIX_TX_OFFLOAD_TSTAMP_F;
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
+
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int i, rc;
+
+	rc = cnxk_nix_timesync_disable(eth_dev);
+	if (rc)
+		return rc;
+
+	dev->rx_offload_flags &= ~NIX_RX_OFFLOAD_TSTAMP_F;
+	dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_TSTAMP_F;
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
+
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_read_tx_timestamp(struct rte_eth_dev *eth_dev, struct timespec *timestamp)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cnxk_timesync_info *tstamp = &dev->tstamp;
+	uint64_t ns;
+
+	if (*tstamp->tx_tstamp == 0)
+		return -EINVAL;
+
+	*tstamp->tx_tstamp =
+		((*tstamp->tx_tstamp >> 32) * NSEC_PER_SEC) + (*tstamp->tx_tstamp & 0xFFFFFFFFUL);
+	ns = rte_timecounter_update(&dev->tx_tstamp_tc, *tstamp->tx_tstamp);
+	*timestamp = rte_ns_to_timespec(ns);
+	*tstamp->tx_tstamp = 0;
+	rte_wmb();
+
+	return 0;
+}
+
+static int
+cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	int rc;
+
+	/* Common eth dev start */
+	rc = cnxk_nix_dev_start(eth_dev);
+	if (rc)
+		return rc;
+
+	/* Set flags for Rx Inject feature */
+	if (roc_idev_nix_rx_inject_get(nix->port_id))
+		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
+
+	return 0;
+}
+
+static int
+cn20k_nix_reassembly_capability_get(struct rte_eth_dev *eth_dev,
+				    struct rte_eth_ip_reassembly_params *reassembly_capa)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc = -ENOTSUP;
+
+	if (!roc_feature_nix_has_reass())
+		return -ENOTSUP;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+		reassembly_capa->timeout_ms = 60 * 1000;
+		reassembly_capa->max_frags = 4;
+		reassembly_capa->flags =
+			RTE_ETH_DEV_REASSEMBLY_F_IPV4 | RTE_ETH_DEV_REASSEMBLY_F_IPV6;
+		rc = 0;
+	}
+
+	return rc;
+}
+
+static int
+cn20k_nix_reassembly_conf_get(struct rte_eth_dev *eth_dev,
+			      struct rte_eth_ip_reassembly_params *conf)
+{
+	RTE_SET_USED(eth_dev);
+	RTE_SET_USED(conf);
+	return -ENOTSUP;
+}
+
+static int
+cn20k_nix_reassembly_conf_set(struct rte_eth_dev *eth_dev,
+			      const struct rte_eth_ip_reassembly_params *conf)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc = 0;
+
+	if (!roc_feature_nix_has_reass())
+		return -ENOTSUP;
+
+	if (!conf->flags) {
+		/* Clear offload flags on disable */
+		if (!dev->inb.nb_oop)
+			dev->rx_offload_flags &= ~NIX_RX_REAS_F;
+		dev->inb.reass_en = false;
+		return 0;
+	}
+
+	rc = roc_nix_reassembly_configure(conf->timeout_ms, conf->max_frags);
+	if (!rc && dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+		dev->rx_offload_flags |= NIX_RX_REAS_F;
+		dev->inb.reass_en = true;
+	}
+
+	return rc;
+}
+
+static int
+cn20k_nix_rx_avail_get(struct cn20k_eth_rxq *rxq)
+{
+	uint32_t qmask = rxq->qmask;
+	uint64_t reg, head, tail;
+	int available;
+
+	/* Use LDADDA version to avoid reorder */
+	reg = roc_atomic64_add_sync(rxq->wdata, rxq->cq_status);
+	/* CQ_OP_STATUS operation error */
+	if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+		return 0;
+	tail = reg & 0xFFFFF;
+	head = (reg >> 20) & 0xFFFFF;
+	if (tail < head)
+		available = tail - head + qmask + 1;
+	else
+		available = tail - head;
+
+	return available;
+}
+
+static int
+cn20k_rx_descriptor_dump(const struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t offset,
+			 uint16_t num, FILE *file)
+{
+	struct cn20k_eth_rxq *rxq = eth_dev->data->rx_queues[qid];
+	const uint64_t data_off = rxq->data_off;
+	const uint32_t qmask = rxq->qmask;
+	const uintptr_t desc = rxq->desc;
+	struct cpt_parse_hdr_s *cpth;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	uint16_t count = 0;
+	int available_pkts;
+	uint64_t cq_w1;
+
+	available_pkts = cn20k_nix_rx_avail_get(rxq);
+
+	if ((offset + num - 1) >= available_pkts) {
+		plt_err("Invalid BD num=%u", num);
+		return -EINVAL;
+	}
+
+	while (count < num) {
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head) + count + offset);
+		cq_w1 = *((const uint64_t *)cq + 1);
+		if (cq_w1 & BIT(11)) {
+			rte_iova_t buff = *((rte_iova_t *)((uint64_t *)cq + 9));
+			struct rte_mbuf *mbuf = (struct rte_mbuf *)(buff - data_off);
+			cpth = (struct cpt_parse_hdr_s *)((uintptr_t)mbuf + (uint16_t)data_off);
+			roc_cpt_parse_hdr_dump(file, cpth);
+		} else {
+			roc_nix_cqe_dump(file, cq);
+		}
+
+		count++;
+		head &= qmask;
+	}
+	return 0;
+}
+
+/* Update platform specific eth dev ops */
+static void
+nix_eth_dev_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+
+	/* Update platform specific ops */
+	cnxk_eth_dev_ops.dev_configure = cn20k_nix_configure;
+	cnxk_eth_dev_ops.tx_queue_setup = cn20k_nix_tx_queue_setup;
+	cnxk_eth_dev_ops.rx_queue_setup = cn20k_nix_rx_queue_setup;
+	cnxk_eth_dev_ops.tx_queue_release = cn20k_nix_tx_queue_release;
+	cnxk_eth_dev_ops.tx_queue_stop = cn20k_nix_tx_queue_stop;
+	cnxk_eth_dev_ops.dev_start = cn20k_nix_dev_start;
+	cnxk_eth_dev_ops.dev_ptypes_set = cn20k_nix_ptypes_set;
+	cnxk_eth_dev_ops.timesync_enable = cn20k_nix_timesync_enable;
+	cnxk_eth_dev_ops.timesync_disable = cn20k_nix_timesync_disable;
+	cnxk_eth_dev_ops.timesync_read_tx_timestamp = cn20k_nix_timesync_read_tx_timestamp;
+	cnxk_eth_dev_ops.ip_reassembly_capability_get = cn20k_nix_reassembly_capability_get;
+	cnxk_eth_dev_ops.ip_reassembly_conf_get = cn20k_nix_reassembly_conf_get;
+	cnxk_eth_dev_ops.ip_reassembly_conf_set = cn20k_nix_reassembly_conf_set;
+	cnxk_eth_dev_ops.eth_rx_descriptor_dump = cn20k_rx_descriptor_dump;
+}
+
+/* Update platform specific tm ops */
+static void
+nix_tm_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+}
+
+static int
+cn20k_nix_remove(struct rte_pci_device *pci_dev)
+{
+	return cnxk_nix_remove(pci_dev);
+}
+
+static int
+cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_eth_dev *eth_dev;
+	int rc;
+
+	rc = roc_plt_init();
+	if (rc) {
+		plt_err("Failed to initialize platform model, rc=%d", rc);
+		return rc;
+	}
+
+	nix_eth_dev_ops_override();
+	nix_tm_ops_override();
+
+	/* Common probe */
+	rc = cnxk_nix_probe(pci_drv, pci_dev);
+	if (rc)
+		return rc;
+
+	/* Find eth dev allocated */
+	eth_dev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!eth_dev) {
+		/* Ignore if ethdev is in mid of detach state in secondary */
+		if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+			return 0;
+		return -ENOENT;
+	}
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	return 0;
+}
+
+static const struct rte_pci_id cn20k_pci_nix_map[] = {
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_PF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_VF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_AF_VF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_SDP_VF),
+	{
+		.vendor_id = 0,
+	},
+};
+
+static struct rte_pci_driver cn20k_pci_nix = {
+	.id_table = cn20k_pci_nix_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA | RTE_PCI_DRV_INTR_LSC,
+	.probe = cn20k_nix_probe,
+	.remove = cn20k_nix_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_cn20k, cn20k_pci_nix);
+RTE_PMD_REGISTER_PCI_TABLE(net_cn20k, cn20k_pci_nix_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_cn20k, "vfio-pci");
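
A note on cn20k_nix_rx_avail_get() above: head and tail are 20-bit ring
indexes read atomically from CQ_OP_STATUS, so when tail has wrapped past
head the difference must be corrected by the ring size (qmask + 1). A worked
example with a 1024-entry CQ:

    #include <stdio.h>
    #include <stdint.h>

    /* Same arithmetic as cn20k_nix_rx_avail_get() */
    static int avail(uint32_t head, uint32_t tail, uint32_t qmask)
    {
            if (tail < head)
                    return tail - head + qmask + 1; /* tail wrapped around */
            return tail - head;
    }

    int main(void)
    {
            printf("%d\n", avail(10, 44, 1023));   /* no wrap -> 34 */
            printf("%d\n", avail(1000, 10, 1023)); /* wrapped -> 34 */
            return 0;
    }
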
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
new file mode 100644
index 0000000000..1af490befc
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_ETHDEV_H__
+#define __CN20K_ETHDEV_H__
+
+#include <cn20k_rxtx.h>
+#include <cnxk_ethdev.h>
+#include <cnxk_security.h>
+
+#endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
new file mode 100644
index 0000000000..58a2920a54
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_RX_H__
+#define __CN20K_RX_H__
+
+#include "cn20k_rxtx.h"
+#include <rte_ethdev.h>
+#include <rte_security_driver.h>
+#include <rte_vect.h>
+
+#define NSEC_PER_SEC 1000000000L
+
+#define NIX_RX_OFFLOAD_NONE	     (0)
+#define NIX_RX_OFFLOAD_RSS_F	     BIT(0)
+#define NIX_RX_OFFLOAD_PTYPE_F	     BIT(1)
+#define NIX_RX_OFFLOAD_CHECKSUM_F    BIT(2)
+#define NIX_RX_OFFLOAD_MARK_UPDATE_F BIT(3)
+#define NIX_RX_OFFLOAD_TSTAMP_F	     BIT(4)
+#define NIX_RX_OFFLOAD_VLAN_STRIP_F  BIT(5)
+#define NIX_RX_OFFLOAD_SECURITY_F    BIT(6)
+#define NIX_RX_OFFLOAD_MAX	     (NIX_RX_OFFLOAD_SECURITY_F << 1)
+
+/* Flags to control the cqe_to_mbuf conversion function.
+ * Defined from the top bits downwards to denote that they are
+ * not used as offload flags to pick the Rx function.
+ */
+#define NIX_RX_REAS_F	   BIT(12)
+#define NIX_RX_VWQE_F	   BIT(13)
+#define NIX_RX_MULTI_SEG_F BIT(14)
+
+#define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
+#endif /* __CN20K_RX_H__ */
diff --git a/drivers/net/cnxk/cn20k_rxtx.h b/drivers/net/cnxk/cn20k_rxtx.h
new file mode 100644
index 0000000000..5cc445d4b1
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rxtx.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#ifndef __CN20K_RXTX_H__
+#define __CN20K_RXTX_H__
+
+#include <rte_security.h>
+
+/* ROC Constants */
+#include "roc_constants.h"
+
+/* Platform definition */
+#include "roc_platform.h"
+
+/* IO */
+#if defined(__aarch64__)
+#include "roc_io.h"
+#else
+#include "roc_io_generic.h"
+#endif
+
+/* HW structure definition */
+#include "hw/cpt.h"
+#include "hw/nix.h"
+#include "hw/npa.h"
+#include "hw/npc.h"
+#include "hw/ssow.h"
+
+#include "roc_ie_ot.h"
+
+/* NPA */
+#include "roc_npa_dp.h"
+
+/* SSO */
+#include "roc_sso_dp.h"
+
+/* CPT */
+#include "roc_cpt.h"
+
+/* NIX Inline dev */
+#include "roc_nix_inl_dp.h"
+
+#include "cnxk_ethdev_dp.h"
+
+struct cn20k_eth_txq {
+	uint64_t send_hdr_w0;
+	int64_t fc_cache_pkts;
+	uint64_t *fc_mem;
+	uintptr_t lmt_base;
+	rte_iova_t io_addr;
+	uint16_t sqes_per_sqb_log2;
+	int16_t nb_sqb_bufs_adj;
+	uint8_t flag;
+	rte_iova_t cpt_io_addr;
+	uint64_t sa_base;
+	uint64_t *cpt_fc;
+	uint16_t cpt_desc;
+	int32_t *cpt_fc_sw;
+	uint64_t lso_tun_fmt;
+	uint64_t ts_mem;
+	uint64_t mark_flag : 8;
+	uint64_t mark_fmt : 48;
+	struct cnxk_eth_txq_comp tx_compl;
+} __plt_cache_aligned;
+
+struct cn20k_eth_rxq {
+	uint64_t mbuf_initializer;
+	uintptr_t desc;
+	void *lookup_mem;
+	uintptr_t cq_door;
+	uint64_t wdata;
+	int64_t *cq_status;
+	uint32_t head;
+	uint32_t qmask;
+	uint32_t available;
+	uint16_t data_off;
+	uint64_t sa_base;
+	uint64_t lmt_base;
+	uint64_t meta_aura;
+	uintptr_t meta_pool;
+	uint16_t rq;
+	struct cnxk_timesync_info *tstamp;
+} __plt_cache_aligned;
+
+#define LMT_OFF(lmt_addr, lmt_num, offset)                                                         \
+	(void *)((uintptr_t)(lmt_addr) + ((uint64_t)(lmt_num) << ROC_LMT_LINE_SIZE_LOG2) + (offset))
+
+#endif /* __CN20K_RXTX_H__ */
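
The LMT_OFF() macro above computes the address of a given LMT line plus a
byte offset within the mapped LMT region. A sketch with the line size
assumed to be 128 bytes, i.e. ROC_LMT_LINE_SIZE_LOG2 == 7 (an assumption
for illustration; see roc_constants.h for the real value):

    #include <stdio.h>
    #include <stdint.h>

    #define LMT_LINE_SIZE_LOG2 7 /* assumed 128B LMT lines */
    #define LMT_OFF(lmt_addr, lmt_num, offset)                                 \
            (void *)((uintptr_t)(lmt_addr) +                                   \
                     ((uint64_t)(lmt_num) << LMT_LINE_SIZE_LOG2) + (offset))

    int main(void)
    {
            uintptr_t base = 0x100000;

            /* Third line (index 2), 8 bytes in: 0x100000 + 2*128 + 8 = 0x100108 */
            printf("%p\n", LMT_OFF(base, 2, 8));
            return 0;
    }
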
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
new file mode 100644
index 0000000000..a00c9d5776
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_TX_H__
+#define __CN20K_TX_H__
+
+#include "cn20k_rxtx.h"
+#include <rte_eventdev.h>
+#include <rte_vect.h>
+
+#define NIX_TX_OFFLOAD_NONE	      (0)
+#define NIX_TX_OFFLOAD_L3_L4_CSUM_F   BIT(0)
+#define NIX_TX_OFFLOAD_OL3_OL4_CSUM_F BIT(1)
+#define NIX_TX_OFFLOAD_VLAN_QINQ_F    BIT(2)
+#define NIX_TX_OFFLOAD_MBUF_NOFF_F    BIT(3)
+#define NIX_TX_OFFLOAD_TSO_F	      BIT(4)
+#define NIX_TX_OFFLOAD_TSTAMP_F	      BIT(5)
+#define NIX_TX_OFFLOAD_SECURITY_F     BIT(6)
+#define NIX_TX_OFFLOAD_MAX	      (NIX_TX_OFFLOAD_SECURITY_F << 1)
+
+/* Flags to control the xmit_prepare function.
+ * Defined from the top bits downwards to denote that they are
+ * not used as offload flags to pick the Tx function.
+ */
+#define NIX_TX_VWQE_F	   BIT(14)
+#define NIX_TX_MULTI_SEG_F BIT(15)
+
+#define NIX_TX_NEED_SEND_HDR_W1                                                                    \
+	(NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |                             \
+	 NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)
+
+#define NIX_TX_NEED_EXT_HDR                                                                        \
+	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
+
+#endif /* __CN20K_TX_H__ */
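
Sizing note: with seven Tx offload bits (BIT(0) through BIT(6)),
NIX_TX_OFFLOAD_MAX evaluates to BIT(7) = 128, so any dispatch table
indexed by the offload mask needs exactly 128 entries. NIX_TX_VWQE_F
(BIT(14)) and NIX_TX_MULTI_SEG_F (BIT(15)) are deliberately placed above
that range so they steer code paths without widening the tables.
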
diff --git a/drivers/net/cnxk/cnxk_ethdev_dp.h b/drivers/net/cnxk/cnxk_ethdev_dp.h
index 119bb1836a..100d22e759 100644
--- a/drivers/net/cnxk/cnxk_ethdev_dp.h
+++ b/drivers/net/cnxk/cnxk_ethdev_dp.h
@@ -59,6 +59,9 @@
 
 #define CNXK_TX_MARK_FMT_MASK (0xFFFFFFFFFFFFull)
 
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define CQE_SZ(x)            ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
 struct cnxk_eth_txq_comp {
 	uintptr_t desc_base;
 	uintptr_t cq_door;
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index 7bce80098a..cf2ce09f77 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -14,7 +14,7 @@ else
         soc_type = platform
 endif
 
-if soc_type != 'cn9k' and soc_type != 'cn10k'
+if soc_type != 'cn9k' and soc_type != 'cn10k' and soc_type != 'cn20k'
         soc_type = 'all'
 endif
 
@@ -231,6 +231,15 @@ sources += files(
 endif
 endif
 
+
+if soc_type == 'cn20k' or soc_type == 'all'
+# CN20K
+sources += files(
+        'cn20k_ethdev.c',
+)
+endif
+
+
 deps += ['bus_pci', 'cryptodev', 'eventdev', 'security']
 deps += ['common_cnxk', 'mempool_cnxk']
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 19/33] net/cnxk: support Rx function select for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (17 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 18/33] net/cnxk: add cn20k base control path support Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 20/33] net/cnxk: support Tx " Nithin Dabilpuram
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, Anatoly Burakov
  Cc: dev

Add support to select the Rx burst function for cn20k based on
the configured offload flags.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c               |  59 ++++-
 drivers/net/cnxk/cn20k_ethdev.h               |   3 +
 drivers/net/cnxk/cn20k_rx.h                   | 226 ++++++++++++++++++
 drivers/net/cnxk/cn20k_rx_select.c            | 162 +++++++++++++
 drivers/net/cnxk/meson.build                  |  44 ++++
 drivers/net/cnxk/rx/cn20k/rx_0_15.c           |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c       |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c  |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127.c        |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c   |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c    |  20 ++
 .../net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c   |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111.c         |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c    |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c     |  20 ++
 .../net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c    |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_all_offload.c    |  57 +++++
 38 files changed, 1190 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn20k_rx_select.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_all_offload.c
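
The file split mirrors the Rx offload-mask arithmetic: seven template
bits give 2^7 = 128 flag combinations, carved into eight translation
units of sixteen combinations each (rx_0_15.c through rx_112_127.c),
times four variants (plain, mseg, vec, vec_mseg), which keeps each file
small enough to compile in parallel; rx_all_offload.c covers the
non-template build.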

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index b4d21fe4be..d1cb3a52bf 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -5,6 +5,41 @@
 #include "cn20k_rx.h"
 #include "cn20k_tx.h"
 
+static uint16_t
+nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct rte_eth_dev_data *data = eth_dev->data;
+	struct rte_eth_conf *conf = &data->dev_conf;
+	struct rte_eth_rxmode *rxmode = &conf->rxmode;
+	uint16_t flags = 0;
+
+	if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS &&
+	    (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH))
+		flags |= NIX_RX_OFFLOAD_RSS_F;
+
+	if (dev->rx_offloads & (RTE_ETH_RX_OFFLOAD_TCP_CKSUM | RTE_ETH_RX_OFFLOAD_UDP_CKSUM))
+		flags |= NIX_RX_OFFLOAD_CHECKSUM_F;
+
+	if (dev->rx_offloads &
+	    (RTE_ETH_RX_OFFLOAD_IPV4_CKSUM | RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM))
+		flags |= NIX_RX_OFFLOAD_CHECKSUM_F;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
+		flags |= NIX_RX_MULTI_SEG_F;
+
+	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP))
+		flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+
+	if (!dev->ptype_disable)
+		flags |= NIX_RX_OFFLOAD_PTYPE_F;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY)
+		flags |= NIX_RX_OFFLOAD_SECURITY_F;
+
+	return flags;
+}
+
 static int
 cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 {
@@ -18,6 +53,7 @@ cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 		dev->ptype_disable = 1;
 	}
 
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -187,6 +223,9 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 		rc = nix_recalc_mtu(eth_dev);
 		if (rc)
 			return rc;
+
+		/* Update offload flags */
+		dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
 	}
 
 	rq = &dev->rqs[qid];
@@ -245,6 +284,8 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Update offload flags */
+	dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
 	/* reset reassembly dynfield/flag offset */
 	dev->reass_dynfield_off = -1;
 	dev->reass_dynflag_bit = -1;
@@ -271,6 +312,10 @@ cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
 		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
 
+	/* Reselect the Rx function since the Rx offload flags may
+	 * have changed.
+	 */
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -290,6 +335,10 @@ cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
 	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
 		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
 
+	/* Reselect the Rx function since the Rx offload flags may
+	 * have changed.
+	 */
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -325,10 +374,15 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Recompute rx_offload_flags since rx_offloads may have
+	 * changed.
+	 */
+	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
 
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -525,8 +579,11 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		return -ENOENT;
 	}
 
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		/* Setup callbacks for secondary process */
+		cn20k_eth_set_rx_function(eth_dev);
 		return 0;
+	}
 
 	return 0;
 }
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
index 1af490befc..2049ee7fa4 100644
--- a/drivers/net/cnxk/cn20k_ethdev.h
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -8,4 +8,7 @@
 #include <cnxk_ethdev.h>
 #include <cnxk_security.h>
 
+/* Rx and Tx routines */
+void cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+
 #endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 58a2920a54..2cb77c0b46 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -30,4 +30,230 @@
 #define NIX_RX_MULTI_SEG_F BIT(14)
 
 #define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
+
+#define RSS_F	  NIX_RX_OFFLOAD_RSS_F
+#define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
+#define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
+#define MARK_F	  NIX_RX_OFFLOAD_MARK_UPDATE_F
+#define TS_F	  NIX_RX_OFFLOAD_TSTAMP_F
+#define RX_VLAN_F NIX_RX_OFFLOAD_VLAN_STRIP_F
+#define R_SEC_F	  NIX_RX_OFFLOAD_SECURITY_F
+
+/* [R_SEC_F] [RX_VLAN_F] [TS] [MARK] [CKSUM] [PTYPE] [RSS] */
+#define NIX_RX_FASTPATH_MODES_0_15                                                                 \
+	R(no_offload, NIX_RX_OFFLOAD_NONE)                                                         \
+	R(rss, RSS_F)                                                                              \
+	R(ptype, PTYPE_F)                                                                          \
+	R(ptype_rss, PTYPE_F | RSS_F)                                                              \
+	R(cksum, CKSUM_F)                                                                          \
+	R(cksum_rss, CKSUM_F | RSS_F)                                                              \
+	R(cksum_ptype, CKSUM_F | PTYPE_F)                                                          \
+	R(cksum_ptype_rss, CKSUM_F | PTYPE_F | RSS_F)                                              \
+	R(mark, MARK_F)                                                                            \
+	R(mark_rss, MARK_F | RSS_F)                                                                \
+	R(mark_ptype, MARK_F | PTYPE_F)                                                            \
+	R(mark_ptype_rss, MARK_F | PTYPE_F | RSS_F)                                                \
+	R(mark_cksum, MARK_F | CKSUM_F)                                                            \
+	R(mark_cksum_rss, MARK_F | CKSUM_F | RSS_F)                                                \
+	R(mark_cksum_ptype, MARK_F | CKSUM_F | PTYPE_F)                                            \
+	R(mark_cksum_ptype_rss, MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_16_31                                                                \
+	R(ts, TS_F)                                                                                \
+	R(ts_rss, TS_F | RSS_F)                                                                    \
+	R(ts_ptype, TS_F | PTYPE_F)                                                                \
+	R(ts_ptype_rss, TS_F | PTYPE_F | RSS_F)                                                    \
+	R(ts_cksum, TS_F | CKSUM_F)                                                                \
+	R(ts_cksum_rss, TS_F | CKSUM_F | RSS_F)                                                    \
+	R(ts_cksum_ptype, TS_F | CKSUM_F | PTYPE_F)                                                \
+	R(ts_cksum_ptype_rss, TS_F | CKSUM_F | PTYPE_F | RSS_F)                                    \
+	R(ts_mark, TS_F | MARK_F)                                                                  \
+	R(ts_mark_rss, TS_F | MARK_F | RSS_F)                                                      \
+	R(ts_mark_ptype, TS_F | MARK_F | PTYPE_F)                                                  \
+	R(ts_mark_ptype_rss, TS_F | MARK_F | PTYPE_F | RSS_F)                                      \
+	R(ts_mark_cksum, TS_F | MARK_F | CKSUM_F)                                                  \
+	R(ts_mark_cksum_rss, TS_F | MARK_F | CKSUM_F | RSS_F)                                      \
+	R(ts_mark_cksum_ptype, TS_F | MARK_F | CKSUM_F | PTYPE_F)                                  \
+	R(ts_mark_cksum_ptype_rss, TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_32_47                                                                \
+	R(vlan, RX_VLAN_F)                                                                         \
+	R(vlan_rss, RX_VLAN_F | RSS_F)                                                             \
+	R(vlan_ptype, RX_VLAN_F | PTYPE_F)                                                         \
+	R(vlan_ptype_rss, RX_VLAN_F | PTYPE_F | RSS_F)                                             \
+	R(vlan_cksum, RX_VLAN_F | CKSUM_F)                                                         \
+	R(vlan_cksum_rss, RX_VLAN_F | CKSUM_F | RSS_F)                                             \
+	R(vlan_cksum_ptype, RX_VLAN_F | CKSUM_F | PTYPE_F)                                         \
+	R(vlan_cksum_ptype_rss, RX_VLAN_F | CKSUM_F | PTYPE_F | RSS_F)                             \
+	R(vlan_mark, RX_VLAN_F | MARK_F)                                                           \
+	R(vlan_mark_rss, RX_VLAN_F | MARK_F | RSS_F)                                               \
+	R(vlan_mark_ptype, RX_VLAN_F | MARK_F | PTYPE_F)                                           \
+	R(vlan_mark_ptype_rss, RX_VLAN_F | MARK_F | PTYPE_F | RSS_F)                               \
+	R(vlan_mark_cksum, RX_VLAN_F | MARK_F | CKSUM_F)                                           \
+	R(vlan_mark_cksum_rss, RX_VLAN_F | MARK_F | CKSUM_F | RSS_F)                               \
+	R(vlan_mark_cksum_ptype, RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F)                           \
+	R(vlan_mark_cksum_ptype_rss, RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_48_63                                                                \
+	R(vlan_ts, RX_VLAN_F | TS_F)                                                               \
+	R(vlan_ts_rss, RX_VLAN_F | TS_F | RSS_F)                                                   \
+	R(vlan_ts_ptype, RX_VLAN_F | TS_F | PTYPE_F)                                               \
+	R(vlan_ts_ptype_rss, RX_VLAN_F | TS_F | PTYPE_F | RSS_F)                                   \
+	R(vlan_ts_cksum, RX_VLAN_F | TS_F | CKSUM_F)                                               \
+	R(vlan_ts_cksum_rss, RX_VLAN_F | TS_F | CKSUM_F | RSS_F)                                   \
+	R(vlan_ts_cksum_ptype, RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F)                               \
+	R(vlan_ts_cksum_ptype_rss, RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)                   \
+	R(vlan_ts_mark, RX_VLAN_F | TS_F | MARK_F)                                                 \
+	R(vlan_ts_mark_rss, RX_VLAN_F | TS_F | MARK_F | RSS_F)                                     \
+	R(vlan_ts_mark_ptype, RX_VLAN_F | TS_F | MARK_F | PTYPE_F)                                 \
+	R(vlan_ts_mark_ptype_rss, RX_VLAN_F | TS_F | MARK_F | PTYPE_F | RSS_F)                     \
+	R(vlan_ts_mark_cksum, RX_VLAN_F | TS_F | MARK_F | CKSUM_F)                                 \
+	R(vlan_ts_mark_cksum_rss, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | RSS_F)                     \
+	R(vlan_ts_mark_cksum_ptype, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)                 \
+	R(vlan_ts_mark_cksum_ptype_rss, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_64_79                                                                \
+	R(sec, R_SEC_F)                                                                            \
+	R(sec_rss, R_SEC_F | RSS_F)                                                                \
+	R(sec_ptype, R_SEC_F | PTYPE_F)                                                            \
+	R(sec_ptype_rss, R_SEC_F | PTYPE_F | RSS_F)                                                \
+	R(sec_cksum, R_SEC_F | CKSUM_F)                                                            \
+	R(sec_cksum_rss, R_SEC_F | CKSUM_F | RSS_F)                                                \
+	R(sec_cksum_ptype, R_SEC_F | CKSUM_F | PTYPE_F)                                            \
+	R(sec_cksum_ptype_rss, R_SEC_F | CKSUM_F | PTYPE_F | RSS_F)                                \
+	R(sec_mark, R_SEC_F | MARK_F)                                                              \
+	R(sec_mark_rss, R_SEC_F | MARK_F | RSS_F)                                                  \
+	R(sec_mark_ptype, R_SEC_F | MARK_F | PTYPE_F)                                              \
+	R(sec_mark_ptype_rss, R_SEC_F | MARK_F | PTYPE_F | RSS_F)                                  \
+	R(sec_mark_cksum, R_SEC_F | MARK_F | CKSUM_F)                                              \
+	R(sec_mark_cksum_rss, R_SEC_F | MARK_F | CKSUM_F | RSS_F)                                  \
+	R(sec_mark_cksum_ptype, R_SEC_F | MARK_F | CKSUM_F | PTYPE_F)                              \
+	R(sec_mark_cksum_ptype_rss, R_SEC_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_80_95                                                                \
+	R(sec_ts, R_SEC_F | TS_F)                                                                  \
+	R(sec_ts_rss, R_SEC_F | TS_F | RSS_F)                                                      \
+	R(sec_ts_ptype, R_SEC_F | TS_F | PTYPE_F)                                                  \
+	R(sec_ts_ptype_rss, R_SEC_F | TS_F | PTYPE_F | RSS_F)                                      \
+	R(sec_ts_cksum, R_SEC_F | TS_F | CKSUM_F)                                                  \
+	R(sec_ts_cksum_rss, R_SEC_F | TS_F | CKSUM_F | RSS_F)                                      \
+	R(sec_ts_cksum_ptype, R_SEC_F | TS_F | CKSUM_F | PTYPE_F)                                  \
+	R(sec_ts_cksum_ptype_rss, R_SEC_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)                      \
+	R(sec_ts_mark, R_SEC_F | TS_F | MARK_F)                                                    \
+	R(sec_ts_mark_rss, R_SEC_F | TS_F | MARK_F | RSS_F)                                        \
+	R(sec_ts_mark_ptype, R_SEC_F | TS_F | MARK_F | PTYPE_F)                                    \
+	R(sec_ts_mark_ptype_rss, R_SEC_F | TS_F | MARK_F | PTYPE_F | RSS_F)                        \
+	R(sec_ts_mark_cksum, R_SEC_F | TS_F | MARK_F | CKSUM_F)                                    \
+	R(sec_ts_mark_cksum_rss, R_SEC_F | TS_F | MARK_F | CKSUM_F | RSS_F)                        \
+	R(sec_ts_mark_cksum_ptype, R_SEC_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)                    \
+	R(sec_ts_mark_cksum_ptype_rss, R_SEC_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_96_111                                                               \
+	R(sec_vlan, R_SEC_F | RX_VLAN_F)                                                           \
+	R(sec_vlan_rss, R_SEC_F | RX_VLAN_F | RSS_F)                                               \
+	R(sec_vlan_ptype, R_SEC_F | RX_VLAN_F | PTYPE_F)                                           \
+	R(sec_vlan_ptype_rss, R_SEC_F | RX_VLAN_F | PTYPE_F | RSS_F)                               \
+	R(sec_vlan_cksum, R_SEC_F | RX_VLAN_F | CKSUM_F)                                           \
+	R(sec_vlan_cksum_rss, R_SEC_F | RX_VLAN_F | CKSUM_F | RSS_F)                               \
+	R(sec_vlan_cksum_ptype, R_SEC_F | RX_VLAN_F | CKSUM_F | PTYPE_F)                           \
+	R(sec_vlan_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | CKSUM_F | PTYPE_F | RSS_F)               \
+	R(sec_vlan_mark, R_SEC_F | RX_VLAN_F | MARK_F)                                             \
+	R(sec_vlan_mark_rss, R_SEC_F | RX_VLAN_F | MARK_F | RSS_F)                                 \
+	R(sec_vlan_mark_ptype, R_SEC_F | RX_VLAN_F | MARK_F | PTYPE_F)                             \
+	R(sec_vlan_mark_ptype_rss, R_SEC_F | RX_VLAN_F | MARK_F | PTYPE_F | RSS_F)                 \
+	R(sec_vlan_mark_cksum, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F)                             \
+	R(sec_vlan_mark_cksum_rss, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | RSS_F)                 \
+	R(sec_vlan_mark_cksum_ptype, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F)             \
+	R(sec_vlan_mark_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_112_127                                                              \
+	R(sec_vlan_ts, R_SEC_F | RX_VLAN_F | TS_F)                                                 \
+	R(sec_vlan_ts_rss, R_SEC_F | RX_VLAN_F | TS_F | RSS_F)                                     \
+	R(sec_vlan_ts_ptype, R_SEC_F | RX_VLAN_F | TS_F | PTYPE_F)                                 \
+	R(sec_vlan_ts_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | PTYPE_F | RSS_F)                     \
+	R(sec_vlan_ts_cksum, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F)                                 \
+	R(sec_vlan_ts_cksum_rss, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | RSS_F)                     \
+	R(sec_vlan_ts_cksum_ptype, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F)                 \
+	R(sec_vlan_ts_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)     \
+	R(sec_vlan_ts_mark, R_SEC_F | RX_VLAN_F | TS_F | MARK_F)                                   \
+	R(sec_vlan_ts_mark_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | RSS_F)                       \
+	R(sec_vlan_ts_mark_ptype, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | PTYPE_F)                   \
+	R(sec_vlan_ts_mark_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | PTYPE_F | RSS_F)       \
+	R(sec_vlan_ts_mark_cksum, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F)                   \
+	R(sec_vlan_ts_mark_cksum_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | RSS_F)       \
+	R(sec_vlan_ts_mark_cksum_ptype, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)   \
+	R(sec_vlan_ts_mark_cksum_ptype_rss,                                                        \
+	  R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES                                                                      \
+	NIX_RX_FASTPATH_MODES_0_15                                                                 \
+	NIX_RX_FASTPATH_MODES_16_31                                                                \
+	NIX_RX_FASTPATH_MODES_32_47                                                                \
+	NIX_RX_FASTPATH_MODES_48_63                                                                \
+	NIX_RX_FASTPATH_MODES_64_79                                                                \
+	NIX_RX_FASTPATH_MODES_80_95                                                                \
+	NIX_RX_FASTPATH_MODES_96_111                                                               \
+	NIX_RX_FASTPATH_MODES_112_127
+
+#define R(name, flags)                                                                             \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_##name(                              \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_mseg_##name(                         \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_##name(                          \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_mseg_##name(                     \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_##name(                         \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_mseg_##name(                    \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_vec_##name(                     \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_vec_mseg_##name(                \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);
+
+NIX_RX_FASTPATH_MODES
+#undef R
+
+#define NIX_RX_RECV(fn, flags)                                                                     \
+	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(rx_queue);                                                            \
+		RTE_SET_USED(rx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_RX_RECV_MSEG(fn, flags) NIX_RX_RECV(fn, flags | NIX_RX_MULTI_SEG_F)
+
+#define NIX_RX_RECV_VEC(fn, flags)                                                                 \
+	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(rx_queue);                                                            \
+		RTE_SET_USED(rx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_RX_RECV_VEC_MSEG(fn, flags) NIX_RX_RECV_VEC(fn, flags | NIX_RX_MULTI_SEG_F)
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_all_offload(void *rx_queue,
+								  struct rte_mbuf **rx_pkts,
+								  uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_all_offload(void *rx_queue,
+								      struct rte_mbuf **rx_pkts,
+								      uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_all_offload_tst(void *rx_queue,
+								      struct rte_mbuf **rx_pkts,
+								      uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_all_offload_tst(void *rx_queue,
+									  struct rte_mbuf **rx_pkts,
+									  uint16_t pkts);
+
 #endif /* __CN20K_RX_H__ */
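
The R() lists above are X-macros: expanded once under the NIX_RX_RECV*()
helpers to define the handlers, and again in cn20k_rx_select.c to build
flag-indexed dispatch tables. A self-contained sketch of the pattern,
with hypothetical names (not part of the patch):

#include <stdint.h>
#include <stdio.h>

#define MODES          \
	X(none, 0)     \
	X(rss, 1)      \
	X(ptype, 2)    \
	X(ptype_rss, 3)

/* First expansion: one handler definition per mode. */
#define X(name, flags)                                           \
	static uint16_t recv_##name(void) { return (flags); }
MODES
#undef X

/* Second expansion: a table indexed directly by the flag value. */
typedef uint16_t (*recv_fn)(void);
static const recv_fn recv_tbl[] = {
#define X(name, flags) [flags] = recv_##name,
	MODES
#undef X
};

int main(void)
{
	/* flags == 3 (ptype | rss) dispatches to recv_ptype_rss. */
	printf("%u\n", (unsigned int)recv_tbl[3]());
	return 0;
}
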
diff --git a/drivers/net/cnxk/cn20k_rx_select.c b/drivers/net/cnxk/cn20k_rx_select.c
new file mode 100644
index 0000000000..82e06a62ef
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rx_select.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_ethdev.h"
+#include "cn20k_rx.h"
+
+static __rte_used void
+pick_rx_func(struct rte_eth_dev *eth_dev, const eth_rx_burst_t rx_burst[NIX_RX_OFFLOAD_MAX])
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* [R_SEC] [RX_VLAN] [TS] [MARK] [CKSUM] [PTYPE] [RSS] */
+	eth_dev->rx_pkt_burst = rx_burst[dev->rx_offload_flags & (NIX_RX_OFFLOAD_MAX - 1)];
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+}
+
+static uint16_t __rte_noinline __rte_hot __rte_unused
+cn20k_nix_flush_rx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	RTE_SET_USED(rx_queue);
+	RTE_SET_USED(rx_pkts);
+	RTE_SET_USED(pkts);
+	return 0;
+}
+
+#if defined(RTE_ARCH_ARM64)
+static void
+cn20k_eth_set_rx_tmplt_func(struct rte_eth_dev *eth_dev)
+{
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	const eth_rx_burst_t nix_eth_rx_burst[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_mseg[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_mseg_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_mseg_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_mseg_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_vec_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_mseg[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_vec_mseg_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_vec_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_mseg_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_vec_mseg_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	/* Stash the flush Rx handler used during the teardown sequence */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		dev->rx_pkt_burst_no_offload = cn20k_nix_flush_rx;
+
+	if (dev->scalar_ena) {
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) {
+			if (dev->rx_offload_flags & NIX_RX_REAS_F)
+				return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg_reas);
+			else
+				return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg);
+		}
+		if (dev->rx_offload_flags & NIX_RX_REAS_F)
+			return pick_rx_func(eth_dev, nix_eth_rx_burst_reas);
+		else
+			return pick_rx_func(eth_dev, nix_eth_rx_burst);
+	}
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) {
+		if (dev->rx_offload_flags & NIX_RX_REAS_F)
+			return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_mseg_reas);
+		else
+			return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_mseg);
+	}
+
+	if (dev->rx_offload_flags & NIX_RX_REAS_F)
+		return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_reas);
+	else
+		return pick_rx_func(eth_dev, nix_eth_rx_vec_burst);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+
+static void
+cn20k_eth_set_rx_blk_func(struct rte_eth_dev *eth_dev)
+{
+#if defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* Stash the flush Rx handler used during the teardown sequence */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		dev->rx_pkt_burst_no_offload = cn20k_nix_flush_rx;
+
+	if (dev->scalar_ena) {
+		eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_all_offload;
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+			eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_all_offload_tst;
+	} else {
+		eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_vec_all_offload;
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+			eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_vec_all_offload_tst;
+	}
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+#endif
+
+void
+cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
+{
+#if defined(RTE_ARCH_ARM64)
+	cn20k_eth_set_rx_blk_func(eth_dev);
+	cn20k_eth_set_rx_tmplt_func(eth_dev);
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
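
To make the table indexing concrete: pick_rx_func() masks
rx_offload_flags with NIX_RX_OFFLOAD_MAX - 1, i.e. 0x7F, so only the
seven template bits select the entry. Assuming the usual cnxk bit
assignment (RSS_F = BIT(0), PTYPE_F = BIT(1), and so on), a port with
only RSS hashing and ptype parsing enabled resolves to index 0x3, which
the R() expansion populated with cn20k_nix_recv_pkts_ptype_rss -- or its
mseg/vec/reas counterpart, chosen by the scatter and reassembly checks
above.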
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index cf2ce09f77..f41238be9c 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -236,7 +236,51 @@ if soc_type == 'cn20k' or soc_type == 'all'
 # CN20K
 sources += files(
         'cn20k_ethdev.c',
+        'cn20k_rx_select.c',
 )
+
+if host_machine.cpu_family().startswith('aarch') and not disable_template
+sources += files(
+        'rx/cn20k/rx_0_15.c',
+        'rx/cn20k/rx_16_31.c',
+        'rx/cn20k/rx_32_47.c',
+        'rx/cn20k/rx_48_63.c',
+        'rx/cn20k/rx_64_79.c',
+        'rx/cn20k/rx_80_95.c',
+        'rx/cn20k/rx_96_111.c',
+        'rx/cn20k/rx_112_127.c',
+        'rx/cn20k/rx_0_15_mseg.c',
+        'rx/cn20k/rx_16_31_mseg.c',
+        'rx/cn20k/rx_32_47_mseg.c',
+        'rx/cn20k/rx_48_63_mseg.c',
+        'rx/cn20k/rx_64_79_mseg.c',
+        'rx/cn20k/rx_80_95_mseg.c',
+        'rx/cn20k/rx_96_111_mseg.c',
+        'rx/cn20k/rx_112_127_mseg.c',
+        'rx/cn20k/rx_0_15_vec.c',
+        'rx/cn20k/rx_16_31_vec.c',
+        'rx/cn20k/rx_32_47_vec.c',
+        'rx/cn20k/rx_48_63_vec.c',
+        'rx/cn20k/rx_64_79_vec.c',
+        'rx/cn20k/rx_80_95_vec.c',
+        'rx/cn20k/rx_96_111_vec.c',
+        'rx/cn20k/rx_112_127_vec.c',
+        'rx/cn20k/rx_0_15_vec_mseg.c',
+        'rx/cn20k/rx_16_31_vec_mseg.c',
+        'rx/cn20k/rx_32_47_vec_mseg.c',
+        'rx/cn20k/rx_48_63_vec_mseg.c',
+        'rx/cn20k/rx_64_79_vec_mseg.c',
+        'rx/cn20k/rx_80_95_vec_mseg.c',
+        'rx/cn20k/rx_96_111_vec_mseg.c',
+        'rx/cn20k/rx_112_127_vec_mseg.c',
+        'rx/cn20k/rx_all_offload.c',
+)
+
+else
+sources += files(
+        'rx/cn20k/rx_all_offload.c',
+)
+endif
 endif
 
 
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15.c b/drivers/net/cnxk/rx/cn20k/rx_0_15.c
new file mode 100644
index 0000000000..d248eb8c7e
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
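
The _ROC_API_H_ guard repeated in each of these template files is
intentional: it fails the build if the heavyweight roc_api.h umbrella
header leaks into a fast-path translation unit, presumably to keep the
datapath files decoupled from the slow-path ROC API and their compile
times down across all 128 variants.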
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
new file mode 100644
index 0000000000..b159632921
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
new file mode 100644
index 0000000000..76846bfea8
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
new file mode 100644
index 0000000000..73533631ad
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127.c b/drivers/net/cnxk/rx/cn20k/rx_112_127.c
new file mode 100644
index 0000000000..b7c53def26
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
new file mode 100644
index 0000000000..ed3a95479c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
new file mode 100644
index 0000000000..4bbba8bdbe
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
new file mode 100644
index 0000000000..3a2b67436f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31.c b/drivers/net/cnxk/rx/cn20k/rx_16_31.c
new file mode 100644
index 0000000000..cd60faaefd
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
new file mode 100644
index 0000000000..2f2d527def
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
new file mode 100644
index 0000000000..595ec8689e
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
new file mode 100644
index 0000000000..7cf1c65f4a
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47.c b/drivers/net/cnxk/rx/cn20k/rx_32_47.c
new file mode 100644
index 0000000000..e3778448ca
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
new file mode 100644
index 0000000000..2203247aa4
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
new file mode 100644
index 0000000000..7aae8225e7
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
new file mode 100644
index 0000000000..1a221ae095
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63.c b/drivers/net/cnxk/rx/cn20k/rx_48_63.c
new file mode 100644
index 0000000000..c5fedd06cd
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
new file mode 100644
index 0000000000..6c2d8ac331
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
new file mode 100644
index 0000000000..20a937e453
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
new file mode 100644
index 0000000000..929d807c8d
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79.c b/drivers/net/cnxk/rx/cn20k/rx_64_79.c
new file mode 100644
index 0000000000..30beebc326
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
new file mode 100644
index 0000000000..30ece8f8ee
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
new file mode 100644
index 0000000000..1f533c01f6
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
new file mode 100644
index 0000000000..ed3c012798
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95.c b/drivers/net/cnxk/rx/cn20k/rx_80_95.c
new file mode 100644
index 0000000000..a13ecb244f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
new file mode 100644
index 0000000000..c6438120d8
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
new file mode 100644
index 0000000000..94c685ba7c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
new file mode 100644
index 0000000000..370376da7d
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111.c b/drivers/net/cnxk/rx/cn20k/rx_96_111.c
new file mode 100644
index 0000000000..15b5375e3c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
new file mode 100644
index 0000000000..561b48c789
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
new file mode 100644
index 0000000000..17031f7b6f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
new file mode 100644
index 0000000000..9dd1f3f39a
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_all_offload.c b/drivers/net/cnxk/rx/cn20k/rx_all_offload.c
new file mode 100644
index 0000000000..1d032b3b17
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_all_offload.c
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if defined(CNXK_DIS_TMPLT_FUNC)
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_all_offload(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts(rx_queue, rx_pkts, pkts,
+				   NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F |
+					   NIX_RX_OFFLOAD_CHECKSUM_F |
+					   NIX_RX_OFFLOAD_MARK_UPDATE_F | NIX_RX_OFFLOAD_TSTAMP_F |
+					   NIX_RX_OFFLOAD_VLAN_STRIP_F | NIX_RX_OFFLOAD_SECURITY_F |
+					   NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_vec_all_offload(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts_vector(
+		rx_queue, rx_pkts, pkts,
+		NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F | NIX_RX_OFFLOAD_CHECKSUM_F |
+			NIX_RX_OFFLOAD_MARK_UPDATE_F | NIX_RX_OFFLOAD_TSTAMP_F |
+			NIX_RX_OFFLOAD_VLAN_STRIP_F | NIX_RX_OFFLOAD_SECURITY_F |
+			NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F,
+		NULL, NULL, 0, 0);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_all_offload_tst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts(
+		rx_queue, rx_pkts, pkts,
+		NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F | NIX_RX_OFFLOAD_CHECKSUM_F |
+			NIX_RX_OFFLOAD_MARK_UPDATE_F | NIX_RX_OFFLOAD_VLAN_STRIP_F |
+			NIX_RX_OFFLOAD_SECURITY_F | NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_vec_all_offload_tst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts_vector(
+		rx_queue, rx_pkts, pkts,
+		NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F | NIX_RX_OFFLOAD_CHECKSUM_F |
+			NIX_RX_OFFLOAD_MARK_UPDATE_F | NIX_RX_OFFLOAD_VLAN_STRIP_F |
+			NIX_RX_OFFLOAD_SECURITY_F | NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F,
+		NULL, NULL, 0, 0);
+}
+
+#endif
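
Every template file in this series is guarded by CNXK_DIS_TMPLT_FUNC: when that macro is defined, the per-flag templates compile to nothing and only the catch-all handlers above are built, presumably trading fastpath specialisation for build time and binary size. A miniature of the same compile-time switch, with made-up names:

    #include <stdio.h>

    /* #define DIS_TMPLT */   /* stand-in for CNXK_DIS_TMPLT_FUNC */

    #if defined(DIS_TMPLT)
    static const char *rx_pick(void) { return "single all-offload handler"; }
    #else
    static const char *rx_pick(void) { return "one of 128 flag templates"; }
    #endif

    int main(void) { puts(rx_pick()); return 0; }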
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 20/33] net/cnxk: support Tx function select for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (18 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 19/33] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 21/33] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

Add support to select the Tx burst function for cn20k based on
the Tx offload flags enabled in the device configuration.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
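Each Tx offload contributes one bit of the template index, which is why the
generated functions below come in groups of sixteen per file. A minimal
sketch, with bit values inferred from the NIX_TX_FASTPATH_MODES_* groupings
in cn20k_tx.h (illustrative only, not the driver's definitions):

    #include <stdio.h>

    #define L3L4CSUM_F   (1 << 0) /* modes 0-15 cycle the low four bits */
    #define OL3OL4CSUM_F (1 << 1)
    #define VLAN_F       (1 << 2)
    #define NOFF_F       (1 << 3)
    #define TSO_F        (1 << 4) /* modes 16-31 add TSO */
    #define TSP_F        (1 << 5) /* modes 32-47 add timestamping */
    #define T_SEC_F      (1 << 6) /* modes 64-79 add inline security */

    int main(void)
    {
            /* "tso_vlan_l3l4csum" sits at index 21, i.e. in tx_16_31.c */
            printf("%d\n", TSO_F | VLAN_F | L3L4CSUM_F);
            return 0;
    }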
 drivers/net/cnxk/cn20k_ethdev.c               |  80 ++++++
 drivers/net/cnxk/cn20k_ethdev.h               |   1 +
 drivers/net/cnxk/cn20k_tx.h                   | 237 ++++++++++++++++++
 drivers/net/cnxk/cn20k_tx_select.c            | 122 +++++++++
 drivers/net/cnxk/meson.build                  |  37 +++
 drivers/net/cnxk/tx/cn20k/tx_0_15.c           |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c       |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c  |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127.c        |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c   |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c    |  18 ++
 .../net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c   |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111.c         |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c    |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c     |  18 ++
 .../net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c    |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_all_offload.c    |  39 +++
 38 files changed, 1092 insertions(+)
 create mode 100644 drivers/net/cnxk/cn20k_tx_select.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_all_offload.c

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index d1cb3a52bf..4b2f04ba31 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -40,6 +40,78 @@ nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
 	return flags;
 }
 
+static uint16_t
+nix_tx_offload_flags(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	uint64_t conf = dev->tx_offloads;
+	struct roc_nix *nix = &dev->nix;
+	uint16_t flags = 0;
+
+	/* Fastpath is dependent on these enums */
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_TCP_CKSUM != (1ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_SCTP_CKSUM != (2ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_UDP_CKSUM != (3ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_IP_CKSUM != (1ULL << 54));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_IPV4 != (1ULL << 55));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IP_CKSUM != (1ULL << 58));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IPV4 != (1ULL << 59));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IPV6 != (1ULL << 60));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_UDP_CKSUM != (1ULL << 41));
+	RTE_BUILD_BUG_ON(RTE_MBUF_L2_LEN_BITS != 7);
+	RTE_BUILD_BUG_ON(RTE_MBUF_L3_LEN_BITS != 9);
+	RTE_BUILD_BUG_ON(RTE_MBUF_OUTL2_LEN_BITS != 7);
+	RTE_BUILD_BUG_ON(RTE_MBUF_OUTL3_LEN_BITS != 9);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_off) !=
+			 offsetof(struct rte_mbuf, buf_addr) + 16);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
+			 offsetof(struct rte_mbuf, buf_addr) + 24);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pkt_len) !=
+			 offsetof(struct rte_mbuf, ol_flags) + 12);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, tx_offload) !=
+			 offsetof(struct rte_mbuf, pool) + 2 * sizeof(void *));
+
+	if (conf & RTE_ETH_TX_OFFLOAD_VLAN_INSERT || conf & RTE_ETH_TX_OFFLOAD_QINQ_INSERT)
+		flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM || conf & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM)
+		flags |= NIX_TX_OFFLOAD_OL3_OL4_CSUM_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM || conf & RTE_ETH_TX_OFFLOAD_TCP_CKSUM ||
+	    conf & RTE_ETH_TX_OFFLOAD_UDP_CKSUM || conf & RTE_ETH_TX_OFFLOAD_SCTP_CKSUM)
+		flags |= NIX_TX_OFFLOAD_L3_L4_CSUM_F;
+
+	if (!(conf & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE))
+		flags |= NIX_TX_OFFLOAD_MBUF_NOFF_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+		flags |= NIX_TX_MULTI_SEG_F;
+
+	/* Enable Inner checksum for TSO */
+	if (conf & RTE_ETH_TX_OFFLOAD_TCP_TSO)
+		flags |= (NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_L3_L4_CSUM_F);
+
+	/* Enable Inner and Outer checksum for Tunnel TSO */
+	if (conf & (RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO | RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO |
+		    RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO))
+		flags |= (NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+			  NIX_TX_OFFLOAD_L3_L4_CSUM_F);
+
+	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP))
+		flags |= NIX_TX_OFFLOAD_TSTAMP_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_SECURITY)
+		flags |= NIX_TX_OFFLOAD_SECURITY_F;
+
+	if (dev->tx_mark)
+		flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+
+	if (nix->tx_compl_ena)
+		flags |= NIX_TX_OFFLOAD_MBUF_NOFF_F;
+
+	return flags;
+}
+
 static int
 cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 {
@@ -226,6 +298,7 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 
 		/* Update offload flags */
 		dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
+		dev->tx_offload_flags = nix_tx_offload_flags(eth_dev);
 	}
 
 	rq = &dev->rqs[qid];
@@ -286,6 +359,8 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 
 	/* Update offload flags */
 	dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
+	dev->tx_offload_flags = nix_tx_offload_flags(eth_dev);
+
 	/* reset reassembly dynfield/flag offset */
 	dev->reass_dynfield_off = -1;
 	dev->reass_dynflag_bit = -1;
@@ -316,6 +391,7 @@ cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
 	return 0;
 }
 
@@ -339,6 +415,7 @@ cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
 	return 0;
 }
 
@@ -378,10 +455,12 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
+	dev->tx_offload_flags |= nix_tx_offload_flags(eth_dev);
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
 
+	cn20k_eth_set_tx_function(eth_dev);
 	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
@@ -581,6 +660,7 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		/* Setup callbacks for secondary process */
+		cn20k_eth_set_tx_function(eth_dev);
 		cn20k_eth_set_rx_function(eth_dev);
 		return 0;
 	}
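
The RTE_BUILD_BUG_ON() block in nix_tx_offload_flags() pins the rte_mbuf
field layout at compile time, presumably because the vector Tx path reads
adjacent fields at fixed offsets. The same pattern in plain C11, with a
hypothetical struct standing in for rte_mbuf:

    #include <stddef.h>
    #include <stdint.h>

    struct hdr {
            uint64_t buf_addr;  /* 8 bytes */
            uint16_t data_off;  /* assumed to sit right after buf_addr */
    };

    /* Costs nothing at run time; breaks the build if the layout drifts. */
    _Static_assert(offsetof(struct hdr, data_off) ==
                   offsetof(struct hdr, buf_addr) + sizeof(uint64_t),
                   "fastpath layout assumption violated");

    int main(void) { return 0; }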
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
index 2049ee7fa4..cb46044d60 100644
--- a/drivers/net/cnxk/cn20k_ethdev.h
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -10,5 +10,6 @@
 
 /* Rx and Tx routines */
 void cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+void cn20k_eth_set_tx_function(struct rte_eth_dev *eth_dev);
 
 #endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index a00c9d5776..9fd925ac34 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -32,4 +32,241 @@
 #define NIX_TX_NEED_EXT_HDR                                                                        \
 	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
 
+#define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
+#define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
+#define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
+#define NOFF_F	     NIX_TX_OFFLOAD_MBUF_NOFF_F
+#define TSO_F	     NIX_TX_OFFLOAD_TSO_F
+#define TSP_F	     NIX_TX_OFFLOAD_TSTAMP_F
+#define T_SEC_F	     NIX_TX_OFFLOAD_SECURITY_F
+
+/* [T_SEC_F] [TSP] [TSO] [NOFF] [VLAN] [OL3OL4CSUM] [L3L4CSUM] */
+#define NIX_TX_FASTPATH_MODES_0_15                                                                 \
+	T(no_offload, 6, NIX_TX_OFFLOAD_NONE)                                                      \
+	T(l3l4csum, 6, L3L4CSUM_F)                                                                 \
+	T(ol3ol4csum, 6, OL3OL4CSUM_F)                                                             \
+	T(ol3ol4csum_l3l4csum, 6, OL3OL4CSUM_F | L3L4CSUM_F)                                       \
+	T(vlan, 6, VLAN_F)                                                                         \
+	T(vlan_l3l4csum, 6, VLAN_F | L3L4CSUM_F)                                                   \
+	T(vlan_ol3ol4csum, 6, VLAN_F | OL3OL4CSUM_F)                                               \
+	T(vlan_ol3ol4csum_l3l4csum, 6, VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(noff, 6, NOFF_F)                                                                         \
+	T(noff_l3l4csum, 6, NOFF_F | L3L4CSUM_F)                                                   \
+	T(noff_ol3ol4csum, 6, NOFF_F | OL3OL4CSUM_F)                                               \
+	T(noff_ol3ol4csum_l3l4csum, 6, NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(noff_vlan, 6, NOFF_F | VLAN_F)                                                           \
+	T(noff_vlan_l3l4csum, 6, NOFF_F | VLAN_F | L3L4CSUM_F)                                     \
+	T(noff_vlan_ol3ol4csum, 6, NOFF_F | VLAN_F | OL3OL4CSUM_F)                                 \
+	T(noff_vlan_ol3ol4csum_l3l4csum, 6, NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_16_31                                                                \
+	T(tso, 6, TSO_F)                                                                           \
+	T(tso_l3l4csum, 6, TSO_F | L3L4CSUM_F)                                                     \
+	T(tso_ol3ol4csum, 6, TSO_F | OL3OL4CSUM_F)                                                 \
+	T(tso_ol3ol4csum_l3l4csum, 6, TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)                           \
+	T(tso_vlan, 6, TSO_F | VLAN_F)                                                             \
+	T(tso_vlan_l3l4csum, 6, TSO_F | VLAN_F | L3L4CSUM_F)                                       \
+	T(tso_vlan_ol3ol4csum, 6, TSO_F | VLAN_F | OL3OL4CSUM_F)                                   \
+	T(tso_vlan_ol3ol4csum_l3l4csum, 6, TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(tso_noff, 6, TSO_F | NOFF_F)                                                             \
+	T(tso_noff_l3l4csum, 6, TSO_F | NOFF_F | L3L4CSUM_F)                                       \
+	T(tso_noff_ol3ol4csum, 6, TSO_F | NOFF_F | OL3OL4CSUM_F)                                   \
+	T(tso_noff_ol3ol4csum_l3l4csum, 6, TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(tso_noff_vlan, 6, TSO_F | NOFF_F | VLAN_F)                                               \
+	T(tso_noff_vlan_l3l4csum, 6, TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)                         \
+	T(tso_noff_vlan_ol3ol4csum, 6, TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                     \
+	T(tso_noff_vlan_ol3ol4csum_l3l4csum, 6, TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_32_47                                                                \
+	T(ts, 8, TSP_F)                                                                            \
+	T(ts_l3l4csum, 8, TSP_F | L3L4CSUM_F)                                                      \
+	T(ts_ol3ol4csum, 8, TSP_F | OL3OL4CSUM_F)                                                  \
+	T(ts_ol3ol4csum_l3l4csum, 8, TSP_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(ts_vlan, 8, TSP_F | VLAN_F)                                                              \
+	T(ts_vlan_l3l4csum, 8, TSP_F | VLAN_F | L3L4CSUM_F)                                        \
+	T(ts_vlan_ol3ol4csum, 8, TSP_F | VLAN_F | OL3OL4CSUM_F)                                    \
+	T(ts_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(ts_noff, 8, TSP_F | NOFF_F)                                                              \
+	T(ts_noff_l3l4csum, 8, TSP_F | NOFF_F | L3L4CSUM_F)                                        \
+	T(ts_noff_ol3ol4csum, 8, TSP_F | NOFF_F | OL3OL4CSUM_F)                                    \
+	T(ts_noff_ol3ol4csum_l3l4csum, 8, TSP_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(ts_noff_vlan, 8, TSP_F | NOFF_F | VLAN_F)                                                \
+	T(ts_noff_vlan_l3l4csum, 8, TSP_F | NOFF_F | VLAN_F | L3L4CSUM_F)                          \
+	T(ts_noff_vlan_ol3ol4csum, 8, TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                      \
+	T(ts_noff_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_48_63                                                                \
+	T(ts_tso, 8, TSP_F | TSO_F)                                                                \
+	T(ts_tso_l3l4csum, 8, TSP_F | TSO_F | L3L4CSUM_F)                                          \
+	T(ts_tso_ol3ol4csum, 8, TSP_F | TSO_F | OL3OL4CSUM_F)                                      \
+	T(ts_tso_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)                \
+	T(ts_tso_vlan, 8, TSP_F | TSO_F | VLAN_F)                                                  \
+	T(ts_tso_vlan_l3l4csum, 8, TSP_F | TSO_F | VLAN_F | L3L4CSUM_F)                            \
+	T(ts_tso_vlan_ol3ol4csum, 8, TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F)                        \
+	T(ts_tso_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(ts_tso_noff, 8, TSP_F | TSO_F | NOFF_F)                                                  \
+	T(ts_tso_noff_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | L3L4CSUM_F)                            \
+	T(ts_tso_noff_ol3ol4csum, 8, TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F)                        \
+	T(ts_tso_noff_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(ts_tso_noff_vlan, 8, TSP_F | TSO_F | NOFF_F | VLAN_F)                                    \
+	T(ts_tso_noff_vlan_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)              \
+	T(ts_tso_noff_vlan_ol3ol4csum, 8, TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)          \
+	T(ts_tso_noff_vlan_ol3ol4csum_l3l4csum, 8,                                                 \
+	  TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_64_79                                                                \
+	T(sec, 6, T_SEC_F)                                                                         \
+	T(sec_l3l4csum, 6, T_SEC_F | L3L4CSUM_F)                                                   \
+	T(sec_ol3ol4csum, 6, T_SEC_F | OL3OL4CSUM_F)                                               \
+	T(sec_ol3ol4csum_l3l4csum, 6, T_SEC_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(sec_vlan, 6, T_SEC_F | VLAN_F)                                                           \
+	T(sec_vlan_l3l4csum, 6, T_SEC_F | VLAN_F | L3L4CSUM_F)                                     \
+	T(sec_vlan_ol3ol4csum, 6, T_SEC_F | VLAN_F | OL3OL4CSUM_F)                                 \
+	T(sec_vlan_ol3ol4csum_l3l4csum, 6, T_SEC_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)           \
+	T(sec_noff, 6, T_SEC_F | NOFF_F)                                                           \
+	T(sec_noff_l3l4csum, 6, T_SEC_F | NOFF_F | L3L4CSUM_F)                                     \
+	T(sec_noff_ol3ol4csum, 6, T_SEC_F | NOFF_F | OL3OL4CSUM_F)                                 \
+	T(sec_noff_ol3ol4csum_l3l4csum, 6, T_SEC_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)           \
+	T(sec_noff_vlan, 6, T_SEC_F | NOFF_F | VLAN_F)                                             \
+	T(sec_noff_vlan_l3l4csum, 6, T_SEC_F | NOFF_F | VLAN_F | L3L4CSUM_F)                       \
+	T(sec_noff_vlan_ol3ol4csum, 6, T_SEC_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                   \
+	T(sec_noff_vlan_ol3ol4csum_l3l4csum, 6,                                                    \
+	  T_SEC_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_80_95                                                                \
+	T(sec_tso, 6, T_SEC_F | TSO_F)                                                             \
+	T(sec_tso_l3l4csum, 6, T_SEC_F | TSO_F | L3L4CSUM_F)                                       \
+	T(sec_tso_ol3ol4csum, 6, T_SEC_F | TSO_F | OL3OL4CSUM_F)                                   \
+	T(sec_tso_ol3ol4csum_l3l4csum, 6, T_SEC_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(sec_tso_vlan, 6, T_SEC_F | TSO_F | VLAN_F)                                               \
+	T(sec_tso_vlan_l3l4csum, 6, T_SEC_F | TSO_F | VLAN_F | L3L4CSUM_F)                         \
+	T(sec_tso_vlan_ol3ol4csum, 6, T_SEC_F | TSO_F | VLAN_F | OL3OL4CSUM_F)                     \
+	T(sec_tso_vlan_ol3ol4csum_l3l4csum, 6,                                                     \
+	  T_SEC_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_tso_noff, 6, T_SEC_F | TSO_F | NOFF_F)                                               \
+	T(sec_tso_noff_l3l4csum, 6, T_SEC_F | TSO_F | NOFF_F | L3L4CSUM_F)                         \
+	T(sec_tso_noff_ol3ol4csum, 6, T_SEC_F | TSO_F | NOFF_F | OL3OL4CSUM_F)                     \
+	T(sec_tso_noff_ol3ol4csum_l3l4csum, 6,                                                     \
+	  T_SEC_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_tso_noff_vlan, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F)                                 \
+	T(sec_tso_noff_vlan_l3l4csum, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)           \
+	T(sec_tso_noff_vlan_ol3ol4csum, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)       \
+	T(sec_tso_noff_vlan_ol3ol4csum_l3l4csum, 6,                                                \
+	  T_SEC_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_96_111                                                               \
+	T(sec_ts, 8, T_SEC_F | TSP_F)                                                              \
+	T(sec_ts_l3l4csum, 8, T_SEC_F | TSP_F | L3L4CSUM_F)                                        \
+	T(sec_ts_ol3ol4csum, 8, T_SEC_F | TSP_F | OL3OL4CSUM_F)                                    \
+	T(sec_ts_ol3ol4csum_l3l4csum, 8, T_SEC_F | TSP_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(sec_ts_vlan, 8, T_SEC_F | TSP_F | VLAN_F)                                                \
+	T(sec_ts_vlan_l3l4csum, 8, T_SEC_F | TSP_F | VLAN_F | L3L4CSUM_F)                          \
+	T(sec_ts_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | VLAN_F | OL3OL4CSUM_F)                      \
+	T(sec_ts_vlan_ol3ol4csum_l3l4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_ts_noff, 8, T_SEC_F | TSP_F | NOFF_F)                                                \
+	T(sec_ts_noff_l3l4csum, 8, T_SEC_F | TSP_F | NOFF_F | L3L4CSUM_F)                          \
+	T(sec_ts_noff_ol3ol4csum, 8, T_SEC_F | TSP_F | NOFF_F | OL3OL4CSUM_F)                      \
+	T(sec_ts_noff_ol3ol4csum_l3l4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_ts_noff_vlan, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F)                                  \
+	T(sec_ts_noff_vlan_l3l4csum, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F | L3L4CSUM_F)            \
+	T(sec_ts_noff_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)        \
+	T(sec_ts_noff_vlan_ol3ol4csum_l3l4csum, 8,                                                 \
+	  T_SEC_F | TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_112_127                                                              \
+	T(sec_ts_tso, 8, T_SEC_F | TSP_F | TSO_F)                                                  \
+	T(sec_ts_tso_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | OL3OL4CSUM_F)                        \
+	T(sec_ts_tso_ol3ol4csum_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(sec_ts_tso_vlan, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F)                                    \
+	T(sec_ts_tso_vlan_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F | L3L4CSUM_F)              \
+	T(sec_ts_tso_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F)          \
+	T(sec_ts_tso_vlan_ol3ol4csum_l3l4csum, 8,                                                  \
+	  T_SEC_F | TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_noff, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F)                                    \
+	T(sec_ts_tso_noff_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | L3L4CSUM_F)              \
+	T(sec_ts_tso_noff_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F)          \
+	T(sec_ts_tso_noff_ol3ol4csum_l3l4csum, 8,                                                  \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_noff_vlan, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F)                      \
+	T(sec_ts_tso_noff_vlan_l3l4csum, 8,                                                        \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)                                  \
+	T(sec_ts_tso_noff_vlan_ol3ol4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                                \
+	T(sec_ts_tso_noff_vlan_ol3ol4csum_l3l4csum, 8,                                             \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES                                                                      \
+	NIX_TX_FASTPATH_MODES_0_15                                                                 \
+	NIX_TX_FASTPATH_MODES_16_31                                                                \
+	NIX_TX_FASTPATH_MODES_32_47                                                                \
+	NIX_TX_FASTPATH_MODES_48_63                                                                \
+	NIX_TX_FASTPATH_MODES_64_79                                                                \
+	NIX_TX_FASTPATH_MODES_80_95                                                                \
+	NIX_TX_FASTPATH_MODES_96_111                                                               \
+	NIX_TX_FASTPATH_MODES_112_127
+
+#define T(name, sz, flags)                                                                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_##name(                              \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_mseg_##name(                         \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_##name(                          \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_mseg_##name(                     \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);
+
+NIX_TX_FASTPATH_MODES
+#undef T
+
+#define NIX_TX_XMIT(fn, sz, flags)                                                                 \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_MSEG(fn, sz, flags)                                                            \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_VEC(fn, sz, flags)                                                             \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_VEC_MSEG(fn, sz, flags)                                                        \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue,
+								  struct rte_mbuf **tx_pkts,
+								  uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue,
+								      struct rte_mbuf **tx_pkts,
+								      uint16_t pkts);
+
 #endif /* __CN20K_TX_H__ */
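
The T() lists above are X-macros expanded three times: once here into extern
declarations, once in the per-range tx_*.c files into definitions, and once
in cn20k_tx_select.c below into flags-indexed dispatch tables. A
self-contained miniature of the pattern (names are made up):

    #include <stdio.h>

    #define MODES            \
            X(no_offload, 0) \
            X(l3l4csum, 1)   \
            X(vlan, 4)

    /* First expansion: one function per mode. */
    #define X(name, flags) static int handler_##name(void) { return flags; }
    MODES
    #undef X

    /* Second expansion: a flags-indexed dispatch table. */
    static int (*const table[8])(void) = {
    #define X(name, flags) [flags] = handler_##name,
            MODES
    #undef X
    };

    int main(void)
    {
            printf("%d\n", table[4]()); /* dispatches to handler_vlan */
            return 0;
    }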
diff --git a/drivers/net/cnxk/cn20k_tx_select.c b/drivers/net/cnxk/cn20k_tx_select.c
new file mode 100644
index 0000000000..fb62b54a5f
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_tx_select.c
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_ethdev.h"
+#include "cn20k_tx.h"
+
+static __rte_used inline void
+pick_tx_func(struct rte_eth_dev *eth_dev, const eth_tx_burst_t tx_burst[NIX_TX_OFFLOAD_MAX])
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* [SEC] [TSP] [TSO] [NOFF] [VLAN] [OL3_OL4_CSUM] [IL3_IL4_CSUM] */
+	eth_dev->tx_pkt_burst = tx_burst[dev->tx_offload_flags & (NIX_TX_OFFLOAD_MAX - 1)];
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].tx_pkt_burst = eth_dev->tx_pkt_burst;
+}
+
+#if defined(RTE_ARCH_ARM64)
+static int
+cn20k_nix_tx_queue_count(void *tx_queue)
+{
+	struct cn20k_eth_txq *txq = (struct cn20k_eth_txq *)tx_queue;
+
+	return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
+}
+
+static int
+cn20k_nix_tx_queue_sec_count(void *tx_queue)
+{
+	struct cn20k_eth_txq *txq = (struct cn20k_eth_txq *)tx_queue;
+
+	return cnxk_nix_tx_queue_sec_count(txq->fc_mem, txq->sqes_per_sqb_log2, txq->cpt_fc);
+}
+
+static void
+cn20k_eth_set_tx_tmplt_func(struct rte_eth_dev *eth_dev)
+{
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	const eth_tx_burst_t nix_eth_tx_burst[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_burst_mseg[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_mseg_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_vec_burst[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_vec_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_vec_burst_mseg[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_vec_mseg_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	if (dev->scalar_ena || dev->tx_mark) {
+		pick_tx_func(eth_dev, nix_eth_tx_burst);
+		if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+			pick_tx_func(eth_dev, nix_eth_tx_burst_mseg);
+	} else {
+		pick_tx_func(eth_dev, nix_eth_tx_vec_burst);
+		if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+			pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
+	}
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+
+static void
+cn20k_eth_set_tx_blk_func(struct rte_eth_dev *eth_dev)
+{
+#if defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (dev->scalar_ena || dev->tx_mark)
+		eth_dev->tx_pkt_burst = cn20k_nix_xmit_pkts_all_offload;
+	else
+		eth_dev->tx_pkt_burst = cn20k_nix_xmit_pkts_vec_all_offload;
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].tx_pkt_burst = eth_dev->tx_pkt_burst;
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+#endif
+
+void
+cn20k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
+{
+#if defined(RTE_ARCH_ARM64)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	cn20k_eth_set_tx_blk_func(eth_dev);
+	cn20k_eth_set_tx_tmplt_func(eth_dev);
+
+	if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
+		eth_dev->tx_queue_count = cn20k_nix_tx_queue_sec_count;
+	else
+		eth_dev->tx_queue_count = cn20k_nix_tx_queue_count;
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
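
pick_tx_func() maps the flag word straight to a table slot. Assuming
NIX_TX_OFFLOAD_MAX is the power of two just above the seven template bits
(128 in this sketch), masking with NIX_TX_OFFLOAD_MAX - 1 keeps only the
bits that select a template:

    #include <stdint.h>
    #include <stdio.h>

    #define OFFLOAD_MAX 128 /* assumed: 2^7, one bit per Tx offload */

    int main(void)
    {
            uint16_t tx_offload_flags = 0x1a5; /* high bits set for effect */
            uint16_t idx = tx_offload_flags & (OFFLOAD_MAX - 1);

            printf("0x%x\n", idx); /* 0x25: only the template bits remain */
            return 0;
    }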
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index f41238be9c..fcf48f600a 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -237,6 +237,7 @@ if soc_type == 'cn20k' or soc_type == 'all'
 sources += files(
         'cn20k_ethdev.c',
         'cn20k_rx_select.c',
+        'cn20k_tx_select.c',
 )
 
 if host_machine.cpu_family().startswith('aarch') and not disable_template
@@ -276,9 +277,45 @@ sources += files(
         'rx/cn20k/rx_all_offload.c',
 )
 
+sources += files(
+        'tx/cn20k/tx_0_15.c',
+        'tx/cn20k/tx_16_31.c',
+        'tx/cn20k/tx_32_47.c',
+        'tx/cn20k/tx_48_63.c',
+        'tx/cn20k/tx_64_79.c',
+        'tx/cn20k/tx_80_95.c',
+        'tx/cn20k/tx_96_111.c',
+        'tx/cn20k/tx_112_127.c',
+        'tx/cn20k/tx_0_15_mseg.c',
+        'tx/cn20k/tx_16_31_mseg.c',
+        'tx/cn20k/tx_32_47_mseg.c',
+        'tx/cn20k/tx_48_63_mseg.c',
+        'tx/cn20k/tx_64_79_mseg.c',
+        'tx/cn20k/tx_80_95_mseg.c',
+        'tx/cn20k/tx_96_111_mseg.c',
+        'tx/cn20k/tx_112_127_mseg.c',
+        'tx/cn20k/tx_0_15_vec.c',
+        'tx/cn20k/tx_16_31_vec.c',
+        'tx/cn20k/tx_32_47_vec.c',
+        'tx/cn20k/tx_48_63_vec.c',
+        'tx/cn20k/tx_64_79_vec.c',
+        'tx/cn20k/tx_80_95_vec.c',
+        'tx/cn20k/tx_96_111_vec.c',
+        'tx/cn20k/tx_112_127_vec.c',
+        'tx/cn20k/tx_0_15_vec_mseg.c',
+        'tx/cn20k/tx_16_31_vec_mseg.c',
+        'tx/cn20k/tx_32_47_vec_mseg.c',
+        'tx/cn20k/tx_48_63_vec_mseg.c',
+        'tx/cn20k/tx_64_79_vec_mseg.c',
+        'tx/cn20k/tx_80_95_vec_mseg.c',
+        'tx/cn20k/tx_96_111_vec_mseg.c',
+        'tx/cn20k/tx_112_127_vec_mseg.c',
+        'tx/cn20k/tx_all_offload.c',
+)
 else
 sources += files(
         'rx/cn20k/rx_all_offload.c',
+        'tx/cn20k/tx_all_offload.c',
 )
 endif
 endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15.c b/drivers/net/cnxk/tx/cn20k/tx_0_15.c
new file mode 100644
index 0000000000..2de434ccb4
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
new file mode 100644
index 0000000000..c928902b02
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
new file mode 100644
index 0000000000..0e82451c7e
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
new file mode 100644
index 0000000000..b0cd33f781
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127.c b/drivers/net/cnxk/tx/cn20k/tx_112_127.c
new file mode 100644
index 0000000000..c116c48763
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
new file mode 100644
index 0000000000..5d67426f2b
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
new file mode 100644
index 0000000000..5a3e5c660d
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
new file mode 100644
index 0000000000..c6918de6df
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31.c b/drivers/net/cnxk/tx/cn20k/tx_16_31.c
new file mode 100644
index 0000000000..953f63b192
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
new file mode 100644
index 0000000000..cdfd6bf69c
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
new file mode 100644
index 0000000000..6e6ad7c968
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
new file mode 100644
index 0000000000..a3a0fcace3
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47.c b/drivers/net/cnxk/tx/cn20k/tx_32_47.c
new file mode 100644
index 0000000000..50295fcd16
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
new file mode 100644
index 0000000000..8b4da505ad
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
new file mode 100644
index 0000000000..3a3298ffa6
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
new file mode 100644
index 0000000000..93168990a8
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63.c b/drivers/net/cnxk/tx/cn20k/tx_48_63.c
new file mode 100644
index 0000000000..5765b1fe57
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
new file mode 100644
index 0000000000..5f591eee68
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
new file mode 100644
index 0000000000..06eec15976
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
new file mode 100644
index 0000000000..220f117c47
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79.c b/drivers/net/cnxk/tx/cn20k/tx_64_79.c
new file mode 100644
index 0000000000..c05ef2a238
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
new file mode 100644
index 0000000000..79d40a09ed
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
new file mode 100644
index 0000000000..a4fac7e73e
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
new file mode 100644
index 0000000000..90d6b4f2f9
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95.c b/drivers/net/cnxk/tx/cn20k/tx_80_95.c
new file mode 100644
index 0000000000..8a09ff842b
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
new file mode 100644
index 0000000000..59f959b29f
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
new file mode 100644
index 0000000000..ca78d42344
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
new file mode 100644
index 0000000000..a3a9856783
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111.c b/drivers/net/cnxk/tx/cn20k/tx_96_111.c
new file mode 100644
index 0000000000..fab39f8fcc
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
new file mode 100644
index 0000000000..11b6814223
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
new file mode 100644
index 0000000000..e1e3b1bca3
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
new file mode 100644
index 0000000000..b6af4e34c0
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_all_offload.c b/drivers/net/cnxk/tx/cn20k/tx_all_offload.c
new file mode 100644
index 0000000000..c7258b5df7
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_all_offload.c
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if defined(CNXK_DIS_TMPLT_FUNC)
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_xmit_pkts_all_offload(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts)
+{
+	uint64_t cmd[8 + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];
+
+	return cn20k_nix_xmit_pkts_mseg(
+		tx_queue, NULL, tx_pkts, pkts, cmd,
+		NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+			NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_MBUF_NOFF_F |
+			NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_SECURITY_F |
+			NIX_TX_MULTI_SEG_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts)
+{
+	uint64_t cmd[8 + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];
+
+	return cn20k_nix_xmit_pkts_vector(
+		tx_queue, NULL, tx_pkts, pkts, cmd,
+		NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+			NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_MBUF_NOFF_F |
+			NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_SECURITY_F |
+			NIX_TX_MULTI_SEG_F);
+}
+
+#endif
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
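
A note on the template layout above: each small tx_*.c file defines T() and expands one NIX_TX_FASTPATH_MODES_* list, so every (name, sz, flags) row becomes one specialized burst function, and defining CNXK_DIS_TMPLT_FUNC compiles the whole file away. A minimal illustration of the expansion (MODES_EXAMPLE is a hypothetical two-row stand-in for the real lists in cn20k_tx.h):

#define MODES_EXAMPLE                                                   \
	T(no_offload, 4, 0)                                             \
	T(l3l4csum, 4, NIX_TX_OFFLOAD_L3_L4_CSUM_F)

#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
MODES_EXAMPLE /* defines cn20k_nix_xmit_pkts_no_offload() and
	       * cn20k_nix_xmit_pkts_l3l4csum() */
#undef T

Splitting the rows across many small translation units presumably keeps per-file compile time and memory bounded when the full offload-flag matrix is instantiated.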

* [PATCH 21/33] net/cnxk: support Rx burst scalar for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (19 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 20/33] net/cnxk: support Tx " Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 22/33] net/cnxk: support Rx burst vector " Nithin Dabilpuram
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add scalar Rx burst support for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c    | 126 +++++++++
 drivers/net/cnxk/cn20k_rx.h        | 394 ++++++++++++++++++++++++++++-
 drivers/net/cnxk/cn20k_rx_select.c |   6 +-
 drivers/net/cnxk/cn20k_rxtx.h      | 156 ++++++++++++
 4 files changed, 674 insertions(+), 8 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index 4b2f04ba31..cad7b1316a 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -330,6 +330,33 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 	return 0;
 }
 
+static void
+cn20k_nix_rx_queue_meta_aura_update(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cnxk_eth_rxq_sp *rxq_sp;
+	struct cn20k_eth_rxq *rxq;
+	struct roc_nix_rq *rq;
+	int i;
+
+	/* Update Aura handle for fastpath rx queues */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		rq = &dev->rqs[i];
+		rxq = eth_dev->data->rx_queues[i];
+		rxq->meta_aura = rq->meta_aura_handle;
+		rxq->meta_pool = dev->nix.meta_mempool;
+		/* Assume meta packets come from the normal aura if the
+		 * meta aura is not set up.
+		 */
+		if (!rxq->meta_aura) {
+			rxq_sp = cnxk_eth_rxq_to_sp(rxq);
+			rxq->meta_aura = rxq_sp->qconf.mp->pool_id;
+			rxq->meta_pool = (uintptr_t)rxq_sp->qconf.mp;
+		}
+	}
+	/* Store mempool in lookup mem */
+	cnxk_nix_lookup_mem_metapool_set(dev);
+}
+
 static int
 cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 {
@@ -371,6 +398,74 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 	return 0;
 }
 
+/* Function to enable ptp config for VFs */
+static void
+nix_ptp_enable_vf(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (nix_recalc_mtu(eth_dev))
+		plt_err("Failed to set MTU size for ptp");
+
+	dev->rx_offload_flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+
+	/* Setting up the function pointers as per new offload flags */
+	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
+}
+
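+/* One-shot Rx burst trampoline: the PTP MBOX callback below cannot update
+ * the MTU itself (it runs as part of a PF->VF MBOX request, and an MTU set
+ * needs a VF->PF MBOX message), so it parks eth_dev->rx_pkt_burst here.
+ * The first poll from the application's own context then completes the
+ * PTP reconfiguration and re-selects the real Rx/Tx burst functions.
+ */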
+static uint16_t
+nix_ptp_vf_burst(void *queue, struct rte_mbuf **mbufs, uint16_t pkts)
+{
+	struct cn20k_eth_rxq *rxq = queue;
+	struct cnxk_eth_rxq_sp *rxq_sp;
+	struct rte_eth_dev *eth_dev;
+
+	RTE_SET_USED(mbufs);
+	RTE_SET_USED(pkts);
+
+	rxq_sp = cnxk_eth_rxq_to_sp(rxq);
+	eth_dev = rxq_sp->dev->eth_dev;
+	nix_ptp_enable_vf(eth_dev);
+
+	return 0;
+}
+
+static int
+cn20k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
+{
+	struct cnxk_eth_dev *dev = (struct cnxk_eth_dev *)nix;
+	struct rte_eth_dev *eth_dev;
+	struct cn20k_eth_rxq *rxq;
+	int i;
+
+	if (!dev)
+		return -EINVAL;
+
+	eth_dev = dev->eth_dev;
+	if (!eth_dev)
+		return -EINVAL;
+
+	dev->ptp_en = ptp_en;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		rxq = eth_dev->data->rx_queues[i];
+		rxq->mbuf_initializer = cnxk_nix_rxq_mbuf_setup(dev);
+	}
+
+	if (roc_nix_is_vf_or_sdp(nix) && !(roc_nix_is_sdp(nix)) && !(roc_nix_is_lbk(nix))) {
+		/* In case of a VF, the MTU cannot be set directly in this
+		 * function as it runs as part of an MBOX request (PF->VF),
+		 * and setting the MTU itself requires an MBOX message to be
+		 * sent (VF->PF).
+		 */
+		eth_dev->rx_pkt_burst = nix_ptp_vf_burst;
+		rte_mb();
+	}
+
+	return 0;
+}
+
 static int
 cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 {
@@ -451,11 +546,21 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Update the VF about the data offset shifted by 8 bytes if PTP is
+	 * already enabled in the PF owning this VF.
+	 */
+	if (dev->ptp_en && (!roc_nix_is_pf(nix) && (!roc_nix_is_sdp(nix))))
+		nix_ptp_enable_vf(eth_dev);
+
 	/* Setting up the rx[tx]_offload_flags due to change
 	 * in rx[tx]_offloads.
 	 */
 	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
 	dev->tx_offload_flags |= nix_tx_offload_flags(eth_dev);
+
+	if (dev->rx_offload_flags & NIX_RX_OFFLOAD_SECURITY_F)
+		cn20k_nix_rx_queue_meta_aura_update(eth_dev);
+
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
@@ -621,6 +726,20 @@ nix_tm_ops_override(void)
 	if (init_once)
 		return;
 	init_once = 1;
+
+	/* Update platform specific ops */
+}
+
+static void
+npc_flow_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+
+	/* Update platform specific ops */
 }
 
 static int
@@ -633,6 +752,7 @@ static int
 cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_eth_dev *eth_dev;
+	struct cnxk_eth_dev *dev;
 	int rc;
 
 	rc = roc_plt_init();
@@ -643,6 +763,7 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	nix_eth_dev_ops_override();
 	nix_tm_ops_override();
+	npc_flow_ops_override();
 
 	/* Common probe */
 	rc = cnxk_nix_probe(pci_drv, pci_dev);
@@ -665,6 +786,11 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		return 0;
 	}
 
+	dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* Register MBOX up message callback for PTP information */
+	roc_nix_ptp_info_cb_register(&dev->nix, cn20k_nix_ptp_info_update_cb);
+
 	return 0;
 }
 
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 2cb77c0b46..22abf7bbd8 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -29,8 +29,397 @@
 #define NIX_RX_VWQE_F	   BIT(13)
 #define NIX_RX_MULTI_SEG_F BIT(14)
 
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define NIX_DESCS_PER_LOOP   4
+#define CQE_CAST(x)	     ((struct nix_cqe_hdr_s *)(x))
+#define CQE_SZ(x)	     ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
+#define CQE_PTR_OFF(b, i, o, f)                                                                    \
+	(((f) & NIX_RX_VWQE_F) ? (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) + (o)) :           \
+				 (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) + (o)))
+#define CQE_PTR_DIFF(b, i, o, f)                                                                   \
+	(((f) & NIX_RX_VWQE_F) ? (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) - (o)) :           \
+				 (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) - (o)))
+
+#define NIX_RX_SEC_UCC_CONST                                                                       \
+	((RTE_MBUF_F_RX_IP_CKSUM_BAD >> 1) |                                                       \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 8 |                 \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_BAD) >> 1) << 16 |                 \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 32 |                \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 48)
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+static inline void
+nix_mbuf_validate_next(struct rte_mbuf *m)
+{
+	if (m->nb_segs == 1 && m->next) {
+		rte_panic("mbuf->next[%p] valid when mbuf->nb_segs is %d", m->next, m->nb_segs);
+	}
+}
+#else
+static inline void
+nix_mbuf_validate_next(struct rte_mbuf *m)
+{
+	RTE_SET_USED(m);
+}
+#endif
+
 #define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
 
+static inline rte_eth_ip_reassembly_dynfield_t *
+cnxk_ip_reassembly_dynfield(struct rte_mbuf *mbuf, int ip_reassembly_dynfield_offset)
+{
+	return RTE_MBUF_DYNFIELD(mbuf, ip_reassembly_dynfield_offset,
+				 rte_eth_ip_reassembly_dynfield_t *);
+}
+
+union mbuf_initializer {
+	struct {
+		uint16_t data_off;
+		uint16_t refcnt;
+		uint16_t nb_segs;
+		uint16_t port;
+	} fields;
+	uint64_t value;
+};
+
+static __rte_always_inline uint64_t
+nix_clear_data_off(uint64_t oldval)
+{
+	union mbuf_initializer mbuf_init = {.value = oldval};
+
+	mbuf_init.fields.data_off = 0;
+	return mbuf_init.value;
+}
+
+static __rte_always_inline struct rte_mbuf *
+nix_get_mbuf_from_cqe(void *cq, const uint64_t data_off)
+{
+	rte_iova_t buff;
+
+	/* Skip CQE, NIX_RX_PARSE_S and SG HDR (9 DWORDs) and peek at buff addr */
+	buff = *((rte_iova_t *)((uint64_t *)cq + 9));
+	return (struct rte_mbuf *)(buff - data_off);
+}
+
+static __rte_always_inline uint32_t
+nix_ptype_get(const void *const lookup_mem, const uint64_t in)
+{
+	const uint16_t *const ptype = lookup_mem;
+	const uint16_t lh_lg_lf = (in & 0xFFF0000000000000) >> 52;
+	const uint16_t tu_l2 = ptype[(in & 0x000FFFF000000000) >> 36];
+	const uint16_t il4_tu = ptype[PTYPE_NON_TUNNEL_ARRAY_SZ + lh_lg_lf];
+
+	return (il4_tu << PTYPE_NON_TUNNEL_WIDTH) | tu_l2;
+}
+
+static __rte_always_inline uint32_t
+nix_rx_olflags_get(const void *const lookup_mem, const uint64_t in)
+{
+	const uint32_t *const ol_flags =
+		(const uint32_t *)((const uint8_t *)lookup_mem + PTYPE_ARRAY_SZ);
+
+	return ol_flags[(in & 0xfff00000) >> 20];
+}
+
+static inline uint64_t
+nix_update_match_id(const uint16_t match_id, uint64_t ol_flags, struct rte_mbuf *mbuf)
+{
+	/* There is no separate bit to check whether match_id is valid, and
+	 * no flag to tell an RTE_FLOW_ACTION_TYPE_FLAG action apart from an
+	 * RTE_FLOW_ACTION_TYPE_MARK action. The former is addressed by
+	 * treating 0 as the invalid value and incrementing/decrementing the
+	 * match_id pair when MARK is activated. The latter is addressed by
+	 * defining CNXK_FLOW_MARK_DEFAULT as the value for
+	 * RTE_FLOW_ACTION_TYPE_MARK. This translates to not using
+	 * CNXK_FLOW_ACTION_FLAG_DEFAULT - 1 and CNXK_FLOW_ACTION_FLAG_DEFAULT
+	 * for match_id, i.e. valid mark_ids range from
+	 * 0 to CNXK_FLOW_ACTION_FLAG_DEFAULT - 2.
+	 */
+	if (likely(match_id)) {
+		ol_flags |= RTE_MBUF_F_RX_FDIR;
+		if (match_id != CNXK_FLOW_ACTION_FLAG_DEFAULT) {
+			ol_flags |= RTE_MBUF_F_RX_FDIR_ID;
+			mbuf->hash.fdir.hi = match_id - 1;
+		}
+	}
+
+	return ol_flags;
+}
+
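+/* Build the mbuf chain from NIX_RX_SG_S sub-descriptors: each SG word
+ * holds up to three 16-bit segment lengths plus a 2-bit segment count in
+ * bits 49:48, followed by one IOVA per segment; "later_skip" converts a
+ * segment IOVA back to its mbuf header.
+ */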
+static __rte_always_inline void
+nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, uint64_t rearm,
+		    uintptr_t cpth, uintptr_t sa_base, const uint16_t flags)
+{
+	const rte_iova_t *iova_list;
+	uint16_t later_skip = 0;
+	struct rte_mbuf *head;
+	const rte_iova_t *eol;
+	uint8_t nb_segs;
+	uint16_t sg_len;
+	int64_t len;
+	uint64_t sg;
+	uintptr_t p;
+
+	(void)cpth;
+	(void)sa_base;
+
+	sg = *(const uint64_t *)(rx + 1);
+	nb_segs = (sg >> 48) & 0x3;
+
+	if (nb_segs == 1)
+		return;
+
+	len = rx->pkt_lenm1 + 1;
+
+	mbuf->pkt_len = len - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? CNXK_NIX_TIMESYNC_RX_OFFSET : 0);
+	mbuf->nb_segs = nb_segs;
+	head = mbuf;
+	mbuf->data_len =
+		(sg & 0xFFFF) - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? CNXK_NIX_TIMESYNC_RX_OFFSET : 0);
+	eol = ((const rte_iova_t *)(rx + 1) + ((rx->desc_sizem1 + 1) << 1));
+
+	len -= mbuf->data_len;
+	sg = sg >> 16;
+	/* Skip SG_S and first IOVA*/
+	iova_list = ((const rte_iova_t *)(rx + 1)) + 2;
+	nb_segs--;
+
+	later_skip = (uintptr_t)mbuf->buf_addr - (uintptr_t)mbuf;
+
+	while (nb_segs) {
+		mbuf->next = (struct rte_mbuf *)(*iova_list - later_skip);
+		mbuf = mbuf->next;
+
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		sg_len = sg & 0XFFFF;
+
+		mbuf->data_len = sg_len;
+		sg = sg >> 16;
+		p = (uintptr_t)&mbuf->rearm_data;
+		*(uint64_t *)p = rearm & ~0xFFFF;
+		nb_segs--;
+		iova_list++;
+
+		if (!nb_segs && (iova_list + 1 < eol)) {
+			sg = *(const uint64_t *)(iova_list);
+			nb_segs = (sg >> 48) & 0x3;
+			head->nb_segs += nb_segs;
+			iova_list = (const rte_iova_t *)(iova_list + 1);
+		}
+	}
+}
+
+static __rte_always_inline void
+cn20k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag, struct rte_mbuf *mbuf,
+		      const void *lookup_mem, const uint64_t val, const uintptr_t cpth,
+		      const uintptr_t sa_base, const uint16_t flag)
+{
+	const union nix_rx_parse_u *rx = (const union nix_rx_parse_u *)((const uint64_t *)cq + 1);
+	const uint64_t w1 = *(const uint64_t *)rx;
+	uint16_t len = rx->pkt_lenm1 + 1;
+	uint64_t ol_flags = 0;
+	uintptr_t p;
+
+	if (flag & NIX_RX_OFFLOAD_PTYPE_F)
+		mbuf->packet_type = nix_ptype_get(lookup_mem, w1);
+	else
+		mbuf->packet_type = 0;
+
+	if (flag & NIX_RX_OFFLOAD_RSS_F) {
+		mbuf->hash.rss = tag;
+		ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+	}
+
+	/* Skip rx ol flags extraction for Security packets */
+	ol_flags |= (uint64_t)nix_rx_olflags_get(lookup_mem, w1);
+
+	if (flag & NIX_RX_OFFLOAD_VLAN_STRIP_F) {
+		if (rx->vtag0_gone) {
+			ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
+			mbuf->vlan_tci = rx->vtag0_tci;
+		}
+		if (rx->vtag1_gone) {
+			ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
+			mbuf->vlan_tci_outer = rx->vtag1_tci;
+		}
+	}
+
+	if (flag & NIX_RX_OFFLOAD_MARK_UPDATE_F)
+		ol_flags = nix_update_match_id(rx->match_id, ol_flags, mbuf);
+
+	mbuf->ol_flags = ol_flags;
+	mbuf->pkt_len = len;
+	mbuf->data_len = len;
+	p = (uintptr_t)&mbuf->rearm_data;
+	*(uint64_t *)p = val;
+
+	if (flag & NIX_RX_MULTI_SEG_F)
+		/*
+		 * For multi segment packets, mbuf length correction according
+		 * to Rx timestamp length will be handled later during
+		 * timestamp data process.
+		 * Hence, timestamp flag argument is not required.
+		 */
+		nix_cqe_xtract_mseg(rx, mbuf, val, cpth, sa_base, flag & ~NIX_RX_OFFLOAD_TSTAMP_F);
+}
+
+static inline uint16_t
+nix_rx_nb_pkts(struct cn20k_eth_rxq *rxq, const uint64_t wdata, const uint16_t pkts,
+	       const uint32_t qmask)
+{
+	uint32_t available = rxq->available;
+
+	/* Update the available count if cached value is not enough */
+	if (unlikely(available < pkts)) {
+		uint64_t reg, head, tail;
+
+		/* Use LDADDA version to avoid reorder */
+		reg = roc_atomic64_add_sync(wdata, rxq->cq_status);
+		/* CQ_OP_STATUS operation error */
+		if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+			return 0;
+
+		tail = reg & 0xFFFFF;
+		head = (reg >> 20) & 0xFFFFF;
+		if (tail < head)
+			available = tail - head + qmask + 1;
+		else
+			available = tail - head;
+
+		rxq->available = available;
+	}
+
+	return RTE_MIN(pkts, available);
+}
+
+static __rte_always_inline void
+cn20k_nix_mbuf_to_tstamp(struct rte_mbuf *mbuf, struct cnxk_timesync_info *tstamp,
+			 const uint8_t ts_enable, uint64_t *tstamp_ptr)
+{
+	if (ts_enable) {
+		mbuf->pkt_len -= CNXK_NIX_TIMESYNC_RX_OFFSET;
+		mbuf->data_len -= CNXK_NIX_TIMESYNC_RX_OFFSET;
+
+		/* Read the Rx timestamp inserted by CGX at the
+		 * start of the packet data.
+		 */
+		*tstamp_ptr = ((*tstamp_ptr >> 32) * NSEC_PER_SEC) + (*tstamp_ptr & 0xFFFFFFFFUL);
+		*cnxk_nix_timestamp_dynfield(mbuf, tstamp) = rte_be_to_cpu_64(*tstamp_ptr);
+		/* RTE_MBUF_F_RX_IEEE1588_TMST flag needs to be set only in case
+		 * PTP packets are received.
+		 */
+		if (mbuf->packet_type == RTE_PTYPE_L2_ETHER_TIMESYNC) {
+			tstamp->rx_tstamp = *cnxk_nix_timestamp_dynfield(mbuf, tstamp);
+			tstamp->rx_ready = 1;
+			mbuf->ol_flags |= RTE_MBUF_F_RX_IEEE1588_PTP | RTE_MBUF_F_RX_IEEE1588_TMST |
+					  tstamp->rx_tstamp_dynflag;
+		}
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts, const uint16_t flags)
+{
+	struct cn20k_eth_rxq *rxq = rx_queue;
+	const uint64_t mbuf_init = rxq->mbuf_initializer;
+	const void *lookup_mem = rxq->lookup_mem;
+	const uint64_t data_off = rxq->data_off;
+	const uintptr_t desc = rxq->desc;
+	const uint64_t wdata = rxq->wdata;
+	const uint32_t qmask = rxq->qmask;
+	uint16_t packets = 0, nb_pkts;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	struct rte_mbuf *mbuf;
+	uint64_t sa_base = 0;
+	uintptr_t cpth = 0;
+
+	nb_pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+
+	while (packets < nb_pkts) {
+		/* Prefetch N desc ahead */
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+
+		mbuf = nix_get_mbuf_from_cqe(cq, data_off);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		cn20k_nix_cqe_to_mbuf(cq, cq->tag, mbuf, lookup_mem, mbuf_init, cpth, sa_base,
+				      flags);
+		cn20k_nix_mbuf_to_tstamp(mbuf, rxq->tstamp, (flags & NIX_RX_OFFLOAD_TSTAMP_F),
+					 (uint64_t *)((uint8_t *)mbuf + data_off));
+		rx_pkts[packets++] = mbuf;
+		roc_prefetch_store_keep(mbuf);
+		head++;
+		head &= qmask;
+	}
+
+	rxq->head = head;
+	rxq->available -= nb_pkts;
+
+	/* Free all the CQs that we've processed */
+	plt_write64((wdata | nb_pkts), rxq->cq_door);
+
+	return nb_pkts;
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_flush_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts,
+			  const uint16_t flags)
+{
+	struct cn20k_eth_rxq *rxq = rx_queue;
+	const uint64_t mbuf_init = rxq->mbuf_initializer;
+	const void *lookup_mem = rxq->lookup_mem;
+	const uint64_t data_off = rxq->data_off;
+	const uint64_t wdata = rxq->wdata;
+	const uint32_t qmask = rxq->qmask;
+	const uintptr_t desc = rxq->desc;
+	uint16_t packets = 0, nb_pkts;
+	uint16_t lmt_id __rte_unused;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	struct rte_mbuf *mbuf;
+	uint64_t sa_base = 0;
+	uintptr_t cpth = 0;
+
+	nb_pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+
+	while (packets < nb_pkts) {
+		/* Prefetch N desc ahead */
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+
+		mbuf = nix_get_mbuf_from_cqe(cq, data_off);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		cn20k_nix_cqe_to_mbuf(cq, cq->tag, mbuf, lookup_mem, mbuf_init, cpth, sa_base,
+				      flags);
+		cn20k_nix_mbuf_to_tstamp(mbuf, rxq->tstamp, (flags & NIX_RX_OFFLOAD_TSTAMP_F),
+					 (uint64_t *)((uint8_t *)mbuf + data_off));
+		rx_pkts[packets++] = mbuf;
+		roc_prefetch_store_keep(mbuf);
+		head++;
+		head &= qmask;
+	}
+
+	rxq->head = head;
+	rxq->available -= nb_pkts;
+
+	/* Free all the CQs that we've processed */
+	plt_write64((wdata | nb_pkts), rxq->cq_door);
+
+	return nb_pkts;
+}
+
 #define RSS_F	  NIX_RX_OFFLOAD_RSS_F
 #define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
 #define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
@@ -220,10 +609,7 @@ NIX_RX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(rx_queue);                                                            \
-		RTE_SET_USED(rx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		return cn20k_nix_recv_pkts(rx_queue, rx_pkts, pkts, (flags));                      \
 	}
 
 #define NIX_RX_RECV_MSEG(fn, flags) NIX_RX_RECV(fn, flags | NIX_RX_MULTI_SEG_F)
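
The LDADDA read in nix_rx_nb_pkts() above returns the CQ_OP_STATUS word with the 20-bit tail and head pointers packed side by side. A minimal restatement of the decode (an inference from the code above, not a hardware-spec quote):

static inline uint32_t
cq_available_sketch(uint64_t reg, uint32_t qmask)
{
	uint32_t tail = reg & 0xFFFFF;	       /* CQ tail, bits 19:0  */
	uint32_t head = (reg >> 20) & 0xFFFFF; /* CQ head, bits 39:20 */

	/* tail < head means the tail has already wrapped around the ring */
	return (tail < head) ? tail - head + qmask + 1 : tail - head;
}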
diff --git a/drivers/net/cnxk/cn20k_rx_select.c b/drivers/net/cnxk/cn20k_rx_select.c
index 82e06a62ef..25c79434cd 100644
--- a/drivers/net/cnxk/cn20k_rx_select.c
+++ b/drivers/net/cnxk/cn20k_rx_select.c
@@ -22,10 +22,8 @@ pick_rx_func(struct rte_eth_dev *eth_dev, const eth_rx_burst_t rx_burst[NIX_RX_O
 static uint16_t __rte_noinline __rte_hot __rte_unused
 cn20k_nix_flush_rx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
 {
-	RTE_SET_USED(rx_queue);
-	RTE_SET_USED(rx_pkts);
-	RTE_SET_USED(pkts);
-	return 0;
+	const uint16_t flags = NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F;
+	return cn20k_nix_flush_recv_pkts(rx_queue, rx_pkts, pkts, flags);
 }
 
 #if defined(RTE_ARCH_ARM64)
diff --git a/drivers/net/cnxk/cn20k_rxtx.h b/drivers/net/cnxk/cn20k_rxtx.h
index 5cc445d4b1..03eaf34d64 100644
--- a/drivers/net/cnxk/cn20k_rxtx.h
+++ b/drivers/net/cnxk/cn20k_rxtx.h
@@ -83,7 +83,163 @@ struct cn20k_eth_rxq {
 	struct cnxk_timesync_info *tstamp;
 } __plt_cache_aligned;
 
+/* Private data in sw rsvd area of struct roc_ot_ipsec_inb_sa */
+struct cn20k_inb_priv_data {
+	void *userdata;
+	int reass_dynfield_off;
+	int reass_dynflag_bit;
+	struct cnxk_eth_sec_sess *eth_sec;
+};
+
+struct cn20k_sec_sess_priv {
+	union {
+		struct {
+			uint32_t sa_idx;
+			uint8_t inb_sa : 1;
+			uint8_t outer_ip_ver : 1;
+			uint8_t mode : 1;
+			uint8_t roundup_byte : 5;
+			uint8_t roundup_len;
+			uint16_t partial_len : 10;
+			uint16_t chksum : 2;
+			uint16_t dec_ttl : 1;
+			uint16_t nixtx_off : 1;
+			uint16_t rsvd : 2;
+		};
+
+		uint64_t u64;
+	};
+} __rte_packed;
+
 #define LMT_OFF(lmt_addr, lmt_num, offset)                                                         \
 	(void *)((uintptr_t)(lmt_addr) + ((uint64_t)(lmt_num) << ROC_LMT_LINE_SIZE_LOG2) + (offset))
 
+static inline uint16_t
+nix_tx_compl_nb_pkts(struct cn20k_eth_txq *txq, const uint64_t wdata, const uint32_t qmask)
+{
+	uint16_t available = txq->tx_compl.available;
+
+	/* Update the available count if cached value is not enough */
+	if (!unlikely(available)) {
+		uint64_t reg, head, tail;
+
+		/* Use LDADDA version to avoid reorder */
+		reg = roc_atomic64_add_sync(wdata, txq->tx_compl.cq_status);
+		/* CQ_OP_STATUS operation error */
+		if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+			return 0;
+
+		tail = reg & 0xFFFFF;
+		head = (reg >> 20) & 0xFFFFF;
+		if (tail < head)
+			available = tail - head + qmask + 1;
+		else
+			available = tail - head;
+
+		txq->tx_compl.available = available;
+	}
+	return available;
+}
+
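+/* Drain the Tx completion CQ: each NIX_SEND_COMP entry carries the sqe_id
+ * of a transmitted packet whose mbuf chain can now be freed; the final
+ * door write releases the processed CQEs back to hardware.
+ */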
+static inline void
+handle_tx_completion_pkts(struct cn20k_eth_txq *txq, uint8_t mt_safe)
+{
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define CQE_SZ(x)	     ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
+	uint16_t tx_pkts = 0, nb_pkts;
+	const uintptr_t desc = txq->tx_compl.desc_base;
+	const uint64_t wdata = txq->tx_compl.wdata;
+	const uint32_t qmask = txq->tx_compl.qmask;
+	uint32_t head = txq->tx_compl.head;
+	struct nix_cqe_hdr_s *tx_compl_cq;
+	struct nix_send_comp_s *tx_compl_s0;
+	struct rte_mbuf *m_next, *m;
+
+	if (mt_safe)
+		rte_spinlock_lock(&txq->tx_compl.ext_buf_lock);
+
+	nb_pkts = nix_tx_compl_nb_pkts(txq, wdata, qmask);
+	while (tx_pkts < nb_pkts) {
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		tx_compl_cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+		tx_compl_s0 = (struct nix_send_comp_s *)((uint64_t *)tx_compl_cq + 1);
+		m = txq->tx_compl.ptr[tx_compl_s0->sqe_id];
+		while (m->next != NULL) {
+			m_next = m->next;
+			rte_pktmbuf_free_seg(m);
+			m = m_next;
+		}
+		rte_pktmbuf_free_seg(m);
+		txq->tx_compl.ptr[tx_compl_s0->sqe_id] = NULL;
+
+		head++;
+		head &= qmask;
+		tx_pkts++;
+	}
+	txq->tx_compl.head = head;
+	txq->tx_compl.available -= nb_pkts;
+
+	plt_write64((wdata | nb_pkts), txq->tx_compl.cq_door);
+
+	if (mt_safe)
+		rte_spinlock_unlock(&txq->tx_compl.ext_buf_lock);
+}
+
+static __rte_always_inline uint64_t
+cn20k_cpt_tx_steor_data(void)
+{
+	/* We have two CPT instructions per LMTLine TODO */
+	const uint64_t dw_m1 = ROC_CN10K_TWO_CPT_INST_DW_M1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1 << 16;
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
+static __rte_always_inline void
+cn20k_nix_sec_steorl(uintptr_t io_addr, uint32_t lmt_id, uint8_t lnum, uint8_t loff, uint8_t shft)
+{
+	uint64_t data;
+	uintptr_t pa;
+
+	/* Check if there is any CPT instruction to submit */
+	if (!lnum && !loff)
+		return;
+
+	data = cn20k_cpt_tx_steor_data();
+	/* Update lmtline use for partial end line */
+	if (loff) {
+		data &= ~(0x7ULL << shft);
+		/* Update it to half full i.e 64B */
+		data |= (0x3UL << shft);
+	}
+
+	pa = io_addr | ((data >> 16) & 0x7) << 4;
+	data &= ~(0x7ULL << 16);
+	/* Update lines - 1 that contain valid data */
+	data |= ((uint64_t)(lnum + loff - 1)) << 12;
+	data |= (uint64_t)lmt_id;
+
+	/* STEOR */
+	roc_lmt_submit_steorl(data, pa);
+}
+
 #endif /* __CN20K_RXTX_H__ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
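
A compact restatement of the LMT STEOR encoding built by cn20k_cpt_tx_steor_data() in the patch above; steor_data_sketch is a hypothetical loop form of the same unrolled constant, shown only for readability:

static inline uint64_t
steor_data_sketch(void)
{
	const uint64_t dw_m1 = ROC_CN10K_TWO_CPT_INST_DW_M1;
	uint64_t data = 0;
	int i;

	/* One 3-bit "DWs minus 1" field per LMT line, lines 0..15,
	 * starting at bit 16; identical to the unrolled version.
	 */
	for (i = 0; i < 16; i++)
		data |= dw_m1 << (16 + i * 3);

	return data;
}

cn20k_nix_sec_steorl() then patches the field of the last, partially filled line to 0x3 (half a line, i.e. 64B), folds line 0's field into bits 6:4 of the I/O address, and stores the line count minus one at bit offset 12 together with the LMT ID before the STEORL submit.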

* [PATCH 22/33] net/cnxk: support Rx burst vector for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (20 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 21/33] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:58 ` [PATCH 23/33] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Rx burst vector support for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_rx.h | 463 +++++++++++++++++++++++++++++++++++-
 1 file changed, 459 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 22abf7bbd8..d1bf0c615e 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -420,6 +420,463 @@ cn20k_nix_flush_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pk
 	return nb_pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
+static __rte_always_inline uint64_t
+nix_vlan_update(const uint64_t w2, uint64_t ol_flags, uint8x16_t *f)
+{
+	if (w2 & BIT_ULL(21) /* vtag0_gone */) {
+		ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
+		*f = vsetq_lane_u16((uint16_t)(w2 >> 32), *f, 5);
+	}
+
+	return ol_flags;
+}
+
+static __rte_always_inline uint64_t
+nix_qinq_update(const uint64_t w2, uint64_t ol_flags, struct rte_mbuf *mbuf)
+{
+	if (w2 & BIT_ULL(23) /* vtag1_gone */) {
+		ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
+		mbuf->vlan_tci_outer = (uint16_t)(w2 >> 48);
+	}
+
+	return ol_flags;
+}
+
+#define NIX_PUSH_META_TO_FREE(_mbuf, _laddr, _loff_p)                                              \
+	do {                                                                                       \
+		*(uint64_t *)((_laddr) + (*(_loff_p) << 3)) = (uint64_t)_mbuf;                     \
+		*(_loff_p) = *(_loff_p) + 1;                                                       \
+		/* Mark meta mbuf as put */                                                        \
+		RTE_MEMPOOL_CHECK_COOKIES(_mbuf->pool, (void **)&_mbuf, 1, 0);                     \
+	} while (0)
+
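+/* Note: with NIX_RX_VWQE_F set, "args" carries the mbuf_initializer and
+ * cq0 walks an array of CQE pointers delivered by SSO (each mbuf is
+ * recovered by subtracting sizeof(struct rte_mbuf) from its CQE pointer);
+ * in the poll-mode path, cq0 walks the CQ ring and mbufs come from the
+ * NIX_RX_SG_S buffer address minus data_off.
+ */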
+static __rte_always_inline uint16_t
+cn20k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, const uint16_t flags,
+			   void *lookup_mem, struct cnxk_timesync_info *tstamp, uintptr_t lmt_base,
+			   uint64_t meta_aura)
+{
+	struct cn20k_eth_rxq *rxq = args;
+	const uint64_t mbuf_initializer =
+		(flags & NIX_RX_VWQE_F) ? *(uint64_t *)args : rxq->mbuf_initializer;
+	const uint64x2_t data_off = flags & NIX_RX_VWQE_F ? vdupq_n_u64(RTE_PKTMBUF_HEADROOM) :
+							    vdupq_n_u64(rxq->data_off);
+	const uint32_t qmask = flags & NIX_RX_VWQE_F ? 0 : rxq->qmask;
+	const uint64_t wdata = flags & NIX_RX_VWQE_F ? 0 : rxq->wdata;
+	const uintptr_t desc = flags & NIX_RX_VWQE_F ? 0 : rxq->desc;
+	uint64x2_t cq0_w8, cq1_w8, cq2_w8, cq3_w8, mbuf01, mbuf23;
+	uintptr_t cpth0 = 0, cpth1 = 0, cpth2 = 0, cpth3 = 0;
+	uint64_t ol_flags0, ol_flags1, ol_flags2, ol_flags3;
+	uint64x2_t rearm0 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm1 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm2 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm3 = vdupq_n_u64(mbuf_initializer);
+	struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3;
+	uint8x16_t f0, f1, f2, f3;
+	uintptr_t sa_base = 0;
+	uint16_t packets = 0;
+	uint16_t pkts_left;
+	uint32_t head;
+	uintptr_t cq0;
+
+	(void)lmt_base;
+	(void)meta_aura;
+
+	if (!(flags & NIX_RX_VWQE_F)) {
+		lookup_mem = rxq->lookup_mem;
+		head = rxq->head;
+
+		pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+		pkts_left = pkts & (NIX_DESCS_PER_LOOP - 1);
+		/* Packets has to be floor-aligned to NIX_DESCS_PER_LOOP */
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+		if (flags & NIX_RX_OFFLOAD_TSTAMP_F)
+			tstamp = rxq->tstamp;
+
+		cq0 = desc + CQE_SZ(head);
+		rte_prefetch0(CQE_PTR_OFF(cq0, 0, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 1, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 2, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 3, 64, flags));
+	} else {
+		RTE_SET_USED(head);
+	}
+
+	while (packets < pkts) {
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Exit loop if head is about to wrap and become
+			 * unaligned.
+			 */
+			if (((head + NIX_DESCS_PER_LOOP - 1) & qmask) < NIX_DESCS_PER_LOOP) {
+				pkts_left += (pkts - packets);
+				break;
+			}
+
+			cq0 = desc + CQE_SZ(head);
+		} else {
+			cq0 = (uintptr_t)&mbufs[packets];
+		}
+
+		if (flags & NIX_RX_VWQE_F) {
+			if (pkts - packets > 4) {
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 4, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 5, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 6, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 7, 0, flags));
+
+				if (likely(pkts - packets > 8)) {
+					rte_prefetch1(CQE_PTR_OFF(cq0, 8, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 9, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 10, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 11, 0, flags));
+					if (pkts - packets > 12) {
+						rte_prefetch1(CQE_PTR_OFF(cq0, 12, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 13, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 14, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 15, 0, flags));
+					}
+				}
+
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 4, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 5, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 6, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 7, RTE_PKTMBUF_HEADROOM, flags));
+
+				if (likely(pkts - packets > 8)) {
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 8, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 9, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 10, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 11, RTE_PKTMBUF_HEADROOM, flags));
+				}
+			}
+		} else {
+			if (pkts - packets > 8) {
+				if (flags) {
+					rte_prefetch0(CQE_PTR_OFF(cq0, 8, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 9, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 10, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 11, 0, flags));
+				}
+				rte_prefetch0(CQE_PTR_OFF(cq0, 8, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 9, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 10, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 11, 64, flags));
+			}
+		}
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Get NIX_RX_SG_S for size and buffer pointer */
+			cq0_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 0, 64, flags));
+			cq1_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 1, 64, flags));
+			cq2_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 2, 64, flags));
+			cq3_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 3, 64, flags));
+
+			/* Extract mbuf from NIX_RX_SG_S */
+			mbuf01 = vzip2q_u64(cq0_w8, cq1_w8);
+			mbuf23 = vzip2q_u64(cq2_w8, cq3_w8);
+			mbuf01 = vqsubq_u64(mbuf01, data_off);
+			mbuf23 = vqsubq_u64(mbuf23, data_off);
+		} else {
+			mbuf01 = vsubq_u64(vld1q_u64((uint64_t *)cq0),
+					   vdupq_n_u64(sizeof(struct rte_mbuf)));
+			mbuf23 = vsubq_u64(vld1q_u64((uint64_t *)(cq0 + 16)),
+					   vdupq_n_u64(sizeof(struct rte_mbuf)));
+		}
+
+		/* Move mbufs to scalar registers for future use */
+		mbuf0 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 0);
+		mbuf1 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 1);
+		mbuf2 = (struct rte_mbuf *)vgetq_lane_u64(mbuf23, 0);
+		mbuf3 = (struct rte_mbuf *)vgetq_lane_u64(mbuf23, 1);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf0->pool, (void **)&mbuf0, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf1->pool, (void **)&mbuf1, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf2->pool, (void **)&mbuf2, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf3->pool, (void **)&mbuf3, 1, 1);
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Mask to get packet len from NIX_RX_SG_S */
+			const uint8x16_t shuf_msk = {
+				0xFF, 0xFF, /* pkt_type set as unknown */
+				0xFF, 0xFF, /* pkt_type set as unknown */
+				0,    1,    /* octet 1~0, low 16 bits pkt_len */
+				0xFF, 0xFF, /* skip high 16 bits of pkt_len, zero out */
+				0,    1,    /* octet 1~0, 16 bits data_len */
+				0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
+
+			/* Form the rx_descriptor_fields1 with pkt_len and data_len */
+			f0 = vqtbl1q_u8(cq0_w8, shuf_msk);
+			f1 = vqtbl1q_u8(cq1_w8, shuf_msk);
+			f2 = vqtbl1q_u8(cq2_w8, shuf_msk);
+			f3 = vqtbl1q_u8(cq3_w8, shuf_msk);
+		}
+
+		/* Load CQE word0 and word 1 */
+		const uint64_t cq0_w0 = *CQE_PTR_OFF(cq0, 0, 0, flags);
+		const uint64_t cq0_w1 = *CQE_PTR_OFF(cq0, 0, 8, flags);
+		const uint64_t cq0_w2 = *CQE_PTR_OFF(cq0, 0, 16, flags);
+		const uint64_t cq1_w0 = *CQE_PTR_OFF(cq0, 1, 0, flags);
+		const uint64_t cq1_w1 = *CQE_PTR_OFF(cq0, 1, 8, flags);
+		const uint64_t cq1_w2 = *CQE_PTR_OFF(cq0, 1, 16, flags);
+		const uint64_t cq2_w0 = *CQE_PTR_OFF(cq0, 2, 0, flags);
+		const uint64_t cq2_w1 = *CQE_PTR_OFF(cq0, 2, 8, flags);
+		const uint64_t cq2_w2 = *CQE_PTR_OFF(cq0, 2, 16, flags);
+		const uint64_t cq3_w0 = *CQE_PTR_OFF(cq0, 3, 0, flags);
+		const uint64_t cq3_w1 = *CQE_PTR_OFF(cq0, 3, 8, flags);
+		const uint64_t cq3_w2 = *CQE_PTR_OFF(cq0, 3, 16, flags);
+
+		if (flags & NIX_RX_VWQE_F) {
+			uint16_t psize0, psize1, psize2, psize3;
+
+			psize0 = (cq0_w2 & 0xFFFF) + 1;
+			psize1 = (cq1_w2 & 0xFFFF) + 1;
+			psize2 = (cq2_w2 & 0xFFFF) + 1;
+			psize3 = (cq3_w2 & 0xFFFF) + 1;
+
+			f0 = vdupq_n_u64(0);
+			f1 = vdupq_n_u64(0);
+			f2 = vdupq_n_u64(0);
+			f3 = vdupq_n_u64(0);
+
+			f0 = vsetq_lane_u16(psize0, f0, 2);
+			f0 = vsetq_lane_u16(psize0, f0, 4);
+
+			f1 = vsetq_lane_u16(psize1, f1, 2);
+			f1 = vsetq_lane_u16(psize1, f1, 4);
+
+			f2 = vsetq_lane_u16(psize2, f2, 2);
+			f2 = vsetq_lane_u16(psize2, f2, 4);
+
+			f3 = vsetq_lane_u16(psize3, f3, 2);
+			f3 = vsetq_lane_u16(psize3, f3, 4);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_RSS_F) {
+			/* Fill rss in the rx_descriptor_fields1 */
+			f0 = vsetq_lane_u32(cq0_w0, f0, 3);
+			f1 = vsetq_lane_u32(cq1_w0, f1, 3);
+			f2 = vsetq_lane_u32(cq2_w0, f2, 3);
+			f3 = vsetq_lane_u32(cq3_w0, f3, 3);
+			ol_flags0 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags1 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags2 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags3 = RTE_MBUF_F_RX_RSS_HASH;
+		} else {
+			ol_flags0 = 0;
+			ol_flags1 = 0;
+			ol_flags2 = 0;
+			ol_flags3 = 0;
+		}
+
+		if (flags & NIX_RX_OFFLOAD_PTYPE_F) {
+			/* Fill packet_type in the rx_descriptor_fields1 */
+			f0 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq0_w1), f0, 0);
+			f1 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq1_w1), f1, 0);
+			f2 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq2_w1), f2, 0);
+			f3 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq3_w1), f3, 0);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_CHECKSUM_F) {
+			ol_flags0 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq0_w1);
+			ol_flags1 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq1_w1);
+			ol_flags2 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq2_w1);
+			ol_flags3 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq3_w1);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_VLAN_STRIP_F) {
+			ol_flags0 = nix_vlan_update(cq0_w2, ol_flags0, &f0);
+			ol_flags1 = nix_vlan_update(cq1_w2, ol_flags1, &f1);
+			ol_flags2 = nix_vlan_update(cq2_w2, ol_flags2, &f2);
+			ol_flags3 = nix_vlan_update(cq3_w2, ol_flags3, &f3);
+
+			ol_flags0 = nix_qinq_update(cq0_w2, ol_flags0, mbuf0);
+			ol_flags1 = nix_qinq_update(cq1_w2, ol_flags1, mbuf1);
+			ol_flags2 = nix_qinq_update(cq2_w2, ol_flags2, mbuf2);
+			ol_flags3 = nix_qinq_update(cq3_w2, ol_flags3, mbuf3);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_MARK_UPDATE_F) {
+			ol_flags0 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 0, 38, flags),
+							ol_flags0, mbuf0);
+			ol_flags1 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 1, 38, flags),
+							ol_flags1, mbuf1);
+			ol_flags2 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 2, 38, flags),
+							ol_flags2, mbuf2);
+			ol_flags3 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 3, 38, flags),
+							ol_flags3, mbuf3);
+		}
+
+		if ((flags & NIX_RX_OFFLOAD_TSTAMP_F) && ((flags & NIX_RX_VWQE_F) && tstamp)) {
+			const uint16x8_t len_off = {0,				 /* ptype   0:15 */
+						    0,				 /* ptype  16:32 */
+						    CNXK_NIX_TIMESYNC_RX_OFFSET, /* pktlen  0:15*/
+						    0,				 /* pktlen 16:32 */
+						    CNXK_NIX_TIMESYNC_RX_OFFSET, /* datalen 0:15 */
+						    0,
+						    0,
+						    0};
+			const uint32x4_t ptype = {
+				RTE_PTYPE_L2_ETHER_TIMESYNC, RTE_PTYPE_L2_ETHER_TIMESYNC,
+				RTE_PTYPE_L2_ETHER_TIMESYNC, RTE_PTYPE_L2_ETHER_TIMESYNC};
+			const uint64_t ts_olf = RTE_MBUF_F_RX_IEEE1588_PTP |
+						RTE_MBUF_F_RX_IEEE1588_TMST |
+						tstamp->rx_tstamp_dynflag;
+			const uint32x4_t and_mask = {0x1, 0x2, 0x4, 0x8};
+			uint64x2_t ts01, ts23, mask;
+			uint64_t ts[4];
+			uint8_t res;
+
+			/* Subtract timesync length from total pkt length. */
+			f0 = vsubq_u16(f0, len_off);
+			f1 = vsubq_u16(f1, len_off);
+			f2 = vsubq_u16(f2, len_off);
+			f3 = vsubq_u16(f3, len_off);
+
+			/* Get the address of actual timestamp. */
+			ts01 = vaddq_u64(mbuf01, data_off);
+			ts23 = vaddq_u64(mbuf23, data_off);
+			/* Load timestamp from address. */
+			ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, 0), ts01, 0);
+			ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, 1), ts01, 1);
+			ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, 0), ts23, 0);
+			ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, 1), ts23, 1);
+			/* Convert from be to cpu byteorder. */
+			ts01 = vrev64q_u8(ts01);
+			ts23 = vrev64q_u8(ts23);
+			/* Store timestamp into scalar for later use. */
+			ts[0] = vgetq_lane_u64(ts01, 0);
+			ts[1] = vgetq_lane_u64(ts01, 1);
+			ts[2] = vgetq_lane_u64(ts23, 0);
+			ts[3] = vgetq_lane_u64(ts23, 1);
+
+			/* Store timestamp into dynfield. */
+			*cnxk_nix_timestamp_dynfield(mbuf0, tstamp) = ts[0];
+			*cnxk_nix_timestamp_dynfield(mbuf1, tstamp) = ts[1];
+			*cnxk_nix_timestamp_dynfield(mbuf2, tstamp) = ts[2];
+			*cnxk_nix_timestamp_dynfield(mbuf3, tstamp) = ts[3];
+
+			/* Generate ptype mask to filter L2 ether timesync */
+			mask = vdupq_n_u32(vgetq_lane_u32(f0, 0));
+			mask = vsetq_lane_u32(vgetq_lane_u32(f1, 0), mask, 1);
+			mask = vsetq_lane_u32(vgetq_lane_u32(f2, 0), mask, 2);
+			mask = vsetq_lane_u32(vgetq_lane_u32(f3, 0), mask, 3);
+
+			/* Match against L2 ether timesync. */
+			mask = vceqq_u32(mask, ptype);
+			/* Convert the vector mask to a scalar mask */
+			res = vaddvq_u32(vandq_u32(mask, and_mask));
+			res &= 0xF;
+
+			if (res) {
+				/* Fill in the ol_flags for any packets that
+				 * matched.
+				 */
+				ol_flags0 |= ((res & 0x1) ? ts_olf : 0);
+				ol_flags1 |= ((res & 0x2) ? ts_olf : 0);
+				ol_flags2 |= ((res & 0x4) ? ts_olf : 0);
+				ol_flags3 |= ((res & 0x8) ? ts_olf : 0);
+
+				/* Update Rxq timestamp with the latest
+				 * timestamp.
+				 */
+				tstamp->rx_ready = 1;
+				tstamp->rx_tstamp = ts[31 - rte_clz32(res)];
+			}
+		}
+
+		/* Form rearm_data with ol_flags */
+		rearm0 = vsetq_lane_u64(ol_flags0, rearm0, 1);
+		rearm1 = vsetq_lane_u64(ol_flags1, rearm1, 1);
+		rearm2 = vsetq_lane_u64(ol_flags2, rearm2, 1);
+		rearm3 = vsetq_lane_u64(ol_flags3, rearm3, 1);
+
+		/* Update rx_descriptor_fields1 */
+		vst1q_u64((uint64_t *)mbuf0->rx_descriptor_fields1, f0);
+		vst1q_u64((uint64_t *)mbuf1->rx_descriptor_fields1, f1);
+		vst1q_u64((uint64_t *)mbuf2->rx_descriptor_fields1, f2);
+		vst1q_u64((uint64_t *)mbuf3->rx_descriptor_fields1, f3);
+
+		/* Update rearm_data */
+		vst1q_u64((uint64_t *)mbuf0->rearm_data, rearm0);
+		vst1q_u64((uint64_t *)mbuf1->rearm_data, rearm1);
+		vst1q_u64((uint64_t *)mbuf2->rearm_data, rearm2);
+		vst1q_u64((uint64_t *)mbuf3->rearm_data, rearm3);
+
+		if (flags & NIX_RX_MULTI_SEG_F) {
+			/* Multi-segment is enabled; build the mseg list for
+			 * individual mbufs in scalar mode.
+			 */
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 0, 8, flags)),
+					    mbuf0, mbuf_initializer, cpth0, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 1, 8, flags)),
+					    mbuf1, mbuf_initializer, cpth1, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 2, 8, flags)),
+					    mbuf2, mbuf_initializer, cpth2, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 3, 8, flags)),
+					    mbuf3, mbuf_initializer, cpth3, sa_base, flags);
+		}
+
+		/* Store the mbufs to rx_pkts */
+		vst1q_u64((uint64_t *)&mbufs[packets], mbuf01);
+		vst1q_u64((uint64_t *)&mbufs[packets + 2], mbuf23);
+
+		nix_mbuf_validate_next(mbuf0);
+		nix_mbuf_validate_next(mbuf1);
+		nix_mbuf_validate_next(mbuf2);
+		nix_mbuf_validate_next(mbuf3);
+
+		packets += NIX_DESCS_PER_LOOP;
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Advance head pointer and packets */
+			head += NIX_DESCS_PER_LOOP;
+			head &= qmask;
+		}
+	}
+
+	if (flags & NIX_RX_VWQE_F)
+		return packets;
+
+	rxq->head = head;
+	rxq->available -= packets;
+
+	rte_io_wmb();
+	/* Free all the CQs that we've processed */
+	plt_write64((rxq->wdata | packets), rxq->cq_door);
+
+	if (unlikely(pkts_left))
+		packets += cn20k_nix_recv_pkts(args, &mbufs[packets], pkts_left, flags);
+
+	return packets;
+}
+
+#else
+
+static inline uint16_t
+cn20k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, const uint16_t flags,
+			   void *lookup_mem, struct cnxk_timesync_info *tstamp, uintptr_t lmt_base,
+			   uint64_t meta_aura)
+{
+	RTE_SET_USED(args);
+	RTE_SET_USED(mbufs);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(flags);
+	RTE_SET_USED(lookup_mem);
+	RTE_SET_USED(tstamp);
+	RTE_SET_USED(lmt_base);
+	RTE_SET_USED(meta_aura);
+
+	return 0;
+}
+
+#endif
+
 #define RSS_F	  NIX_RX_OFFLOAD_RSS_F
 #define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
 #define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
@@ -618,10 +1075,8 @@ NIX_RX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(rx_queue);                                                            \
-		RTE_SET_USED(rx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		return cn20k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts, (flags), NULL, NULL, 0, \
+						  0);                                              \
 	}
 
 #define NIX_RX_RECV_VEC_MSEG(fn, flags) NIX_RX_RECV_VEC(fn, flags | NIX_RX_MULTI_SEG_F)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
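
One step in the vector timesync block above that is easy to misread is the reduction of four lane-wide compare results to a 4-bit scalar. A self-contained restatement (assumes an AArch64 toolchain with NEON intrinsics; lane_match_bits is a hypothetical name):

#include <arm_neon.h>
#include <stdint.h>

static inline uint8_t
lane_match_bits(uint32x4_t pkt_types, uint32x4_t want)
{
	/* vceqq_u32() yields all-ones per matching lane; AND with
	 * {1, 2, 4, 8} keeps one distinct bit per lane, and the
	 * horizontal add folds lane i into bit i of the result.
	 */
	const uint32x4_t and_mask = {0x1, 0x2, 0x4, 0x8};

	return vaddvq_u32(vandq_u32(vceqq_u32(pkt_types, want), and_mask)) & 0xF;
}

The driver then indexes ts[] with 31 - rte_clz32(res), i.e. the highest matching lane, so the Rx queue records the most recent timestamp in the group.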

* [PATCH 23/33] net/cnxk: support Tx burst scalar for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (21 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 22/33] net/cnxk: support Rx burst vector " Nithin Dabilpuram
@ 2024-09-10  8:58 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 24/33] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:58 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add scalar Tx support for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c | 127 ++++
 drivers/net/cnxk/cn20k_tx.h     | 987 +++++++++++++++++++++++++++++++-
 2 files changed, 1110 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index cad7b1316a..011c5f8362 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -361,6 +361,10 @@ static int
 cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 {
 	struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[qidx];
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	uint16_t flags = dev->tx_offload_flags;
+	struct roc_nix *nix = &dev->nix;
+	uint32_t head = 0, tail = 0;
 	int rc;
 
 	rc = cnxk_nix_tx_queue_stop(eth_dev, qidx);
@@ -370,6 +374,20 @@ cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 	/* Clear fc cache pkts to trigger worker stop */
 	txq->fc_cache_pkts = 0;
 
+	if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && txq->tx_compl.ena) {
+		struct roc_nix_sq *sq = &dev->sqs[qidx];
+		do {
+			handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+			/* Check if SQ is empty */
+			roc_nix_sq_head_tail_get(nix, sq->qid, &head, &tail);
+			if (head != tail)
+				continue;
+
+			/* Check if completion CQ is empty */
+			roc_nix_cq_head_tail_get(nix, sq->cqid, &head, &tail);
+		} while (head != tail);
+	}
+
 	return 0;
 }
 
@@ -690,6 +708,112 @@ cn20k_rx_descriptor_dump(const struct rte_eth_dev *eth_dev, uint16_t qid, uint16
 	return 0;
 }
 
+static int
+cn20k_nix_tm_mark_vlan_dei(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow,
+			   int mark_red, struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_vlan_dei(eth_dev, mark_green, mark_yellow, mark_red, error);
+
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
+static int
+cn20k_nix_tm_mark_ip_ecn(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow, int mark_red,
+			 struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_ip_ecn(eth_dev, mark_green, mark_yellow, mark_red, error);
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
+static int
+cn20k_nix_tm_mark_ip_dscp(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow,
+			  int mark_red, struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_ip_dscp(eth_dev, mark_green, mark_yellow, mark_red, error);
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
 /* Update platform specific eth dev ops */
 static void
 nix_eth_dev_ops_override(void)
@@ -728,6 +852,9 @@ nix_tm_ops_override(void)
 	init_once = 1;
 
 	/* Update platform specific ops */
+	cnxk_tm_ops.mark_vlan_dei = cn20k_nix_tm_mark_vlan_dei;
+	cnxk_tm_ops.mark_ip_ecn = cn20k_nix_tm_mark_ip_ecn;
+	cnxk_tm_ops.mark_ip_dscp = cn20k_nix_tm_mark_ip_dscp;
 }
 
 static void
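
The three TM mark callbacks above share the same tail after their respective cnxk_nix_tm_mark_* calls. A sketch of that common tail as a single helper (cn20k_nix_tm_mark_post is a hypothetical name, shown only to make the repeated pattern explicit; the patch keeps the three copies inline):

static int
cn20k_nix_tm_mark_post(struct rte_eth_dev *eth_dev)
{
	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
	uint64_t mark_fmt, mark_flag;
	int i;

	/* Re-read the mark format and toggle the VLAN/QinQ Tx flag */
	mark_fmt = roc_nix_tm_mark_format_get(&dev->nix, &mark_flag);
	if (mark_flag) {
		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
		dev->tx_mark = true;
	} else {
		dev->tx_mark = false;
		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
	}

	/* Propagate the new format to every Tx queue, then re-select
	 * the Tx burst function for the new offload flags.
	 */
	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];

		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
	}
	cn20k_eth_set_tx_function(eth_dev);
	return 0;
}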
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 9fd925ac34..610d64f21b 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -32,6 +32,984 @@
 #define NIX_TX_NEED_EXT_HDR                                                                        \
 	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
 
+#define NIX_XMIT_FC_OR_RETURN(txq, pkts)                                                           \
+	do {                                                                                       \
+		int64_t avail;                                                                     \
+		/* Cached value is low, Update the fc_cache_pkts */                                \
+		if (unlikely((txq)->fc_cache_pkts < (pkts))) {                                     \
+			avail = txq->nb_sqb_bufs_adj - *txq->fc_mem;                               \
+			/* Multiply with sqe_per_sqb to express in pkts */                         \
+			(txq)->fc_cache_pkts = (avail << (txq)->sqes_per_sqb_log2) - avail;        \
+			/* Check it again for the room */                                          \
+			if (unlikely((txq)->fc_cache_pkts < (pkts)))                               \
+				return 0;                                                          \
+		}                                                                                  \
+	} while (0)
+
+#define NIX_XMIT_FC_OR_RETURN_MTS(txq, pkts)                                                       \
+	do {                                                                                       \
+		int64_t *fc_cache = &(txq)->fc_cache_pkts;                                         \
+		uint8_t retry_count = 8;                                                           \
+		int64_t val, newval;                                                               \
+	retry:                                                                                     \
+		/* Reduce the cached count */                                                      \
+		val = (int64_t)__atomic_fetch_sub(fc_cache, pkts, __ATOMIC_RELAXED);               \
+		val -= pkts;                                                                       \
+		/* Cached value is low, Update the fc_cache_pkts */                                \
+		if (unlikely(val < 0)) {                                                           \
+			/* Multiply with sqe_per_sqb to express in pkts */                         \
+			newval = txq->nb_sqb_bufs_adj -                                            \
+				 __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED);                   \
+			newval = (newval << (txq)->sqes_per_sqb_log2) - newval;                    \
+			newval -= pkts;                                                            \
+			if (!__atomic_compare_exchange_n(fc_cache, &val, newval, false,            \
+							 __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {    \
+				if (retry_count) {                                                 \
+					retry_count--;                                             \
+					goto retry;                                                \
+				} else                                                             \
+					return 0;                                                  \
+			}                                                                          \
+			/* Update and check again whether there is room */                         \
+			if (unlikely(newval < 0))                                                  \
+				return 0;                                                          \
+		}                                                                                  \
+	} while (0)
+
+#define NIX_XMIT_FC_CHECK_RETURN(txq, pkts)                                                        \
+	do {                                                                                       \
+		if (unlikely((txq)->flag))                                                         \
+			NIX_XMIT_FC_OR_RETURN_MTS(txq, pkts);                                      \
+		else {                                                                             \
+			NIX_XMIT_FC_OR_RETURN(txq, pkts);                                          \
+			/* Reduce the cached count */                                              \
+			txq->fc_cache_pkts -= pkts;                                                \
+		}                                                                                  \
+	} while (0)
+
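These three helpers convert the free SQB count into a packet budget: each
SQB holds 2^sqes_per_sqb_log2 SQEs, and one SQE-sized slot per SQB is
excluded (hence the extra subtraction of 'avail', presumably reserved for
the next-SQB link). A minimal standalone sketch of that arithmetic, with
made-up field values:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t sqes_per_sqb_log2 = 5; /* hypothetical: 32 SQEs per SQB */
        int64_t avail = 10;             /* free SQBs read from *fc_mem */
        /* (avail << log2) - avail == avail * (32 - 1) packets of room */
        int64_t pkts = (avail << sqes_per_sqb_log2) - avail;

        assert(pkts == 10 * 31);
        return 0;
    }
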
+/* Macro to convert an encoded number of segments to a number of dwords.
+ * Each value of nb_segs is encoded as 4 bits.
+ */
+#define NIX_SEGDW_MAGIC 0x76654432210ULL
+
+#define NIX_NB_SEGS_TO_SEGDW(x) ((NIX_SEGDW_MAGIC >> ((x) << 2)) & 0xF)
+
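NIX_SEGDW_MAGIC reads as a nibble lookup table: nibble n holds the number
of 16B units (pairs of 8B words) the SG section occupies for n segments,
consistent with one SG word describing up to three IOVA words. A hedged
self-check of that interpretation:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        const uint64_t magic = 0x76654432210ULL;
        unsigned int n;

        for (n = 1; n <= 10; n++) {
            /* SG words (1 per 3 segs) + IOVA words, rounded to 2-word units */
            unsigned int words = (n + 2) / 3 + n;
            unsigned int segdw = (words + 1) / 2;

            assert(((magic >> (n << 2)) & 0xF) == segdw);
        }
        return 0;
    }
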
+static __plt_always_inline uint8_t
+cn20k_nix_mbuf_sg_dwords(struct rte_mbuf *m)
+{
+	uint32_t nb_segs = m->nb_segs;
+	uint16_t aura0, aura;
+	int segw, sg_segs;
+
+	aura0 = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+
+	nb_segs--;
+	segw = 2;
+	sg_segs = 1;
+	while (nb_segs) {
+		m = m->next;
+		aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+		if (aura != aura0) {
+			segw += 2 + (sg_segs == 2);
+			sg_segs = 0;
+		} else {
+			segw += (sg_segs == 0); /* SUBDC */
+			segw += 1;		/* IOVA */
+			sg_segs += 1;
+			sg_segs %= 3;
+		}
+		nb_segs--;
+	}
+
+	return (segw + 1) / 2;
+}
+
+static __plt_always_inline void
+cn20k_nix_tx_mbuf_validate(struct rte_mbuf *m, const uint32_t flags)
+{
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+	uint16_t segdw;
+
+	segdw = cn20k_nix_mbuf_sg_dwords(m);
+	segdw += 1 + !!(flags & NIX_TX_NEED_EXT_HDR) + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+
+	PLT_ASSERT(segdw <= 8);
+#else
+	RTE_SET_USED(m);
+	RTE_SET_USED(flags);
+#endif
+}
+
+static __plt_always_inline void
+cn20k_nix_vwqe_wait_fc(struct cn20k_eth_txq *txq, uint16_t req)
+{
+	int64_t cached, refill;
+	int64_t pkts;
+
+retry:
+#ifdef RTE_ARCH_ARM64
+
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[pkts], [%[addr]]			\n"
+		     "		tbz %[pkts], 63, .Ldne%=		\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[pkts], [%[addr]]			\n"
+		     "		tbnz %[pkts], 63, .Lrty%=		\n"
+		     ".Ldne%=:						\n"
+		     : [pkts] "=&r"(pkts)
+		     : [addr] "r"(&txq->fc_cache_pkts)
+		     : "memory");
+#else
+	RTE_SET_USED(pkts);
+	while (__atomic_load_n(&txq->fc_cache_pkts, __ATOMIC_RELAXED) < 0)
+		;
+#endif
+	cached = __atomic_fetch_sub(&txq->fc_cache_pkts, req, __ATOMIC_ACQUIRE) - req;
+	/* Check if there is enough space, else update and retry. */
+	if (cached >= 0)
+		return;
+
+	/* Check if we have space, else retry. */
+#ifdef RTE_ARCH_ARM64
+	int64_t val;
+
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[val], [%[addr]]			\n"
+		     "		sub %[val], %[adj], %[val]		\n"
+		     "		lsl %[refill], %[val], %[shft]		\n"
+		     "		sub %[refill], %[refill], %[val]	\n"
+		     "		sub %[refill], %[refill], %[sub]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.ge .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[val], [%[addr]]			\n"
+		     "		sub %[val], %[adj], %[val]		\n"
+		     "		lsl %[refill], %[val], %[shft]		\n"
+		     "		sub %[refill], %[refill], %[val]	\n"
+		     "		sub %[refill], %[refill], %[sub]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.lt .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [refill] "=&r"(refill), [val] "=&r" (val)
+		     : [addr] "r"(txq->fc_mem), [adj] "r"(txq->nb_sqb_bufs_adj),
+		       [shft] "r"(txq->sqes_per_sqb_log2), [sub] "r"(req)
+		     : "memory");
+#else
+	do {
+		refill = (txq->nb_sqb_bufs_adj - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED));
+		refill = (refill << txq->sqes_per_sqb_log2) - refill;
+		refill -= req;
+	} while (refill < 0);
+#endif
+	if (!__atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill, 0, __ATOMIC_RELEASE,
+				       __ATOMIC_RELAXED))
+		goto retry;
+}
+
+/* Function to determine the number of Tx subdescriptors required when the
+ * extended subdescriptor is enabled.
+ */
+static __rte_always_inline int
+cn20k_nix_tx_ext_subs(const uint16_t flags)
+{
+	return (flags & NIX_TX_OFFLOAD_TSTAMP_F) ?
+		       2 :
+		       ((flags & (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)) ? 1 : 0);
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_tx_steor_data(const uint16_t flags)
+{
+	const uint64_t dw_m1 = cn20k_nix_tx_ext_subs(flags) + 1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1;
+	/* 15 vector sizes for single seg */
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
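The unrolled builder above places the same 3-bit "dwords minus one" value
in each of the 15 per-line size slots (bits 19, 22, ..., 61); slot 0 sits
in bits 2:0 and is later folded into the I/O address. An equivalent loop
form, shown purely as a sketch of the bit layout:

    static inline uint64_t
    steor_data_loop(uint64_t dw_m1)
    {
        uint64_t data = dw_m1; /* slot 0, bits 2:0 */
        unsigned int i;

        /* slots 1..15 at bits 19, 22, ..., 61 */
        for (i = 1; i < 16; i++)
            data |= dw_m1 << (16 + 3 * i);
        return data;
    }
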
+static __rte_always_inline void
+cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t flags,
+		      const uint16_t static_sz)
+{
+	if (static_sz)
+		cmd[0] = txq->send_hdr_w0;
+	else
+		cmd[0] = (txq->send_hdr_w0 & 0xFFFFF00000000000) |
+			 ((uint64_t)(cn20k_nix_tx_ext_subs(flags) + 1) << 40);
+	cmd[1] = 0;
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F)
+			cmd[2] = (NIX_SUBDC_EXT << 60) | BIT_ULL(15);
+		else
+			cmd[2] = NIX_SUBDC_EXT << 60;
+		cmd[3] = 0;
+		if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+			cmd[4] = (NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) | BIT_ULL(48);
+		else
+			cmd[4] = (NIX_SUBDC_SG << 60) | BIT_ULL(48);
+	} else {
+		if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+			cmd[2] = (NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) | BIT_ULL(48);
+		else
+			cmd[2] = (NIX_SUBDC_SG << 60) | BIT_ULL(48);
+	}
+}
+
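For orientation, the skeleton produces the following layout (a sketch of
the dword indices into cmd[] as consumed by the rest of this file):

    /*   without ext header:        with ext header:
     *   cmd[0..1] SEND_HDR_S       cmd[0..1] SEND_HDR_S
     *   cmd[2..3] SG + IOVA        cmd[2..3] SEND_EXT_S
     *                              cmd[4..5] SG + IOVA
     */
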
+static __rte_always_inline void
+cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
+{
+	int32_t nb_desc, val, newval;
+	int32_t *fc_sw;
+	uint64_t *fc;
+
+	/* Check if there is any CPT instruction to submit */
+	if (!nb_pkts)
+		return;
+
+again:
+	fc_sw = txq->cpt_fc_sw;
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %w[pkts], [%[addr]]		\n"
+		     "		tbz %w[pkts], 31, .Ldne%=		\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %w[pkts], [%[addr]]		\n"
+		     "		tbnz %w[pkts], 31, .Lrty%=		\n"
+		     ".Ldne%=:						\n"
+		     : [pkts] "=&r"(val)
+		     : [addr] "r"(fc_sw)
+		     : "memory");
+#else
+	/* Wait for primary core to refill FC. */
+	while (__atomic_load_n(fc_sw, __ATOMIC_RELAXED) < 0)
+		;
+#endif
+
+	val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_ACQUIRE) - nb_pkts;
+	if (likely(val >= 0))
+		return;
+
+	nb_desc = txq->cpt_desc;
+	fc = txq->cpt_fc;
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[refill], [%[addr]]		\n"
+		     "		sub %[refill], %[desc], %[refill]	\n"
+		     "		sub %[refill], %[refill], %[pkts]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.ge .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[refill], [%[addr]]		\n"
+		     "		sub %[refill], %[desc], %[refill]	\n"
+		     "		sub %[refill], %[refill], %[pkts]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.lt .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [refill] "=&r"(newval)
+		     : [addr] "r"(fc), [desc] "r"(nb_desc), [pkts] "r"(nb_pkts)
+		     : "memory");
+#else
+	while (true) {
+		newval = nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED);
+		newval -= nb_pkts;
+		if (newval >= 0)
+			break;
+	}
+#endif
+
+	if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, __ATOMIC_RELEASE,
+					 __ATOMIC_RELAXED))
+		goto again;
+}
+
+#if defined(RTE_ARCH_ARM64)
+
+static __rte_always_inline void
+cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
+		   uint8_t *lnum, uint8_t *loff, uint8_t *shft, uint64_t sa_base,
+		   const uint16_t flags)
+{
+	struct cn20k_sec_sess_priv sess_priv;
+	uint32_t pkt_len, dlen_adj, rlen;
+	struct nix_send_hdr_s *send_hdr;
+	uint8_t l3l4type, chksum;
+	uint64x2_t cmd01, cmd23;
+	union nix_send_sg_s *sg;
+	uint8_t l2_len, l3_len;
+	uintptr_t dptr, nixtx;
+	uint64_t ucode_cmd[4];
+	uint64_t *laddr, w0;
+	uint16_t tag;
+	uint64_t sa;
+
+	/* Move to our line from base */
+	sess_priv.u64 = *rte_security_dynfield(m);
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+	if (flags & NIX_TX_NEED_EXT_HDR)
+		sg = (union nix_send_sg_s *)&cmd[4];
+	else
+		sg = (union nix_send_sg_s *)&cmd[2];
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1) {
+		/* Extract l3l4type either from il3il4type or ol3ol4type */
+		if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+			l2_len = (cmd[1] >> 16) & 0xFF;
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = ((cmd[1] >> 24) & 0xFF) - l2_len;
+			l3l4type = (cmd[1] >> 40) & 0xFF;
+		} else {
+			l2_len = cmd[1] & 0xFF;
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = ((cmd[1] >> 8) & 0xFF) - l2_len;
+			l3l4type = (cmd[1] >> 32) & 0xFF;
+		}
+
+		chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30);
+		chksum = ~chksum;
+		sess_priv.chksum = sess_priv.chksum & chksum;
+		/* Clear SEND header flags */
+		cmd[1] &= ~(0xFFFFUL << 32);
+	} else {
+		l2_len = m->l2_len;
+		l3_len = m->l3_len;
+	}
+
+	/* Retrieve DPTR */
+	dptr = *(uint64_t *)(sg + 1);
+	pkt_len = send_hdr->w0.total;
+
+	/* Calculate dlen adj */
+	dlen_adj = pkt_len - l2_len;
+	/* Exclude l3 len from roundup for transport mode */
+	dlen_adj -= sess_priv.mode ? 0 : l3_len;
+	rlen = (dlen_adj + sess_priv.roundup_len) + (sess_priv.roundup_byte - 1);
+	rlen &= ~(uint64_t)(sess_priv.roundup_byte - 1);
+	rlen += sess_priv.partial_len;
+	dlen_adj = rlen - dlen_adj;
+
+	/* Update send descriptors. Security is single segment only */
+	send_hdr->w0.total = pkt_len + dlen_adj;
+
+	/* CPT word 5 and word 6 */
+	w0 = 0;
+	ucode_cmd[2] = 0;
+	if (flags & NIX_TX_MULTI_SEG_F && m->nb_segs > 1) {
+		struct rte_mbuf *last = rte_pktmbuf_lastseg(m);
+
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = rte_pktmbuf_mtod_offset(last, uintptr_t, last->data_len + dlen_adj);
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		dptr = nixtx + ((flags & NIX_TX_NEED_EXT_HDR) ? 32 : 16);
+
+		/* Set l2 length as data offset */
+		w0 = (uint64_t)l2_len << 16;
+		w0 |= cn20k_nix_tx_ext_subs(flags) + NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+		ucode_cmd[1] = dptr | ((uint64_t)m->nb_segs << 60);
+	} else {
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = dptr + pkt_len + dlen_adj;
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		w0 |= cn20k_nix_tx_ext_subs(flags) + 1ULL;
+		dptr += l2_len;
+		ucode_cmd[1] = dptr;
+		sg->seg1_size = pkt_len + dlen_adj;
+		pkt_len -= l2_len;
+	}
+	w0 |= ((((int64_t)nixtx - (int64_t)dptr) & 0xFFFFF) << 32);
+	/* CPT word 0 and 1 */
+	cmd01 = vdupq_n_u64(0);
+	cmd01 = vsetq_lane_u64(w0, cmd01, 0);
+	/* CPT_RES_S is 16B above NIXTX */
+	cmd01 = vsetq_lane_u64(nixtx - 16, cmd01, 1);
+
+	/* Return nixtx addr */
+	*nixtx_addr = nixtx;
+
+	/* CPT Word 4 and Word 7 */
+	tag = sa_base & 0xFFFFUL;
+	sa_base &= ~0xFFFFUL;
+	sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx);
+	ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa);
+	ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | 1UL << 54 |
+			((uint64_t)sess_priv.chksum) << 32 | ((uint64_t)sess_priv.dec_ttl) << 34 |
+			pkt_len);
+
+	/* CPT word 2 and 3 */
+	cmd23 = vdupq_n_u64(0);
+	cmd23 = vsetq_lane_u64(
+		(((uint64_t)RTE_EVENT_TYPE_CPU << 28) | tag | CNXK_ETHDEV_SEC_OUTB_EV_SUB << 20),
+		cmd23, 0);
+	cmd23 = vsetq_lane_u64((uintptr_t)m | 1, cmd23, 1);
+
+	/* Move to our line */
+	laddr = LMT_OFF(lbase, *lnum, *loff ? 64 : 0);
+
+	/* Write CPT instruction to lmt line */
+	vst1q_u64(laddr, cmd01);
+	vst1q_u64((laddr + 2), cmd23);
+
+	*(__uint128_t *)(laddr + 4) = *(__uint128_t *)ucode_cmd;
+	*(__uint128_t *)(laddr + 6) = *(__uint128_t *)(ucode_cmd + 2);
+
+	/* Move to next line for every other CPT inst */
+	*loff = !(*loff);
+	*lnum = *lnum + (*loff ? 0 : 1);
+	*shft = *shft + (*loff ? 0 : 3);
+}
+
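The rlen computation above is the usual ESP length expansion: round the
to-be-encrypted length up to the cipher block size and add the fixed
per-packet overhead. A worked example with hypothetical session parameters
(tunnel mode, 16B cipher block):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t dlen_adj = 500;    /* pkt_len - l2_len in tunnel mode */
        uint32_t roundup_len = 2;   /* e.g. ESP pad-len + next-hdr */
        uint32_t roundup_byte = 16; /* cipher block size */
        uint32_t partial_len = 24;  /* e.g. ESP header + IV */
        uint32_t rlen;

        rlen = (dlen_adj + roundup_len) + (roundup_byte - 1);
        rlen &= ~(uint32_t)(roundup_byte - 1); /* 512 */
        rlen += partial_len;                   /* 536 */

        assert(rlen == 536);
        assert(rlen - dlen_adj == 36); /* bytes the packet grows by */
        return 0;
    }
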
+#else
+
+static __rte_always_inline void
+cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
+		   uint8_t *lnum, uint8_t *loff, uint8_t *shft, uint64_t sa_base,
+		   const uint16_t flags)
+{
+	RTE_SET_USED(m);
+	RTE_SET_USED(cmd);
+	RTE_SET_USED(nixtx_addr);
+	RTE_SET_USED(lbase);
+	RTE_SET_USED(lnum);
+	RTE_SET_USED(loff);
+	RTE_SET_USED(shft);
+	RTE_SET_USED(sa_base);
+	RTE_SET_USED(flags);
+}
+#endif
+
+static inline void
+cn20k_nix_free_extmbuf(struct rte_mbuf *m)
+{
+	struct rte_mbuf *m_next;
+	while (m != NULL) {
+		m_next = m->next;
+		rte_pktmbuf_free_seg(m);
+		m = m_next;
+	}
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **extm, struct cn20k_eth_txq *txq,
+		      struct nix_send_hdr_s *send_hdr, uint64_t *aura)
+{
+	struct rte_mbuf *prev = NULL;
+	uint32_t sqe_id;
+
+	if (RTE_MBUF_HAS_EXTBUF(m)) {
+		if (unlikely(txq->tx_compl.ena == 0)) {
+			m->next = *extm;
+			*extm = m;
+			return 1;
+		}
+		if (send_hdr->w0.pnc) {
+			sqe_id = send_hdr->w1.sqe_id;
+			prev = txq->tx_compl.ptr[sqe_id];
+			m->next = prev;
+			txq->tx_compl.ptr[sqe_id] = m;
+		} else {
+			sqe_id = __atomic_fetch_add(&txq->tx_compl.sqe_id, 1, __ATOMIC_RELAXED);
+			send_hdr->w0.pnc = 1;
+			send_hdr->w1.sqe_id = sqe_id & txq->tx_compl.nb_desc_mask;
+			txq->tx_compl.ptr[send_hdr->w1.sqe_id] = m;
+		}
+		return 1;
+	} else {
+		return cnxk_nix_prefree_seg(m, aura);
+	}
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags)
+{
+	uint64_t mask, ol_flags = m->ol_flags;
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F && (ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
+		uintptr_t mdata = rte_pktmbuf_mtod(m, uintptr_t);
+		uint16_t *iplen, *oiplen, *oudplen;
+		uint16_t lso_sb, paylen;
+
+		mask = -!!(ol_flags & (RTE_MBUF_F_TX_OUTER_IPV4 | RTE_MBUF_F_TX_OUTER_IPV6));
+		lso_sb = (mask & (m->outer_l2_len + m->outer_l3_len)) + m->l2_len + m->l3_len +
+			 m->l4_len;
+
+		/* Reduce payload len from base headers */
+		paylen = m->pkt_len - lso_sb;
+
+		/* Get iplen position assuming no tunnel hdr */
+		iplen = (uint16_t *)(mdata + m->l2_len + (2 << !!(ol_flags & RTE_MBUF_F_TX_IPV6)));
+		/* Handle tunnel tso */
+		if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) &&
+		    (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+			const uint8_t is_udp_tun =
+				(CNXK_NIX_UDP_TUN_BITMASK >>
+				 ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+				0x1;
+
+			oiplen = (uint16_t *)(mdata + m->outer_l2_len +
+					      (2 << !!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)));
+			*oiplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*oiplen) - paylen);
+
+			/* Update format for UDP tunneled packet */
+			if (is_udp_tun) {
+				oudplen =
+					(uint16_t *)(mdata + m->outer_l2_len + m->outer_l3_len + 4);
+				*oudplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*oudplen) - paylen);
+			}
+
+			/* Update iplen position to inner ip hdr */
+			iplen = (uint16_t *)(mdata + lso_sb - m->l3_len - m->l4_len +
+					     (2 << !!(ol_flags & RTE_MBUF_F_TX_IPV6)));
+		}
+
+		*iplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*iplen) - paylen);
+	}
+}
+
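The subtraction of paylen keeps only the per-segment header bytes in the
IP (and outer UDP) length fields, since hardware replicates those headers
for every TSO segment. A worked example for the plain, non-tunnel IPv4
case (host byte order here; the driver converts with rte_be_to_cpu_16 and
rte_cpu_to_be_16):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t pkt_len = 9014, l2 = 14, l3 = 20, l4 = 20;
        uint16_t lso_sb = l2 + l3 + l4;     /* 54: bytes HW replicates */
        uint16_t paylen = pkt_len - lso_sb; /* 8960 */
        uint16_t iplen = pkt_len - l2;      /* 9000: ip total_length */

        iplen -= paylen; /* 40 == l3 + l4: header-only length */
        assert(iplen == 40);
        return 0;
    }
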
+static __rte_always_inline void
+cn20k_nix_xmit_prepare(struct cn20k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm,
+		       uint64_t *cmd, const uint16_t flags, const uint64_t lso_tun_fmt, bool *sec,
+		       uint8_t mark_flag, uint64_t mark_fmt)
+{
+	uint8_t mark_off = 0, mark_vlan = 0, markptr = 0;
+	struct nix_send_ext_s *send_hdr_ext;
+	struct nix_send_hdr_s *send_hdr;
+	uint64_t ol_flags = 0, mask;
+	union nix_send_hdr_w1_u w1;
+	union nix_send_sg_s *sg;
+	uint16_t mark_form = 0;
+
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		send_hdr_ext = (struct nix_send_ext_s *)(cmd + 2);
+		sg = (union nix_send_sg_s *)(cmd + 4);
+		/* Clear previous markings */
+		send_hdr_ext->w0.lso = 0;
+		send_hdr_ext->w0.mark_en = 0;
+		send_hdr_ext->w1.u = 0;
+		ol_flags = m->ol_flags;
+	} else {
+		sg = (union nix_send_sg_s *)(cmd + 2);
+	}
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F)
+		send_hdr->w0.pnc = 0;
+
+	if (flags & (NIX_TX_NEED_SEND_HDR_W1 | NIX_TX_OFFLOAD_SECURITY_F)) {
+		ol_flags = m->ol_flags;
+		w1.u = 0;
+	}
+
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		send_hdr->w0.total = m->data_len;
+	else
+		send_hdr->w0.total = m->pkt_len;
+	send_hdr->w0.aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+
+	/*
+	 * L3type:  2 => IPV4
+	 *          3 => IPV4 with csum
+	 *          4 => IPV6
+	 * L3type and L3ptr needs to be set for either
+	 * L3 csum or L4 csum or LSO
+	 *
+	 */
+
+	if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) && (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F)) {
+		const uint8_t csum = !!(ol_flags & RTE_MBUF_F_TX_OUTER_UDP_CKSUM);
+		const uint8_t ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV4)) << 1) +
+					((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)) << 2) +
+					!!(ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM);
+
+		/* Outer L3 */
+		w1.ol3type = ol3type;
+		mask = 0xffffull << ((!!ol3type) << 4);
+		w1.ol3ptr = ~mask & m->outer_l2_len;
+		w1.ol4ptr = ~mask & (w1.ol3ptr + m->outer_l3_len);
+
+		/* Outer L4 */
+		w1.ol4type = csum + (csum << 1);
+
+		/* Inner L3 */
+		w1.il3type = ((!!(ol_flags & RTE_MBUF_F_TX_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_IPV6)) << 2);
+		w1.il3ptr = w1.ol4ptr + m->l2_len;
+		w1.il4ptr = w1.il3ptr + m->l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.il3type = w1.il3type + !!(ol_flags & RTE_MBUF_F_TX_IP_CKSUM);
+
+		/* Inner L4 */
+		w1.il4type = (ol_flags & RTE_MBUF_F_TX_L4_MASK) >> 52;
+
+		/* In case of no tunnel header, shift the IL3/IL4 fields
+		 * into the OL3/OL4 slots so that OL3/OL4 are used for
+		 * the header checksum.
+		 */
+		mask = !ol3type;
+		w1.u = ((w1.u & 0xFFFFFFFF00000000) >> (mask << 3)) |
+		       ((w1.u & 0X00000000FFFFFFFF) >> (mask << 4));
+
+	} else if (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+		const uint8_t csum = !!(ol_flags & RTE_MBUF_F_TX_OUTER_UDP_CKSUM);
+		const uint8_t outer_l2_len = m->outer_l2_len;
+
+		/* Outer L3 */
+		w1.ol3ptr = outer_l2_len;
+		w1.ol4ptr = outer_l2_len + m->outer_l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)) << 2) +
+			     !!(ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM);
+
+		/* Outer L4 */
+		w1.ol4type = csum + (csum << 1);
+
+	} else if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) {
+		const uint8_t l2_len = m->l2_len;
+
+		/* Always use OLXPTR and OLXTYPE when only
+		 * one header is present
+		 */
+
+		/* Inner L3 */
+		w1.ol3ptr = l2_len;
+		w1.ol4ptr = l2_len + m->l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_IPV6)) << 2) +
+			     !!(ol_flags & RTE_MBUF_F_TX_IP_CKSUM);
+
+		/* Inner L4 */
+		w1.ol4type = (ol_flags & RTE_MBUF_F_TX_L4_MASK) >> 52;
+	}
+
+	if (flags & NIX_TX_NEED_EXT_HDR && flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		const uint8_t ipv6 = !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+		const uint8_t ip = !!(ol_flags & (RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IPV6));
+
+		send_hdr_ext->w1.vlan1_ins_ena = !!(ol_flags & RTE_MBUF_F_TX_VLAN);
+		/* HW will update ptr after vlan0 update */
+		send_hdr_ext->w1.vlan1_ins_ptr = 12;
+		send_hdr_ext->w1.vlan1_ins_tci = m->vlan_tci;
+
+		send_hdr_ext->w1.vlan0_ins_ena = !!(ol_flags & RTE_MBUF_F_TX_QINQ);
+		/* 2B before end of l2 header */
+		send_hdr_ext->w1.vlan0_ins_ptr = 12;
+		send_hdr_ext->w1.vlan0_ins_tci = m->vlan_tci_outer;
+		/* Fill for VLAN marking only when VLAN insertion enabled */
+		mark_vlan = ((mark_flag & CNXK_TM_MARK_VLAN_DEI) &
+			     (send_hdr_ext->w1.vlan1_ins_ena || send_hdr_ext->w1.vlan0_ins_ena));
+
+		/* Mask requested flags with packet data information */
+		mark_off = mark_flag & ((ip << 2) | (ip << 1) | mark_vlan);
+		mark_off = ffs(mark_off & CNXK_TM_MARK_MASK);
+
+		mark_form = (mark_fmt >> ((mark_off - !!mark_off) << 4));
+		mark_form = (mark_form >> (ipv6 << 3)) & 0xFF;
+		markptr = m->l2_len + (mark_form >> 7) - (mark_vlan << 2);
+
+		send_hdr_ext->w0.mark_en = !!mark_off;
+		send_hdr_ext->w0.markform = mark_form & 0x7F;
+		send_hdr_ext->w0.markptr = markptr;
+	}
+
+	if (flags & NIX_TX_NEED_EXT_HDR && flags & NIX_TX_OFFLOAD_TSO_F &&
+	    (ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
+		uint16_t lso_sb;
+		uint64_t mask;
+
+		mask = -(!w1.il3type);
+		lso_sb = (mask & w1.ol4ptr) + (~mask & w1.il4ptr) + m->l4_len;
+
+		send_hdr_ext->w0.lso_sb = lso_sb;
+		send_hdr_ext->w0.lso = 1;
+		send_hdr_ext->w0.lso_mps = m->tso_segsz;
+		send_hdr_ext->w0.lso_format =
+			NIX_LSO_FORMAT_IDX_TSOV4 + !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+		w1.ol4type = NIX_SENDL4TYPE_TCP_CKSUM;
+
+		/* Handle tunnel tso */
+		if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) &&
+		    (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+			const uint8_t is_udp_tun =
+				(CNXK_NIX_UDP_TUN_BITMASK >>
+				 ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+				0x1;
+			uint8_t shift = is_udp_tun ? 32 : 0;
+
+			shift += (!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6) << 4);
+			shift += (!!(ol_flags & RTE_MBUF_F_TX_IPV6) << 3);
+
+			w1.il4type = NIX_SENDL4TYPE_TCP_CKSUM;
+			w1.ol4type = is_udp_tun ? NIX_SENDL4TYPE_UDP_CKSUM : 0;
+			/* Update format for UDP tunneled packet */
+			send_hdr_ext->w0.lso_format = (lso_tun_fmt >> shift);
+		}
+	}
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1)
+		send_hdr->w1.u = w1.u;
+
+	if (!(flags & NIX_TX_MULTI_SEG_F)) {
+		struct rte_mbuf *cookie;
+
+		sg->seg1_size = send_hdr->w0.total;
+		*(rte_iova_t *)(sg + 1) = rte_mbuf_data_iova(m);
+		cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+
+		if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+			uint64_t aura;
+
+			/* DF bit = 1 if refcount of current mbuf or parent
+			 * mbuf is greater than 1, DF bit = 0 otherwise.
+			 */
+			aura = send_hdr->w0.aura;
+			send_hdr->w0.df = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+			send_hdr->w0.aura = aura;
+		}
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+		/* Mark mempool object as "put" since it is freed by NIX */
+		if (!send_hdr->w0.df)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+#else
+		RTE_SET_USED(cookie);
+#endif
+	} else {
+		sg->seg1_size = m->data_len;
+		*(rte_iova_t *)(sg + 1) = rte_mbuf_data_iova(m);
+
+		/* NOFF is handled later for multi-seg */
+	}
+
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		*sec = !!(ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD);
+}
+
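The final W1 fix-up above relies on the field widths in the send header
word: the pointers are 8-bit fields in the low 32 bits and the types are
4-bit fields in the high 32 bits, so shifting the two halves by 16 and 8
respectively relocates the inner (IL*) values into the outer (OL*) slots
when no outer header is present. A sketch of that step in isolation:

    #include <stdint.h>

    static inline uint64_t
    nix_w1_merge(uint64_t w1, int have_outer)
    {
        uint64_t mask = !have_outer; /* 1 when only one header level */

        return ((w1 & 0xFFFFFFFF00000000ULL) >> (mask << 3)) | /* types: >> 8 */
               ((w1 & 0x00000000FFFFFFFFULL) >> (mask << 4));  /* ptrs: >> 16 */
    }
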
+static __rte_always_inline void
+cn20k_nix_xmit_mv_lmt_base(uintptr_t lmt_addr, uint64_t *cmd, const uint16_t flags)
+{
+	struct nix_send_ext_s *send_hdr_ext;
+	union nix_send_sg_s *sg;
+
+	/* With minimal offloads, 'cmd' being local could be optimized out to
+	 * registers. In other cases, 'cmd' will be on the stack. The intent is
+	 * that 'cmd' stores content from txq->cmd which is copied only once.
+	 */
+	*((struct nix_send_hdr_s *)lmt_addr) = *(struct nix_send_hdr_s *)cmd;
+	lmt_addr += 16;
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		send_hdr_ext = (struct nix_send_ext_s *)(cmd + 2);
+		*((struct nix_send_ext_s *)lmt_addr) = *send_hdr_ext;
+		lmt_addr += 16;
+
+		sg = (union nix_send_sg_s *)(cmd + 4);
+	} else {
+		sg = (union nix_send_sg_s *)(cmd + 2);
+	}
+	/* In case of multi-seg, sg template is stored here */
+	*((union nix_send_sg_s *)lmt_addr) = *sg;
+	*(rte_iova_t *)(lmt_addr + 8) = *(rte_iova_t *)(sg + 1);
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_prepare_tstamp(struct cn20k_eth_txq *txq, uintptr_t lmt_addr,
+			      const uint64_t ol_flags, const uint16_t no_segdw,
+			      const uint16_t flags)
+{
+	if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+		const uint8_t is_ol_tstamp = !(ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST);
+		uint64_t *lmt = (uint64_t *)lmt_addr;
+		uint16_t off = (no_segdw - 1) << 1;
+		struct nix_send_mem_s *send_mem;
+
+		send_mem = (struct nix_send_mem_s *)(lmt + off);
+		/* For packets that do not have RTE_MBUF_F_TX_IEEE1588_TMST set,
+		 * the Tx tstamp should not be recorded. Hence change the alg
+		 * type to NIX_SENDMEMALG_SUB and point the send mem addr field
+		 * to the next 8 bytes so that the actual registered Tx tstamp
+		 * address is not corrupted.
+		 */
+		send_mem->w0.subdc = NIX_SUBDC_MEM;
+		send_mem->w0.alg = NIX_SENDMEMALG_SETTSTMP + (is_ol_tstamp << 3);
+		send_mem->addr = (rte_iova_t)(((uint64_t *)txq->ts_mem) + is_ol_tstamp);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+		    uint64_t *cmd, const uint16_t flags)
+{
+	struct cn20k_eth_txq *txq = tx_queue;
+	const rte_iova_t io_addr = txq->io_addr;
+	uint8_t lnum, c_lnum, c_shft, c_loff;
+	uintptr_t pa, lbase = txq->lmt_base;
+	uint16_t lmt_id, burst, left, i;
+	struct rte_mbuf *extm = NULL;
+	uintptr_t c_lbase = lbase;
+	uint64_t lso_tun_fmt = 0;
+	uint64_t mark_fmt = 0;
+	uint8_t mark_flag = 0;
+	rte_iova_t c_io_addr;
+	uint16_t c_lmt_id;
+	uint64_t sa_base;
+	uintptr_t laddr;
+	uint64_t data;
+	bool sec;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F))
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+
+	/* Get cmd skeleton */
+	cn20k_nix_tx_skeleton(txq, cmd, flags, !(flags & NIX_TX_VWQE_F));
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F)
+		lso_tun_fmt = txq->lso_tun_fmt;
+
+	if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		mark_fmt = txq->mark_fmt;
+		mark_flag = txq->mark_flag;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(lbase, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_lbase, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	burst = left > 32 ? 32 : left;
+
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		c_lnum = 0;
+		c_loff = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i++) {
+		/* Perform header writes for TSO, barrier at
+		 * lmt steorl will suffice.
+		 */
+		if (flags & NIX_TX_OFFLOAD_TSO_F)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+
+		cn20k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, &sec,
+				       mark_flag, mark_fmt);
+
+		laddr = (uintptr_t)LMT_OFF(lbase, lnum, 0);
+
+		/* Prepare CPT instruction and get nixtx addr */
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F && sec)
+			cn20k_nix_prep_sec(tx_pkts[i], cmd, &laddr, c_lbase, &c_lnum, &c_loff,
+					   &c_shft, sa_base, flags);
+
+		/* Move NIX desc to LMT/NIXTX area */
+		cn20k_nix_xmit_mv_lmt_base(laddr, cmd, flags);
+		cn20k_nix_xmit_prepare_tstamp(txq, laddr, tx_pkts[i]->ol_flags, 4, flags);
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F) || !sec)
+			lnum++;
+	}
+
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+	tx_pkts += burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = ((c_lnum << 1) + c_loff);
+
+		/* Reduce pkts to be sent to CPT */
+		burst -= sec_pkts;
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	/* Trigger LMTST */
+	if (burst > 16) {
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= (15ULL << 12);
+		data |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, 16);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data, pa);
+
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= ((uint64_t)(burst - 17)) << 12;
+		data |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst - 16);
+		/* STEOR1 */
+		roc_lmt_submit_steorl(data, pa);
+	} else if (burst) {
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= ((uint64_t)(burst - 1)) << 12;
+		data |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data, pa);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	return pkts;
+}
+
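A STEOR can release at most 16 LMT lines, so bursts above 16 are split
into two submissions. A sketch of the count encoding, where the count
fields hold "lines minus one":

    /* burst = 20, one LMT line per packet:
     *   STEOR0: lines lmt_id      .. lmt_id + 15, count = 15
     *   STEOR1: lines lmt_id + 16 .. lmt_id + 19, count = burst - 17 = 3
     */
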
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -225,10 +1203,11 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[sz];                                                                  \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts(tx_queue, NULL, tx_pkts, pkts, cmd, flags);             \
 	}
 
 #define NIX_TX_XMIT_MSEG(fn, sz, flags)                                                            \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 24/33] net/cnxk: support Tx multi-seg in cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (22 preceding siblings ...)
  2024-09-10  8:58 ` [PATCH 23/33] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 25/33] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
                   ` (11 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Tx multi-seg support in scalar for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 352 +++++++++++++++++++++++++++++++++++-
 1 file changed, 347 insertions(+), 5 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 610d64f21b..3f163285f0 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -863,6 +863,183 @@ cn20k_nix_xmit_prepare_tstamp(struct cn20k_eth_txq *txq, uintptr_t lmt_addr,
 	}
 }
 
+static __rte_always_inline uint16_t
+cn20k_nix_prepare_mseg(struct cn20k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm,
+		       uint64_t *cmd, const uint16_t flags)
+{
+	uint64_t prefree = 0, aura0, aura, nb_segs, segdw;
+	struct nix_send_hdr_s *send_hdr;
+	union nix_send_sg_s *sg, l_sg;
+	union nix_send_sg2_s l_sg2;
+	struct rte_mbuf *cookie;
+	struct rte_mbuf *m_next;
+	uint8_t off, is_sg2;
+	uint64_t len, dlen;
+	uint64_t ol_flags;
+	uint64_t *slist;
+
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+
+	if (flags & NIX_TX_NEED_EXT_HDR)
+		off = 2;
+	else
+		off = 0;
+
+	sg = (union nix_send_sg_s *)&cmd[2 + off];
+	len = send_hdr->w0.total;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		ol_flags = m->ol_flags;
+
+	/* Start from second segment, first segment is already there */
+	dlen = m->data_len;
+	is_sg2 = 0;
+	l_sg.u = sg->u;
+	/* Clear the first seg length in l_sg.u, which might be stale from the vector path */
+	l_sg.u &= ~0xFFFFUL;
+	l_sg.u |= dlen;
+	len -= dlen;
+	nb_segs = m->nb_segs - 1;
+	m_next = m->next;
+	m->next = NULL;
+	m->nb_segs = 1;
+	slist = &cmd[3 + off + 1];
+
+	cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+	/* Set invert df if buffer is not to be freed by H/W */
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		aura = send_hdr->w0.aura;
+		prefree = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+		send_hdr->w0.aura = aura;
+		l_sg.i1 = prefree;
+	}
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+	/* Mark mempool object as "put" since it is freed by NIX */
+	if (!prefree)
+		RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	rte_io_wmb();
+#else
+	RTE_SET_USED(cookie);
+#endif
+
+	/* Quickly handle single segmented packets. With this if-condition,
+	 * the compiler will completely optimize out the do-while loop below
+	 * from the Tx handler when the NIX_TX_MULTI_SEG_F offload is not set.
+	 */
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		goto done;
+
+	aura0 = send_hdr->w0.aura;
+	m = m_next;
+	if (!m)
+		goto done;
+
+	/* Fill mbuf segments */
+	do {
+		uint64_t iova;
+
+		/* Save the current mbuf properties. These can get cleared in
+		 * cnxk_nix_prefree_seg()
+		 */
+		m_next = m->next;
+		iova = rte_mbuf_data_iova(m);
+		dlen = m->data_len;
+		len -= dlen;
+
+		nb_segs--;
+		aura = aura0;
+		prefree = 0;
+
+		m->next = NULL;
+
+		cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+		if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+			aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+			prefree = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+			is_sg2 = aura != aura0 && !prefree;
+		}
+
+		if (unlikely(is_sg2)) {
+			/* This mbuf belongs to a different pool and
+			 * DF bit is not to be set, so use SG2 subdesc
+			 * so that it is freed to the appropriate pool.
+			 */
+
+			/* Write the previous descriptor out */
+			sg->u = l_sg.u;
+
+			/* If the current SG subdc does not have any
+			 * iovas in it, then the SG2 subdc can overwrite
+			 * that SG subdc.
+			 *
+			 * If the current SG subdc has 2 iovas in it, then
+			 * the current iova word should be left empty.
+			 */
+			slist += (-1 + (int)l_sg.segs);
+			sg = (union nix_send_sg_s *)slist;
+
+			l_sg2.u = l_sg.u & 0xC00000000000000; /* LD_TYPE */
+			l_sg2.subdc = NIX_SUBDC_SG2;
+			l_sg2.aura = aura;
+			l_sg2.seg1_size = dlen;
+			l_sg.u = l_sg2.u;
+
+			slist++;
+			*slist = iova;
+			slist++;
+		} else {
+			*slist = iova;
+			/* Set invert df if buffer is not to be freed by H/W */
+			l_sg.u |= (prefree << (l_sg.segs + 55));
+			/* Set the segment length */
+			l_sg.u |= ((uint64_t)dlen << (l_sg.segs << 4));
+			l_sg.segs += 1;
+			slist++;
+		}
+
+		if ((is_sg2 || l_sg.segs > 2) && nb_segs) {
+			sg->u = l_sg.u;
+			/* Next SG subdesc */
+			sg = (union nix_send_sg_s *)slist;
+			l_sg.u &= 0xC00000000000000; /* LD_TYPE */
+			l_sg.subdc = NIX_SUBDC_SG;
+			slist++;
+		}
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+		/* Mark mempool object as "put" since it is freed by NIX */
+		if (!prefree)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+#else
+		RTE_SET_USED(cookie);
+#endif
+		m = m_next;
+	} while (nb_segs);
+
+done:
+	/* Add remaining bytes of security data to last seg */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && len) {
+		uint8_t shft = (l_sg.subdc == NIX_SUBDC_SG) ? ((l_sg.segs - 1) << 4) : 0;
+
+		dlen = ((l_sg.u >> shft) & 0xFFFFULL) + len;
+		l_sg.u = l_sg.u & ~(0xFFFFULL << shft);
+		l_sg.u |= dlen << shft;
+	}
+
+	/* Write the last subdc out */
+	sg->u = l_sg.u;
+
+	segdw = (uint64_t *)slist - (uint64_t *)&cmd[2 + off];
+	/* Roundup extra dwords to multiple of 2 */
+	segdw = (segdw >> 1) + (segdw & 0x1);
+	/* Default dwords */
+	segdw += (off >> 1) + 1 + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+	send_hdr->w0.sizem1 = segdw - 1;
+
+	return segdw;
+}
+
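The segdw accounting at the end counts 8B words written into the SG area
and rounds up to 16B units before adding the fixed header dwords. A worked
example (a sketch, assuming four same-aura segments with an ext header, so
off = 2, and no timestamp):

    /* SG area: SG(3 segs) + 3 IOVAs + SG(1 seg) + 1 IOVA = 6 words
     * segdw = ceil(6 / 2)            = 3
     *       + (off >> 1) + 1         = 5 (SEND_HDR + SEND_EXT)
     * so send_hdr->w0.sizem1 = 4.
     */
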
 static __rte_always_inline uint16_t
 cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
 		    uint64_t *cmd, const uint16_t flags)
@@ -1010,6 +1187,170 @@ cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uin
 	return pkts;
 }
 
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			 uint64_t *cmd, const uint16_t flags)
+{
+	struct cn20k_eth_txq *txq = tx_queue;
+	uintptr_t pa0, pa1, lbase = txq->lmt_base;
+	const rte_iova_t io_addr = txq->io_addr;
+	uint16_t segdw, lmt_id, burst, left, i;
+	struct rte_mbuf *extm = NULL;
+	uint8_t lnum, c_lnum, c_loff;
+	uintptr_t c_lbase = lbase;
+	uint64_t lso_tun_fmt = 0;
+	uint64_t mark_fmt = 0;
+	uint8_t mark_flag = 0;
+	uint64_t data0, data1;
+	rte_iova_t c_io_addr;
+	uint8_t shft, c_shft;
+	__uint128_t data128;
+	uint16_t c_lmt_id;
+	uint64_t sa_base;
+	uintptr_t laddr;
+	bool sec;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F))
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+
+	/* Get cmd skeleton */
+	cn20k_nix_tx_skeleton(txq, cmd, flags, !(flags & NIX_TX_VWQE_F));
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F)
+		lso_tun_fmt = txq->lso_tun_fmt;
+
+	if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		mark_fmt = txq->mark_fmt;
+		mark_flag = txq->mark_flag;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(lbase, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_lbase, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	burst = left > 32 ? 32 : left;
+	shft = 16;
+	data128 = 0;
+
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		c_lnum = 0;
+		c_loff = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i++) {
+		cn20k_nix_tx_mbuf_validate(tx_pkts[i], flags);
+
+		/* Perform header writes for TSO, barrier at
+		 * lmt steorl will suffice.
+		 */
+		if (flags & NIX_TX_OFFLOAD_TSO_F)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+
+		cn20k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, &sec,
+				       mark_flag, mark_fmt);
+
+		laddr = (uintptr_t)LMT_OFF(lbase, lnum, 0);
+
+		/* Prepare CPT instruction and get nixtx addr */
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F && sec)
+			cn20k_nix_prep_sec(tx_pkts[i], cmd, &laddr, c_lbase, &c_lnum, &c_loff,
+					   &c_shft, sa_base, flags);
+
+		/* Move NIX desc to LMT/NIXTX area */
+		cn20k_nix_xmit_mv_lmt_base(laddr, cmd, flags);
+		/* Store sg list directly on lmt line */
+		segdw = cn20k_nix_prepare_mseg(txq, tx_pkts[i], &extm, (uint64_t *)laddr, flags);
+		cn20k_nix_xmit_prepare_tstamp(txq, laddr, tx_pkts[i]->ol_flags, segdw, flags);
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F) || !sec) {
+			lnum++;
+			data128 |= (((__uint128_t)(segdw - 1)) << shft);
+			shft += 3;
+		}
+	}
+
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+	tx_pkts += burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = ((c_lnum << 1) + c_loff);
+
+		/* Reduce pkts to be sent to CPT */
+		burst -= sec_pkts;
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	data0 = (uint64_t)data128;
+	data1 = (uint64_t)(data128 >> 64);
+	/* Make data0 similar to data1 */
+	data0 >>= 16;
+	/* Trigger LMTST */
+	if (burst > 16) {
+		pa0 = io_addr | (data0 & 0x7) << 4;
+		data0 &= ~0x7ULL;
+		/* Move lmtst1..15 sz to bits 63:19 */
+		data0 <<= 16;
+		data0 |= (15ULL << 12);
+		data0 |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, 16);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data0, pa0);
+
+		pa1 = io_addr | (data1 & 0x7) << 4;
+		data1 &= ~0x7ULL;
+		data1 <<= 16;
+		data1 |= ((uint64_t)(burst - 17)) << 12;
+		data1 |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst - 16);
+		/* STEOR1 */
+		roc_lmt_submit_steorl(data1, pa1);
+	} else if (burst) {
+		pa0 = io_addr | (data0 & 0x7) << 4;
+		data0 &= ~0x7ULL;
+		/* Move lmtst1..15 sz to bits 63:19 */
+		data0 <<= 16;
+		data0 |= ((burst - 1ULL) << 12);
+		data0 |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data0, pa0);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	return pkts;
+}
+
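Unlike the fixed-size path, each packet here contributes its own segdw, so
the per-line sizes are packed into data128 three bits at a time starting
at bit 16 while the burst is built, then unpacked around the STEORs. A
sketch of that unpacking for STEOR0:

    /* data0 >>= 16: size0 now at bits 2:0, size1.. at bits 3, 6, ...
     * pa0 |= (data0 & 0x7) << 4: size0 rides in the I/O address
     * data0 <<= 16: sizes 1..15 land at bits 19, 22, ... as STEOR expects
     */
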
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -1214,10 +1555,12 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[(sz) + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];                               \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_mseg(tx_queue, NULL, tx_pkts, pkts, cmd,                \
+						flags | NIX_TX_MULTI_SEG_F);                       \
 	}
 
 #define NIX_TX_XMIT_VEC(fn, sz, flags)                                                             \
@@ -1247,5 +1590,4 @@ uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue
 uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue,
 								      struct rte_mbuf **tx_pkts,
 								      uint16_t pkts);
-
 #endif /* __CN20K_TX_H__ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 25/33] net/cnxk: support Tx burst vector for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (23 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 24/33] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 26/33] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 1445 ++++++++++++++++++++++++++++++++++-
 1 file changed, 1441 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 3f163285f0..05c8b80fcb 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -219,6 +219,28 @@ cn20k_nix_tx_ext_subs(const uint16_t flags)
 		       ((flags & (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)) ? 1 : 0);
 }
 
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords(const uint16_t flags, const uint8_t segdw)
+{
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		return cn20k_nix_tx_ext_subs(flags) + 2;
+
+	/* Everything is already accounted for in segdw */
+	return segdw;
+}
+
+static __rte_always_inline uint8_t
+cn20k_nix_pkts_per_vec_brst(const uint16_t flags)
+{
+	return ((flags & NIX_TX_NEED_EXT_HDR) ? 2 : 4) << ROC_LMT_LINES_PER_CORE_LOG2;
+}
+
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords_per_line(const uint16_t flags)
+{
+	return (flags & NIX_TX_NEED_EXT_HDR) ? ((flags & NIX_TX_OFFLOAD_TSTAMP_F) ? 8 : 6) : 8;
+}
+
 static __rte_always_inline uint64_t
 cn20k_nix_tx_steor_data(const uint16_t flags)
 {
@@ -247,6 +269,40 @@ cn20k_nix_tx_steor_data(const uint16_t flags)
 	return data;
 }
 
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords_per_line_seg(const uint16_t flags)
+{
+	return ((flags & NIX_TX_NEED_EXT_HDR) ? (flags & NIX_TX_OFFLOAD_TSTAMP_F) ? 8 : 6 : 4);
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_tx_steor_vec_data(const uint16_t flags)
+{
+	const uint64_t dw_m1 = cn20k_nix_tx_dwords_per_line(flags) - 1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1;
+	/* 15 vector sizes for single seg */
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
 static __rte_always_inline void
 cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t flags,
 		      const uint16_t static_sz)
@@ -276,6 +332,33 @@ cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t f
 	}
 }
 
+static __rte_always_inline void
+cn20k_nix_sec_fc_wait_one(struct cn20k_eth_txq *txq)
+{
+	uint64_t nb_desc = txq->cpt_desc;
+	uint64_t fc;
+
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[space], [%[addr]]		\n"
+		     "		cmp %[nb_desc], %[space]		\n"
+		     "		b.hi .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[space], [%[addr]]		\n"
+		     "		cmp %[nb_desc], %[space]		\n"
+		     "		b.ls .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [space] "=&r"(fc)
+		     : [nb_desc] "r"(nb_desc), [addr] "r"(txq->cpt_fc)
+		     : "memory");
+#else
+	RTE_SET_USED(fc);
+	while (nb_desc <= __atomic_load_n(txq->cpt_fc, __ATOMIC_RELAXED))
+		;
+#endif
+}
+
 static __rte_always_inline void
 cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
 {
@@ -346,6 +429,137 @@ cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
 }
 
 #if defined(RTE_ARCH_ARM64)
+static __rte_always_inline void
+cn20k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1,
+		       uintptr_t *nixtx_addr, uintptr_t lbase, uint8_t *lnum, uint8_t *loff,
+		       uint8_t *shft, uint64_t sa_base, const uint16_t flags)
+{
+	struct cn20k_sec_sess_priv sess_priv;
+	uint32_t pkt_len, dlen_adj, rlen;
+	uint8_t l3l4type, chksum;
+	uint64x2_t cmd01, cmd23;
+	uint8_t l2_len, l3_len;
+	uintptr_t dptr, nixtx;
+	uint64_t ucode_cmd[4];
+	uint64_t *laddr, w0;
+	uint16_t tag;
+	uint64_t sa;
+
+	sess_priv.u64 = *rte_security_dynfield(m);
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1) {
+		/* Extract l3l4type either from il3il4type or ol3ol4type */
+		if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+			l2_len = vgetq_lane_u8(*cmd0, 10);
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = vgetq_lane_u8(*cmd0, 11) - l2_len;
+			l3l4type = vgetq_lane_u8(*cmd0, 13);
+		} else {
+			l2_len = vgetq_lane_u8(*cmd0, 8);
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = vgetq_lane_u8(*cmd0, 9) - l2_len;
+			l3l4type = vgetq_lane_u8(*cmd0, 12);
+		}
+
+		chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30);
+		chksum = ~chksum;
+		sess_priv.chksum = sess_priv.chksum & chksum;
+		/* Clear SEND header flags */
+		*cmd0 = vsetq_lane_u16(0, *cmd0, 6);
+	} else {
+		l2_len = m->l2_len;
+		l3_len = m->l3_len;
+	}
+
+	/* Retrieve DPTR */
+	dptr = vgetq_lane_u64(*cmd1, 1);
+	pkt_len = vgetq_lane_u16(*cmd0, 0);
+
+	/* Calculate dlen adj */
+	dlen_adj = pkt_len - l2_len;
+	/* Exclude l3 len from roundup for transport mode */
+	dlen_adj -= sess_priv.mode ? 0 : l3_len;
+	rlen = (dlen_adj + sess_priv.roundup_len) + (sess_priv.roundup_byte - 1);
+	rlen &= ~(uint64_t)(sess_priv.roundup_byte - 1);
+	rlen += sess_priv.partial_len;
+	dlen_adj = rlen - dlen_adj;
+
+	/* Update send descriptors. Security is single segment only */
+	*cmd0 = vsetq_lane_u16(pkt_len + dlen_adj, *cmd0, 0);
+
+	/* CPT word 5 and word 6 */
+	w0 = 0;
+	ucode_cmd[2] = 0;
+	if (flags & NIX_TX_MULTI_SEG_F && m->nb_segs > 1) {
+		struct rte_mbuf *last = rte_pktmbuf_lastseg(m);
+
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = rte_pktmbuf_mtod_offset(last, uintptr_t, last->data_len + dlen_adj);
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		dptr = nixtx + ((flags & NIX_TX_NEED_EXT_HDR) ? 32 : 16);
+
+		/* Set l2 length as data offset */
+		w0 = (uint64_t)l2_len << 16;
+		w0 |= cn20k_nix_tx_ext_subs(flags) + NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+		ucode_cmd[1] = dptr | ((uint64_t)m->nb_segs << 60);
+	} else {
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = dptr + pkt_len + dlen_adj;
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		w0 |= cn20k_nix_tx_ext_subs(flags) + 1ULL;
+		dptr += l2_len;
+		ucode_cmd[1] = dptr;
+		*cmd1 = vsetq_lane_u16(pkt_len + dlen_adj, *cmd1, 0);
+		/* DLEN passed is excluding L2 HDR */
+		pkt_len -= l2_len;
+	}
+	w0 |= ((((int64_t)nixtx - (int64_t)dptr) & 0xFFFFF) << 32);
+	/* CPT word 0 and 1 */
+	cmd01 = vdupq_n_u64(0);
+	cmd01 = vsetq_lane_u64(w0, cmd01, 0);
+	/* CPT_RES_S is 16B above NIXTX */
+	cmd01 = vsetq_lane_u64(nixtx - 16, cmd01, 1);
+
+	/* Return nixtx addr */
+	*nixtx_addr = nixtx;
+
+	/* CPT Word 4 and Word 7 */
+	tag = sa_base & 0xFFFFUL;
+	sa_base &= ~0xFFFFUL;
+	sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx);
+	ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa);
+	ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | 1UL << 54 |
+			((uint64_t)sess_priv.chksum) << 32 | ((uint64_t)sess_priv.dec_ttl) << 34 |
+			pkt_len);
+
+	/* CPT word 2 and 3 */
+	cmd23 = vdupq_n_u64(0);
+	cmd23 = vsetq_lane_u64(
+		(((uint64_t)RTE_EVENT_TYPE_CPU << 28) | tag | CNXK_ETHDEV_SEC_OUTB_EV_SUB << 20),
+		cmd23, 0);
+	cmd23 = vsetq_lane_u64((uintptr_t)m | 1, cmd23, 1);
+
+	/* Move to our line */
+	laddr = LMT_OFF(lbase, *lnum, *loff ? 64 : 0);
+
+	/* Write CPT instruction to lmt line */
+	vst1q_u64(laddr, cmd01);
+	vst1q_u64((laddr + 2), cmd23);
+
+	*(__uint128_t *)(laddr + 4) = *(__uint128_t *)ucode_cmd;
+	*(__uint128_t *)(laddr + 6) = *(__uint128_t *)(ucode_cmd + 2);
+
+	/* Move to next line for every other CPT inst */
+	*loff = !(*loff);
+	*lnum = *lnum + (*loff ? 0 : 1);
+	*shft = *shft + (*loff ? 0 : 3);
+}
 
 static __rte_always_inline void
 cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
@@ -546,6 +760,156 @@ cn20k_nix_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **extm, struct cn20k_e
 	}
 }
 
+#if defined(RTE_ARCH_ARM64)
+/* Called only for the first segments of single-segment mbufs */
+static __rte_always_inline void
+cn20k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct rte_mbuf **extm,
+			  struct cn20k_eth_txq *txq, uint64x2_t *senddesc01_w0,
+			  uint64x2_t *senddesc23_w0, uint64x2_t *senddesc01_w1,
+			  uint64x2_t *senddesc23_w1)
+{
+	struct rte_mbuf **tx_compl_ptr = txq->tx_compl.ptr;
+	uint32_t nb_desc_mask = txq->tx_compl.nb_desc_mask;
+	bool tx_compl_ena = txq->tx_compl.ena;
+	struct rte_mbuf *m0, *m1, *m2, *m3;
+	struct rte_mbuf *cookie;
+	uint64_t w0, w1, aura;
+	uint64_t sqe_id;
+
+	m0 = mbufs[0];
+	m1 = mbufs[1];
+	m2 = mbufs[2];
+	m3 = mbufs[3];
+
+	/* mbuf 0 */
+	w0 = vgetq_lane_u64(*senddesc01_w0, 0);
+	if (RTE_MBUF_HAS_EXTBUF(m0)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc01_w1, 0);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m0->next = *extm;
+			*extm = m0;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m0;
+			*senddesc01_w1 = vsetq_lane_u64(w1, *senddesc01_w1, 0);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m0) ? m0 : rte_mbuf_from_indirect(m0);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m0, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc01_w0 = vsetq_lane_u64(w0, *senddesc01_w0, 0);
+
+	/* mbuf1 */
+	w0 = vgetq_lane_u64(*senddesc01_w0, 1);
+	if (RTE_MBUF_HAS_EXTBUF(m1)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc01_w1, 1);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m1->next = *extm;
+			*extm = m1;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m1;
+			*senddesc01_w1 = vsetq_lane_u64(w1, *senddesc01_w1, 1);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m1) ? m1 : rte_mbuf_from_indirect(m1);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m1, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc01_w0 = vsetq_lane_u64(w0, *senddesc01_w0, 1);
+
+	/* mbuf 2 */
+	w0 = vgetq_lane_u64(*senddesc23_w0, 0);
+	if (RTE_MBUF_HAS_EXTBUF(m2)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc23_w1, 0);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m2->next = *extm;
+			*extm = m2;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m2;
+			*senddesc23_w1 = vsetq_lane_u64(w1, *senddesc23_w1, 0);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m2) ? m2 : rte_mbuf_from_indirect(m2);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m2, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc23_w0 = vsetq_lane_u64(w0, *senddesc23_w0, 0);
+
+	/* mbuf3 */
+	w0 = vgetq_lane_u64(*senddesc23_w0, 1);
+	if (RTE_MBUF_HAS_EXTBUF(m3)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc23_w1, 1);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m3->next = *extm;
+			*extm = m3;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m3;
+			*senddesc23_w1 = vsetq_lane_u64(w1, *senddesc23_w1, 1);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m3) ? m3 : rte_mbuf_from_indirect(m3);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m3, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc23_w0 = vsetq_lane_u64(w0, *senddesc23_w0, 1);
+#ifndef RTE_LIBRTE_MEMPOOL_DEBUG
+	RTE_SET_USED(cookie);
+#endif
+}
+#endif
+
 static __rte_always_inline void
 cn20k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags)
 {
@@ -1351,6 +1715,1078 @@ cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts
 	return pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
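+/* The vector Tx path prepares and submits four packets per loop iteration */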
+#define NIX_DESCS_PER_LOOP 4
+
+static __rte_always_inline void
+cn20k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, uint8_t *shift,
+		   __uint128_t *data128, uintptr_t *next)
+{
+	/* Go to next line if we are out of space */
+	if ((*loff + (dw << 4)) > 128) {
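+		/* Record the finished line's size as (dwords - 1), 3 bits per line, in the LMTST data word */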
+		*data128 = *data128 | (((__uint128_t)((*loff >> 4) - 1)) << *shift);
+		*shift = *shift + 3;
+		*loff = 0;
+		*lnum = *lnum + 1;
+	}
+
+	*next = (uintptr_t)LMT_OFF(laddr, *lnum, *loff);
+	*loff = *loff + (dw << 4);
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_store(struct cn20k_eth_txq *txq, struct rte_mbuf *mbuf, struct rte_mbuf **extm,
+		     uint8_t segdw, uintptr_t laddr, uint64x2_t cmd0, uint64x2_t cmd1,
+		     uint64x2_t cmd2, uint64x2_t cmd3, const uint16_t flags)
+{
+	RTE_SET_USED(txq);
+	RTE_SET_USED(mbuf);
+	RTE_SET_USED(extm);
+	RTE_SET_USED(segdw);
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		/* Store the prepared send desc to LMT lines */
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+			vst1q_u64(LMT_OFF(laddr, 0, 48), cmd3);
+		} else {
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+		}
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 0);
+	} else {
+		/* Store the prepared send desc to LMT lines */
+		vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+		vst1q_u64(LMT_OFF(laddr, 0, 16), cmd1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 0);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			   uint64_t *cmd, const uint16_t flags)
+{
+	uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3;
+	uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3;
+	uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP], cmd2[NIX_DESCS_PER_LOOP],
+		cmd3[NIX_DESCS_PER_LOOP];
+	uint16_t left, scalar, burst, i, lmt_id, c_lmt_id;
+	uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3, pa;
+	uint64x2_t senddesc01_w0, senddesc23_w0;
+	uint64x2_t senddesc01_w1, senddesc23_w1;
+	uint64x2_t sendext01_w0, sendext23_w0;
+	uint64x2_t sendext01_w1, sendext23_w1;
+	uint64x2_t sendmem01_w0, sendmem23_w0;
+	uint64x2_t sendmem01_w1, sendmem23_w1;
+	uint8_t segdw[NIX_DESCS_PER_LOOP + 1];
+	uint64x2_t sgdesc01_w0, sgdesc23_w0;
+	uint64x2_t sgdesc01_w1, sgdesc23_w1;
+	struct cn20k_eth_txq *txq = tx_queue;
+	rte_iova_t io_addr = txq->io_addr;
+	uint8_t lnum, shift = 0, loff = 0;
+	uintptr_t laddr = txq->lmt_base;
+	uint8_t c_lnum, c_shft, c_loff;
+	uint64x2_t ltypes01, ltypes23;
+	uint64x2_t xtmp128, ytmp128;
+	uint64x2_t xmask01, xmask23;
+	uintptr_t c_laddr = laddr;
+	rte_iova_t c_io_addr;
+	uint64_t sa_base;
+	union wdata {
+		__uint128_t data128;
+		uint64_t data[2];
+	} wd;
+	struct rte_mbuf *extm = NULL;
+
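+	/* Reclaim mbufs of already completed packets so their completion SQE slots can be reused */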
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F)) {
+		scalar = pkts & (NIX_DESCS_PER_LOOP - 1);
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+	} else {
+		scalar = pkts & (NIX_DESCS_PER_LOOP - 1);
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+	}
+
+	if (!(flags & NIX_TX_VWQE_F)) {
+		senddesc01_w0 = vld1q_dup_u64(&txq->send_hdr_w0);
+	} else {
+		uint64_t w0 = (txq->send_hdr_w0 & 0xFFFFF00000000000) |
+			      ((uint64_t)(cn20k_nix_tx_ext_subs(flags) + 1) << 40);
+
+		senddesc01_w0 = vdupq_n_u64(w0);
+	}
+	senddesc23_w0 = senddesc01_w0;
+
+	senddesc01_w1 = vdupq_n_u64(0);
+	senddesc23_w1 = senddesc01_w1;
+	if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+		sgdesc01_w0 = vdupq_n_u64((NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) |
+					  BIT_ULL(48));
+	else
+		sgdesc01_w0 = vdupq_n_u64((NIX_SUBDC_SG << 60) | BIT_ULL(48));
+	sgdesc23_w0 = sgdesc01_w0;
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			sendext01_w0 = vdupq_n_u64((NIX_SUBDC_EXT << 60) | BIT_ULL(15));
+			sendmem01_w0 = vdupq_n_u64((NIX_SUBDC_MEM << 60) |
+						   (NIX_SENDMEMALG_SETTSTMP << 56));
+			sendmem23_w0 = sendmem01_w0;
+			sendmem01_w1 = vdupq_n_u64(txq->ts_mem);
+			sendmem23_w1 = sendmem01_w1;
+		} else {
+			sendext01_w0 = vdupq_n_u64((NIX_SUBDC_EXT << 60));
+		}
+		sendext23_w0 = sendext01_w0;
+
+		if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F)
+			sendext01_w1 = vdupq_n_u64(12 | 12U << 24);
+		else
+			sendext01_w1 = vdupq_n_u64(0);
+		sendext23_w1 = sendext01_w1;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(laddr, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_laddr, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	/* Number of packets to prepare depends on offloads enabled. */
+	burst = left > cn20k_nix_pkts_per_vec_brst(flags) ? cn20k_nix_pkts_per_vec_brst(flags) :
+							    left;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		wd.data128 = 0;
+		shift = 16;
+	}
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		loff = 0;
+		c_loff = 0;
+		c_lnum = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i += NIX_DESCS_PER_LOOP) {
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F &&
+		    (((int)((16 - c_lnum) << 1) - c_loff) < 4)) {
+			burst = i;
+			break;
+		}
+
+		/* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */
+		senddesc01_w0 = vbicq_u64(senddesc01_w0, vdupq_n_u64(0x800FFFFFFFF));
+		sgdesc01_w0 = vbicq_u64(sgdesc01_w0, vdupq_n_u64(0xFFFFFFFF));
+
+		senddesc23_w0 = senddesc01_w0;
+		sgdesc23_w0 = sgdesc01_w0;
+
+		/* Clear vlan enables. */
+		if (flags & NIX_TX_NEED_EXT_HDR) {
+			sendext01_w1 = vbicq_u64(sendext01_w1, vdupq_n_u64(0x3FFFF00FFFF00));
+			sendext23_w1 = sendext01_w1;
+		}
+
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Reset send mem alg to SETTSTMP from SUB */
+			sendmem01_w0 = vbicq_u64(sendmem01_w0, vdupq_n_u64(BIT_ULL(59)));
+			/* Reset send mem address to default. */
+			sendmem01_w1 = vbicq_u64(sendmem01_w1, vdupq_n_u64(0xF));
+			sendmem23_w0 = sendmem01_w0;
+			sendmem23_w1 = sendmem01_w1;
+		}
+
+		/* Move mbufs to iova */
+		mbuf0 = (uint64_t *)tx_pkts[0];
+		mbuf1 = (uint64_t *)tx_pkts[1];
+		mbuf2 = (uint64_t *)tx_pkts[2];
+		mbuf3 = (uint64_t *)tx_pkts[3];
+
+		/*
+		 * Get mbufs' ol_flags, iova, pkt_len, data_off
+		 * dataoff_iovaX.D[0] = iova,
+		 * dataoff_iovaX.D[1](15:0) = mbuf->data_off
+		 * len_olflagsX.D[0] = ol_flags,
+		 * len_olflagsX.D[1](63:32) = mbuf->pkt_len
+		 */
+		dataoff_iova0 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf0)->data_off, vld1q_u64(mbuf0), 1);
+		len_olflags0 = vld1q_u64(mbuf0 + 3);
+		dataoff_iova1 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf1)->data_off, vld1q_u64(mbuf1), 1);
+		len_olflags1 = vld1q_u64(mbuf1 + 3);
+		dataoff_iova2 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf2)->data_off, vld1q_u64(mbuf2), 1);
+		len_olflags2 = vld1q_u64(mbuf2 + 3);
+		dataoff_iova3 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf3)->data_off, vld1q_u64(mbuf3), 1);
+		len_olflags3 = vld1q_u64(mbuf3 + 3);
+
+		/* Move mbuf pointers to point at the pool field */
+		mbuf0 = (uint64_t *)((uintptr_t)mbuf0 + offsetof(struct rte_mbuf, pool));
+		mbuf1 = (uint64_t *)((uintptr_t)mbuf1 + offsetof(struct rte_mbuf, pool));
+		mbuf2 = (uint64_t *)((uintptr_t)mbuf2 + offsetof(struct rte_mbuf, pool));
+		mbuf3 = (uint64_t *)((uintptr_t)mbuf3 + offsetof(struct rte_mbuf, pool));
+
+		if (flags & (NIX_TX_OFFLOAD_OL3_OL4_CSUM_F | NIX_TX_OFFLOAD_L3_L4_CSUM_F)) {
+			/* Get tx_offload for ol2, ol3, l2, l3 lengths */
+			/*
+			 * E(8):OL2_LEN(7):OL3_LEN(9):E(24):L3_LEN(9):L2_LEN(7)
+			 * E(8):OL2_LEN(7):OL3_LEN(9):E(24):L3_LEN(9):L2_LEN(7)
+			 */
+
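+			/* mbufX + 2 now points at m->tx_offload; LD1 fills one 64-bit lane per mbuf */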
+			asm volatile("LD1 {%[a].D}[0],[%[in]]\n\t"
+				     : [a] "+w"(senddesc01_w1)
+				     : [in] "r"(mbuf0 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[a].D}[1],[%[in]]\n\t"
+				     : [a] "+w"(senddesc01_w1)
+				     : [in] "r"(mbuf1 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[b].D}[0],[%[in]]\n\t"
+				     : [b] "+w"(senddesc23_w1)
+				     : [in] "r"(mbuf2 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[b].D}[1],[%[in]]\n\t"
+				     : [b] "+w"(senddesc23_w1)
+				     : [in] "r"(mbuf3 + 2)
+				     : "memory");
+
+			/* Get pool pointer alone */
+			mbuf0 = (uint64_t *)*mbuf0;
+			mbuf1 = (uint64_t *)*mbuf1;
+			mbuf2 = (uint64_t *)*mbuf2;
+			mbuf3 = (uint64_t *)*mbuf3;
+		} else {
+			/* Get pool pointer alone */
+			mbuf0 = (uint64_t *)*mbuf0;
+			mbuf1 = (uint64_t *)*mbuf1;
+			mbuf2 = (uint64_t *)*mbuf2;
+			mbuf3 = (uint64_t *)*mbuf3;
+		}
+
+		const uint8x16_t shuf_mask2 = {
+			0x4, 0x5, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+			0xc, 0xd, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+		};
+		xtmp128 = vzip2q_u64(len_olflags0, len_olflags1);
+		ytmp128 = vzip2q_u64(len_olflags2, len_olflags3);
+
+		/*
+		 * Pick only 16 bits of pktlen present at bits 63:32
+		 * and place them at bits 15:0.
+		 */
+		xtmp128 = vqtbl1q_u8(xtmp128, shuf_mask2);
+		ytmp128 = vqtbl1q_u8(ytmp128, shuf_mask2);
+
+		/* Add pairwise to get dataoff + iova in sgdesc_w1 */
+		sgdesc01_w1 = vpaddq_u64(dataoff_iova0, dataoff_iova1);
+		sgdesc23_w1 = vpaddq_u64(dataoff_iova2, dataoff_iova3);
+
+		/* ORR both sgdesc_w0 and senddesc_w0 with the 16 bits of
+		 * pktlen placed at bits 15:0.
+		 */
+		sgdesc01_w0 = vorrq_u64(sgdesc01_w0, xtmp128);
+		sgdesc23_w0 = vorrq_u64(sgdesc23_w0, ytmp128);
+		senddesc01_w0 = vorrq_u64(senddesc01_w0, xtmp128);
+		senddesc23_w0 = vorrq_u64(senddesc23_w0, ytmp128);
+
+		/* Move mbuf to point to pool_id. */
+		mbuf0 = (uint64_t *)((uintptr_t)mbuf0 + offsetof(struct rte_mempool, pool_id));
+		mbuf1 = (uint64_t *)((uintptr_t)mbuf1 + offsetof(struct rte_mempool, pool_id));
+		mbuf2 = (uint64_t *)((uintptr_t)mbuf2 + offsetof(struct rte_mempool, pool_id));
+		mbuf3 = (uint64_t *)((uintptr_t)mbuf3 + offsetof(struct rte_mempool, pool_id));
+
+		if ((flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+		    !(flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/*
+			 * Lookup table to translate ol_flags to
+			 * il3/il4 types. But we still use ol3/ol4 types in
+			 * senddesc_w1 as only one header processing is enabled.
+			 */
+			const uint8x16_t tbl = {
+				/* [0-15] = il4type:il3type */
+				0x04, /* none (IPv6 assumed) */
+				0x14, /* RTE_MBUF_F_TX_TCP_CKSUM (IPv6 assumed) */
+				0x24, /* RTE_MBUF_F_TX_SCTP_CKSUM (IPv6 assumed) */
+				0x34, /* RTE_MBUF_F_TX_UDP_CKSUM (IPv6 assumed) */
+				0x03, /* RTE_MBUF_F_TX_IP_CKSUM */
+				0x13, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM */
+				0x23, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_SCTP_CKSUM */
+				0x33, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM */
+				0x02, /* RTE_MBUF_F_TX_IPV4  */
+				0x12, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_TCP_CKSUM */
+				0x22, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_SCTP_CKSUM */
+				0x32, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_UDP_CKSUM */
+				0x03, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM */
+				0x13, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_TCP_CKSUM
+				       */
+				0x23, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_SCTP_CKSUM
+				       */
+				0x33, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_UDP_CKSUM
+				       */
+			};
+
+			/* Extract olflags to translate to iltypes */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(47):L3_LEN(9):L2_LEN(7+z)
+			 * E(47):L3_LEN(9):L2_LEN(7+z)
+			 */
+			senddesc01_w1 = vshlq_n_u64(senddesc01_w1, 1);
+			senddesc23_w1 = vshlq_n_u64(senddesc23_w1, 1);
+
+			/* Move OLFLAGS bits 55:52 to 51:48
+			 * with zeros prepended on the byte; the rest
+			 * are don't care
+			 */
+			xtmp128 = vshrq_n_u8(xtmp128, 4);
+			ytmp128 = vshrq_n_u8(ytmp128, 4);
+			/*
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, 8, 8, 8, 8, 8, 8,
+				-1, 0, 8, 8, 8, 8, 8, 8,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl1q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl1q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields, i.e. bits 48:55 of iltype,
+			 * and place them in ol3/ol4type of senddesc_w1
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x6, 0xFF, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xE, 0xFF, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare ol4ptr, ol3ptr from ol3len, ol2len.
+			 * a [E(32):E(16):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E(32):E(16):(OL3+OL2):OL2]
+			 * => E(32):E(16)::OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u16(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u16(senddesc23_w1, 8));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		} else if (!(flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+			   (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/*
+			 * Lookup table to translate ol_flags to
+			 * ol3/ol4 types.
+			 */
+
+			const uint8x16_t tbl = {
+				/* [0-15] = ol4type:ol3type */
+				0x00, /* none */
+				0x03, /* OUTER_IP_CKSUM */
+				0x02, /* OUTER_IPV4 */
+				0x03, /* OUTER_IPV4 | OUTER_IP_CKSUM */
+				0x04, /* OUTER_IPV6 */
+				0x00, /* OUTER_IPV6 | OUTER_IP_CKSUM */
+				0x00, /* OUTER_IPV6 | OUTER_IPV4 */
+				0x00, /* OUTER_IPV6 | OUTER_IPV4 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x00, /* OUTER_UDP_CKSUM */
+				0x33, /* OUTER_UDP_CKSUM | OUTER_IP_CKSUM */
+				0x32, /* OUTER_UDP_CKSUM | OUTER_IPV4 */
+				0x33, /* OUTER_UDP_CKSUM | OUTER_IPV4 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x34, /* OUTER_UDP_CKSUM | OUTER_IPV6 */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IPV4
+				       */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IPV4 | OUTER_IP_CKSUM
+				       */
+			};
+
+			/* Extract olflags to translate to iltypes */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(47):OL3_LEN(9):OL2_LEN(7+z)
+			 * E(47):OL3_LEN(9):OL2_LEN(7+z)
+			 */
+			const uint8x16_t shuf_mask5 = {
+				0x6, 0x5, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+				0xE, 0xD, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+			};
+			senddesc01_w1 = vqtbl1q_u8(senddesc01_w1, shuf_mask5);
+			senddesc23_w1 = vqtbl1q_u8(senddesc23_w1, shuf_mask5);
+
+			/* Extract outer ol flags only */
+			const uint64x2_t o_cksum_mask = {
+				0x1C00020000000000,
+				0x1C00020000000000,
+			};
+
+			xtmp128 = vandq_u64(xtmp128, o_cksum_mask);
+			ytmp128 = vandq_u64(ytmp128, o_cksum_mask);
+
+			/* Extract OUTER_UDP_CKSUM bit 41 and
+			 * move it to bit 61
+			 */
+
+			xtmp128 = xtmp128 | vshlq_n_u64(xtmp128, 20);
+			ytmp128 = ytmp128 | vshlq_n_u64(ytmp128, 20);
+
+			/* Shift oltype by 2 to start nibble from BIT(56)
+			 * instead of BIT(58)
+			 */
+			xtmp128 = vshrq_n_u8(xtmp128, 2);
+			ytmp128 = vshrq_n_u8(ytmp128, 2);
+			/*
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, 8, 8, 8, 8, 8, 8,
+				-1, 0, 8, 8, 8, 8, 8, 8,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl1q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl1q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields, i.e. bits 56:63 of oltype,
+			 * and place them in ol3/ol4type of senddesc_w1
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x7, 0xFF, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xF, 0xFF, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare ol4ptr, ol3ptr from ol3len, ol2len.
+			 * a [E(32):E(16):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E(32):E(16):(OL3+OL2):OL2]
+			 * => E(32):E(16)::OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u16(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u16(senddesc23_w1, 8));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		} else if ((flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+			   (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/* Lookup table to translate ol_flags to
+			 * ol4type, ol3type, il4type, il3type of senddesc_w1
+			 */
+			const uint8x16x2_t tbl = {{
+				{
+					/* [0-15] = il4type:il3type */
+					0x04, /* none (IPv6) */
+					0x14, /* RTE_MBUF_F_TX_TCP_CKSUM (IPv6) */
+					0x24, /* RTE_MBUF_F_TX_SCTP_CKSUM (IPv6) */
+					0x34, /* RTE_MBUF_F_TX_UDP_CKSUM (IPv6) */
+					0x03, /* RTE_MBUF_F_TX_IP_CKSUM */
+					0x13, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x23, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x33, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+					0x02, /* RTE_MBUF_F_TX_IPV4 */
+					0x12, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x22, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x32, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+					0x03, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_IP_CKSUM
+					       */
+					0x13, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x23, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x33, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+				},
+
+				{
+					/* [16-31] = ol4type:ol3type */
+					0x00, /* none */
+					0x03, /* OUTER_IP_CKSUM */
+					0x02, /* OUTER_IPV4 */
+					0x03, /* OUTER_IPV4 | OUTER_IP_CKSUM */
+					0x04, /* OUTER_IPV6 */
+					0x00, /* OUTER_IPV6 | OUTER_IP_CKSUM */
+					0x00, /* OUTER_IPV6 | OUTER_IPV4 */
+					0x00, /* OUTER_IPV6 | OUTER_IPV4 |
+					       * OUTER_IP_CKSUM
+					       */
+					0x00, /* OUTER_UDP_CKSUM */
+					0x33, /* OUTER_UDP_CKSUM |
+					       * OUTER_IP_CKSUM
+					       */
+					0x32, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV4
+					       */
+					0x33, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV4 | OUTER_IP_CKSUM
+					       */
+					0x34, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV6
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IP_CKSUM
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IPV4
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IPV4 | OUTER_IP_CKSUM
+					       */
+				},
+			}};
+
+			/* Extract olflags to translate to oltype & iltype */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(8):OL2_LN(7):OL3_LN(9):E(23):L3_LN(9):L2_LN(7+z)
+			 * E(8):OL2_LN(7):OL3_LN(9):E(23):L3_LN(9):L2_LN(7+z)
+			 */
+			const uint32x4_t tshft_4 = {
+				1,
+				0,
+				1,
+				0,
+			};
+			senddesc01_w1 = vshlq_u32(senddesc01_w1, tshft_4);
+			senddesc23_w1 = vshlq_u32(senddesc23_w1, tshft_4);
+
+			/*
+			 * E(32):L3_LEN(8):L2_LEN(7+Z):OL3_LEN(8):OL2_LEN(7+Z)
+			 * E(32):L3_LEN(8):L2_LEN(7+Z):OL3_LEN(8):OL2_LEN(7+Z)
+			 */
+			const uint8x16_t shuf_mask5 = {
+				0x6, 0x5, 0x0, 0x1, 0xFF, 0xFF, 0xFF, 0xFF,
+				0xE, 0xD, 0x8, 0x9, 0xFF, 0xFF, 0xFF, 0xFF,
+			};
+			senddesc01_w1 = vqtbl1q_u8(senddesc01_w1, shuf_mask5);
+			senddesc23_w1 = vqtbl1q_u8(senddesc23_w1, shuf_mask5);
+
+			/* Extract outer and inner header ol_flags */
+			const uint64x2_t oi_cksum_mask = {
+				0x1CF0020000000000,
+				0x1CF0020000000000,
+			};
+
+			xtmp128 = vandq_u64(xtmp128, oi_cksum_mask);
+			ytmp128 = vandq_u64(ytmp128, oi_cksum_mask);
+
+			/* Extract OUTER_UDP_CKSUM bit 41 and
+			 * move it to bit 61
+			 */
+
+			xtmp128 = xtmp128 | vshlq_n_u64(xtmp128, 20);
+			ytmp128 = ytmp128 | vshlq_n_u64(ytmp128, 20);
+
+			/* Shift right oltype by 2 and iltype by 4
+			 * to start oltype nibble from BIT(56)
+			 * instead of BIT(58) and iltype nibble from BIT(48)
+			 * instead of BIT(52).
+			 */
+			const int8x16_t tshft5 = {
+				8, 8, 8, 8, 8, 8, -4, -2,
+				8, 8, 8, 8, 8, 8, -4, -2,
+			};
+
+			xtmp128 = vshlq_u8(xtmp128, tshft5);
+			ytmp128 = vshlq_u8(ytmp128, tshft5);
+			/*
+			 * E(32):L3_LEN(8):L2_LEN(8):OL3_LEN(8):OL2_LEN(8)
+			 * E(32):L3_LEN(8):L2_LEN(8):OL3_LEN(8):OL2_LEN(8)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, -1, 0, 0, 0, 0, 0,
+				-1, 0, -1, 0, 0, 0, 0, 0,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Mark Bit(4) of oltype */
+			const uint64x2_t oi_cksum_mask2 = {
+				0x1000000000000000,
+				0x1000000000000000,
+			};
+
+			xtmp128 = vorrq_u64(xtmp128, oi_cksum_mask2);
+			ytmp128 = vorrq_u64(ytmp128, oi_cksum_mask2);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl2q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl2q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields, i.e. bits 48:55 of iltype and
+			 * bits 56:63 of oltype, and place them in the corresponding
+			 * fields of senddesc_w1.
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x7, 0x6, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xF, 0xE, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare l4ptr, l3ptr, ol4ptr, ol3ptr from
+			 * l3len, l2len, ol3len, ol2len.
+			 * a [E(32):L3(8):L2(8):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E:(L3+L2):(L2+OL3):(OL3+OL2):OL2]
+			 * a = a + (a << 16)
+			 * a [E:(L3+L2+OL3+OL2):(L2+OL3+OL2):(OL3+OL2):OL2]
+			 * => E(32):IL4PTR(8):IL3PTR(8):OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u32(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u32(senddesc23_w1, 8));
+
+			/* Continue preparing l4ptr, l3ptr, ol4ptr, ol3ptr */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u32(senddesc01_w1, 16));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u32(senddesc23_w1, 16));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		}
+
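+		/* Gather each mbuf's aura (low 16 bits of mempool pool_id) and
+		 * shift it into the SEND_HDR_W0 aura field at bit 20.
+		 */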
+		xmask01 = vdupq_n_u64(0);
+		xmask23 = xmask01;
+		asm volatile("LD1 {%[a].H}[0],[%[in]]\n\t"
+			     : [a] "+w"(xmask01)
+			     : [in] "r"(mbuf0)
+			     : "memory");
+
+		asm volatile("LD1 {%[a].H}[4],[%[in]]\n\t"
+			     : [a] "+w"(xmask01)
+			     : [in] "r"(mbuf1)
+			     : "memory");
+
+		asm volatile("LD1 {%[b].H}[0],[%[in]]\n\t"
+			     : [b] "+w"(xmask23)
+			     : [in] "r"(mbuf2)
+			     : "memory");
+
+		asm volatile("LD1 {%[b].H}[4],[%[in]]\n\t"
+			     : [b] "+w"(xmask23)
+			     : [in] "r"(mbuf3)
+			     : "memory");
+		xmask01 = vshlq_n_u64(xmask01, 20);
+		xmask23 = vshlq_n_u64(xmask23, 20);
+
+		senddesc01_w0 = vorrq_u64(senddesc01_w0, xmask01);
+		senddesc23_w0 = vorrq_u64(senddesc23_w0, xmask23);
+
+		if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+			/* Tx ol_flag for vlan. */
+			const uint64x2_t olv = {RTE_MBUF_F_TX_VLAN, RTE_MBUF_F_TX_VLAN};
+			/* Bit enable for VLAN1 */
+			const uint64x2_t mlv = {BIT_ULL(49), BIT_ULL(49)};
+			/* Tx ol_flag for QnQ. */
+			const uint64x2_t olq = {RTE_MBUF_F_TX_QINQ, RTE_MBUF_F_TX_QINQ};
+			/* Bit enable for VLAN0 */
+			const uint64x2_t mlq = {BIT_ULL(48), BIT_ULL(48)};
+			/* Load VLAN values from packet. Outer is VLAN 0 */
+			uint64x2_t ext01 = {
+				((uint32_t)tx_pkts[0]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[0]->vlan_tci) << 32,
+				((uint32_t)tx_pkts[1]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[1]->vlan_tci) << 32,
+			};
+			uint64x2_t ext23 = {
+				((uint32_t)tx_pkts[2]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[2]->vlan_tci) << 32,
+				((uint32_t)tx_pkts[3]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[3]->vlan_tci) << 32,
+			};
+
+			/* Get ol_flags of the packets. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* ORR vlan outer/inner values into cmd. */
+			sendext01_w1 = vorrq_u64(sendext01_w1, ext01);
+			sendext23_w1 = vorrq_u64(sendext23_w1, ext23);
+
+			/* Test for offload enable bits and generate masks. */
+			xtmp128 = vorrq_u64(vandq_u64(vtstq_u64(xtmp128, olv), mlv),
+					    vandq_u64(vtstq_u64(xtmp128, olq), mlq));
+			ytmp128 = vorrq_u64(vandq_u64(vtstq_u64(ytmp128, olv), mlv),
+					    vandq_u64(vtstq_u64(ytmp128, olq), mlq));
+
+			/* Set vlan enable bits into cmd based on mask. */
+			sendext01_w1 = vorrq_u64(sendext01_w1, xtmp128);
+			sendext23_w1 = vorrq_u64(sendext23_w1, ytmp128);
+		}
+
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Tx ol_flag for timestamp. */
+			const uint64x2_t olf = {RTE_MBUF_F_TX_IEEE1588_TMST,
+						RTE_MBUF_F_TX_IEEE1588_TMST};
+			/* Set send mem alg to SUB. */
+			const uint64x2_t alg = {BIT_ULL(59), BIT_ULL(59)};
+			/* Increment send mem address by 8. */
+			const uint64x2_t addr = {0x8, 0x8};
+
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* Check if timestamp is requested and generate inverted
+			 * mask as we need not make any changes to default cmd
+			 * value.
+			 */
+			xtmp128 = vmvnq_u32(vtstq_u64(olf, xtmp128));
+			ytmp128 = vmvnq_u32(vtstq_u64(olf, ytmp128));
+
+			/* Change send mem address to an 8 byte offset when
+			 * TSTMP is disabled.
+			 */
+			sendmem01_w1 = vaddq_u64(sendmem01_w1, vandq_u64(xtmp128, addr));
+			sendmem23_w1 = vaddq_u64(sendmem23_w1, vandq_u64(ytmp128, addr));
+			/* Change send mem alg to SUB when TSTMP is disabled. */
+			sendmem01_w0 = vorrq_u64(sendmem01_w0, vandq_u64(xtmp128, alg));
+			sendmem23_w0 = vorrq_u64(sendmem23_w0, vandq_u64(ytmp128, alg));
+
+			cmd3[0] = vzip1q_u64(sendmem01_w0, sendmem01_w1);
+			cmd3[1] = vzip2q_u64(sendmem01_w0, sendmem01_w1);
+			cmd3[2] = vzip1q_u64(sendmem23_w0, sendmem23_w1);
+			cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1);
+		}
+
+		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+			/* Set don't free bit if reference count > 1 */
+			cn20k_nix_prefree_seg_vec(tx_pkts, &extm, txq, &senddesc01_w0,
+						  &senddesc23_w0, &senddesc01_w1, &senddesc23_w1);
+		} else if (!(flags & NIX_TX_MULTI_SEG_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+			/* Move mbufs to iova */
+			mbuf0 = (uint64_t *)tx_pkts[0];
+			mbuf1 = (uint64_t *)tx_pkts[1];
+			mbuf2 = (uint64_t *)tx_pkts[2];
+			mbuf3 = (uint64_t *)tx_pkts[3];
+
+			/* Mark mempool object as "put" since
+			 * it is freed by NIX
+			 */
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf0)->pool, (void **)&mbuf0,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf1)->pool, (void **)&mbuf1,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf2)->pool, (void **)&mbuf2,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf3)->pool, (void **)&mbuf3,
+						  1, 0);
+		}
+
+		/* Create 4W cmd for 4 mbufs (sendhdr, sgdesc) */
+		cmd0[0] = vzip1q_u64(senddesc01_w0, senddesc01_w1);
+		cmd0[1] = vzip2q_u64(senddesc01_w0, senddesc01_w1);
+		cmd0[2] = vzip1q_u64(senddesc23_w0, senddesc23_w1);
+		cmd0[3] = vzip2q_u64(senddesc23_w0, senddesc23_w1);
+
+		cmd1[0] = vzip1q_u64(sgdesc01_w0, sgdesc01_w1);
+		cmd1[1] = vzip2q_u64(sgdesc01_w0, sgdesc01_w1);
+		cmd1[2] = vzip1q_u64(sgdesc23_w0, sgdesc23_w1);
+		cmd1[3] = vzip2q_u64(sgdesc23_w0, sgdesc23_w1);
+
+		if (flags & NIX_TX_NEED_EXT_HDR) {
+			cmd2[0] = vzip1q_u64(sendext01_w0, sendext01_w1);
+			cmd2[1] = vzip2q_u64(sendext01_w0, sendext01_w1);
+			cmd2[2] = vzip1q_u64(sendext23_w0, sendext23_w1);
+			cmd2[3] = vzip2q_u64(sendext23_w0, sendext23_w1);
+		}
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+			const uint64x2_t olf = {RTE_MBUF_F_TX_SEC_OFFLOAD,
+						RTE_MBUF_F_TX_SEC_OFFLOAD};
+			uintptr_t next;
+			uint8_t dw;
+
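+			/* SEC-offload packets are staged on CPT LMT lines (c_laddr); the rest stay on NIX LMT lines (laddr) */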
+			/* Extract ol_flags. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			xtmp128 = vtstq_u64(olf, xtmp128);
+			ytmp128 = vtstq_u64(olf, ytmp128);
+
+			/* Process mbuf0 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[0]);
+			if (vgetq_lane_u64(xtmp128, 0))
+				cn20k_nix_prep_sec_vec(tx_pkts[0], &cmd0[0], &cmd1[0], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf0 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[0], &extm, segdw[0], next, cmd0[0],
+					     cmd1[0], cmd2[0], cmd3[0], flags);
+
+			/* Process mbuf1 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[1]);
+			if (vgetq_lane_u64(xtmp128, 1))
+				cn20k_nix_prep_sec_vec(tx_pkts[1], &cmd0[1], &cmd1[1], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf1 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[1], &extm, segdw[1], next, cmd0[1],
+					     cmd1[1], cmd2[1], cmd3[1], flags);
+
+			/* Process mbuf2 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[2]);
+			if (vgetq_lane_u64(ytmp128, 0))
+				cn20k_nix_prep_sec_vec(tx_pkts[2], &cmd0[2], &cmd1[2], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf2 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[2], &extm, segdw[2], next, cmd0[2],
+					     cmd1[2], cmd2[2], cmd3[2], flags);
+
+			/* Process mbuf3 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[3]);
+			if (vgetq_lane_u64(ytmp128, 1))
+				cn20k_nix_prep_sec_vec(tx_pkts[3], &cmd0[3], &cmd1[3], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf3 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[3], &extm, segdw[3], next, cmd0[3],
+					     cmd1[3], cmd2[3], cmd3[3], flags);
+
+		} else if (flags & NIX_TX_NEED_EXT_HDR) {
+			/* Store the prepared send desc to LMT lines */
+			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd3[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd2[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd1[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd3[1]);
+				lnum += 1;
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd3[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd2[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd1[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd3[3]);
+			} else {
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[1]);
+				lnum += 1;
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[3]);
+			}
+			lnum += 1;
+		} else {
+			/* Store the prepared send desc to LMT lines */
+			vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd1[0]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd0[1]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd1[1]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[2]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[2]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd0[3]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd1[3]);
+			lnum += 1;
+		}
+
+		tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP;
+	}
+
+	/* Round up lnum to the last line if it is partial */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		lnum = lnum + !!loff;
+		wd.data128 = wd.data128 | (((__uint128_t)(((loff >> 4) - 1) & 0x7) << shift));
+	}
+
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		wd.data[0] >>= 16;
+
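+	/* For VWQE, wait until this HWS reaches the head of its flow before issuing LMTST */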
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = (c_lnum << 1) + c_loff;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	/* Trigger LMTST */
+	if (lnum > 16) {
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[0] & 0x7) << 4;
+		wd.data[0] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[0] <<= 16;
+
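+		/* More than 16 lines: split into two STEORLs; the first flushes lines 0..15, i.e. (16 - 1) encoded at bit 12 */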
+		wd.data[0] |= (15ULL << 12);
+		wd.data[0] |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, cn20k_nix_pkts_per_vec_brst(flags) >> 1);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(wd.data[0], pa);
+
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[1] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[1] & 0x7) << 4;
+		wd.data[1] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[1] <<= 16;
+
+		wd.data[1] |= ((uint64_t)(lnum - 17)) << 12;
+		wd.data[1] |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F) {
+			cn20k_nix_vwqe_wait_fc(txq,
+					       burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+		}
+		/* STEOR1 */
+		roc_lmt_submit_steorl(wd.data[1], pa);
+	} else if (lnum) {
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[0] & 0x7) << 4;
+		wd.data[0] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[0] <<= 16;
+
+		wd.data[0] |= ((uint64_t)(lnum - 1)) << 12;
+		wd.data[0] |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(wd.data[0], pa);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	if (unlikely(scalar))
+		pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	return pkts;
+}
+
+#else
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			   uint64_t *cmd, const uint16_t flags)
+{
+	RTE_SET_USED(ws);
+	RTE_SET_USED(tx_queue);
+	RTE_SET_USED(tx_pkts);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(cmd);
+	RTE_SET_USED(flags);
+	return 0;
+}
+#endif
+
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -1567,10 +3003,11 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[sz];                                                                  \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_vector(tx_queue, NULL, tx_pkts, pkts, cmd, (flags));    \
 	}
 
 #define NIX_TX_XMIT_VEC_MSEG(fn, sz, flags)                                                        \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 26/33] net/cnxk: support Tx multi-seg in vector for cn20k
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (24 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 25/33] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 27/33] common/cnxk: add flush wait after write of inline ctx Nithin Dabilpuram
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Tx multi-seg support in vector mode for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 485 ++++++++++++++++++++++++++++++++++--
 1 file changed, 463 insertions(+), 22 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 05c8b80fcb..9b6a2e62bd 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -1717,8 +1717,301 @@ cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts
 
 #if defined(RTE_ARCH_ARM64)
 
+static __rte_always_inline void
+cn20k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, union nix_send_ext_w0_u *w0,
+		      uint64_t ol_flags, const uint64_t flags, const uint64_t lso_tun_fmt)
+{
+	uint16_t lso_sb;
+	uint64_t mask;
+
+	if (!(ol_flags & RTE_MBUF_F_TX_TCP_SEG))
+		return;
+
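+	/* mask is all-ones when there is no inner L3, so lso_sb picks ol4ptr; otherwise il4ptr */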
+	mask = -(!w1->il3type);
+	lso_sb = (mask & w1->ol4ptr) + (~mask & w1->il4ptr) + m->l4_len;
+
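+	/* BIT(14) of SEND_EXT_W0 enables LSO; fill LSO start byte, MPS and format index */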
+	w0->u |= BIT(14);
+	w0->lso_sb = lso_sb;
+	w0->lso_mps = m->tso_segsz;
+	w0->lso_format = NIX_LSO_FORMAT_IDX_TSOV4 + !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+	w1->ol4type = NIX_SENDL4TYPE_TCP_CKSUM;
+
+	/* Handle tunnel TSO */
+	if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) && (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+		const uint8_t is_udp_tun = (CNXK_NIX_UDP_TUN_BITMASK >>
+					    ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+					   0x1;
+		uint8_t shift = is_udp_tun ? 32 : 0;
+
+		shift += (!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6) << 4);
+		shift += (!!(ol_flags & RTE_MBUF_F_TX_IPV6) << 3);
+
+		w1->il4type = NIX_SENDL4TYPE_TCP_CKSUM;
+		w1->ol4type = is_udp_tun ? NIX_SENDL4TYPE_UDP_CKSUM : 0;
+		/* Update format for UDP tunneled packet */
+		w0->lso_format = (lso_tun_fmt >> shift);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_prepare_mseg_vec_noff(struct cn20k_eth_txq *txq, struct rte_mbuf *m,
+				struct rte_mbuf **extm, uint64_t *cmd, uint64x2_t *cmd0,
+				uint64x2_t *cmd1, uint64x2_t *cmd2, uint64x2_t *cmd3,
+				const uint32_t flags)
+{
+	uint16_t segdw;
+
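+	/* Spill the vector-built words to the scratch cmd buffer, then let the scalar mseg preparer expand the SG list */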
+	vst1q_u64(cmd, *cmd0); /* Send hdr */
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		vst1q_u64(cmd + 2, *cmd2); /* ext hdr */
+		vst1q_u64(cmd + 4, *cmd1); /* sg */
+	} else {
+		vst1q_u64(cmd + 2, *cmd1); /* sg */
+	}
+
+	segdw = cn20k_nix_prepare_mseg(txq, m, extm, cmd, flags);
+
+	if (flags & NIX_TX_OFFLOAD_TSTAMP_F)
+		vst1q_u64(cmd + segdw * 2 - 2, *cmd3);
+
+	return segdw;
+}
+
+static __rte_always_inline void
+cn20k_nix_prepare_mseg_vec_list(struct rte_mbuf *m, uint64_t *cmd, union nix_send_hdr_w0_u *sh,
+				union nix_send_sg_s *sg, const uint32_t flags)
+{
+	struct rte_mbuf *m_next;
+	uint64_t ol_flags, len;
+	uint64_t *slist, sg_u;
+	uint16_t nb_segs;
+	uint64_t dlen;
+	int i = 1;
+
+	len = m->pkt_len;
+	ol_flags = m->ol_flags;
+	/* For security, we would have already populated the right length */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
+		len = sh->total;
+	sh->total = len;
+	/* Keep only the subdesc header bits of sg->u before use */
+	sg->u &= 0xFC00000000000000;
+	sg_u = sg->u;
+	slist = &cmd[0];
+
+	dlen = m->data_len;
+	len -= dlen;
+	sg_u = sg_u | ((uint64_t)dlen);
+
+	/* Mark mempool object as "put" since it is freed by NIX */
+	RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+
+	nb_segs = m->nb_segs - 1;
+	m_next = m->next;
+	m->next = NULL;
+	m->nb_segs = 1;
+	m = m_next;
+	/* Fill mbuf segments */
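+	/* Each SEND_SG subdescriptor carries up to three 16-bit segment sizes followed by their iova words */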
+	do {
+		m_next = m->next;
+		dlen = m->data_len;
+		len -= dlen;
+		sg_u = sg_u | ((uint64_t)dlen << (i << 4));
+		*slist = rte_mbuf_data_iova(m);
+		slist++;
+		i++;
+		nb_segs--;
+		if (i > 2 && nb_segs) {
+			i = 0;
+			/* Next SG subdesc */
+			*(uint64_t *)slist = sg_u & 0xFC00000000000000;
+			sg->u = sg_u;
+			sg->segs = 3;
+			sg = (union nix_send_sg_s *)slist;
+			sg_u = sg->u;
+			slist++;
+		}
+		m->next = NULL;
+		/* Mark mempool object as "put" since it is freed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+
+		m = m_next;
+	} while (nb_segs);
+
+	/* Add remaining bytes of security data to last seg */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && len) {
+		uint8_t shft = ((i - 1) << 4);
+
+		dlen = ((sg_u >> shft) & 0xFFFF) + len;
+		sg_u = sg_u & ~(0xFFFFULL << shft);
+		sg_u |= dlen << shft;
+	}
+	sg->u = sg_u;
+	sg->segs = i;
+}
+
+static __rte_always_inline void
+cn20k_nix_prepare_mseg_vec(struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, uint64x2_t *cmd1,
+			   const uint8_t segdw, const uint32_t flags)
+{
+	union nix_send_hdr_w0_u sh;
+	union nix_send_sg_s sg;
+
+	if (m->nb_segs == 1) {
+		/* Mark mempool object as "put" since it is freed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+		return;
+	}
+
+	sh.u = vgetq_lane_u64(cmd0[0], 0);
+	sg.u = vgetq_lane_u64(cmd1[0], 0);
+
+	cn20k_nix_prepare_mseg_vec_list(m, cmd, &sh, &sg, flags);
+
+	sh.sizem1 = segdw - 1;
+	cmd0[0] = vsetq_lane_u64(sh.u, cmd0[0], 0);
+	cmd1[0] = vsetq_lane_u64(sg.u, cmd1[0], 0);
+}
+
 #define NIX_DESCS_PER_LOOP 4
 
+static __rte_always_inline uint8_t
+cn20k_nix_prep_lmt_mseg_vector(struct cn20k_eth_txq *txq, struct rte_mbuf **mbufs,
+			       struct rte_mbuf **extm, uint64x2_t *cmd0, uint64x2_t *cmd1,
+			       uint64x2_t *cmd2, uint64x2_t *cmd3, uint8_t *segdw,
+			       uint64_t *lmt_addr, __uint128_t *data128, uint8_t *shift,
+			       const uint16_t flags)
+{
+	uint8_t j, off, lmt_used = 0;
+
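+	/* With NOFF, descriptor sizes vary per mbuf: pack each into the current LMT line and spill to a new one past 8 dwords */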
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		off = 0;
+		for (j = 0; j < NIX_DESCS_PER_LOOP; j++) {
+			if (off + segdw[j] > 8) {
+				*data128 |= ((__uint128_t)off - 1) << *shift;
+				*shift += 3;
+				lmt_used++;
+				lmt_addr += 16;
+				off = 0;
+			}
+			off += cn20k_nix_prepare_mseg_vec_noff(txq, mbufs[j], extm,
+							       lmt_addr + off * 2, &cmd0[j],
+							       &cmd1[j], &cmd2[j], &cmd3[j], flags);
+		}
+		*data128 |= ((__uint128_t)off - 1) << *shift;
+		*shift += 3;
+		lmt_used++;
+		return lmt_used;
+	}
+
+	if (!(flags & NIX_TX_NEED_EXT_HDR) && !(flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+		/* No extra segments in the 4 consecutive packets: all fit in one LMT line. */
+		if ((segdw[0] + segdw[1] + segdw[2] + segdw[3]) <= 8) {
+			vst1q_u64(lmt_addr, cmd0[0]);
+			vst1q_u64(lmt_addr + 2, cmd1[0]);
+			vst1q_u64(lmt_addr + 4, cmd0[1]);
+			vst1q_u64(lmt_addr + 6, cmd1[1]);
+			vst1q_u64(lmt_addr + 8, cmd0[2]);
+			vst1q_u64(lmt_addr + 10, cmd1[2]);
+			vst1q_u64(lmt_addr + 12, cmd0[3]);
+			vst1q_u64(lmt_addr + 14, cmd1[3]);
+
+			*data128 |= ((__uint128_t)7) << *shift;
+			*shift += 3;
+
+			/* Mark mempool object as "put" since it is freed by NIX */
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[0]->pool, (void **)&mbufs[0], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[1]->pool, (void **)&mbufs[1], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[2]->pool, (void **)&mbufs[2], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[3]->pool, (void **)&mbufs[3], 1, 0);
+			return 1;
+		}
+	}
+
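+	/* Otherwise pack two packets per LMT line when their combined dwords fit in 8, else one per line */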
+	for (j = 0; j < NIX_DESCS_PER_LOOP;) {
+		/* Fit consecutive packets in same LMTLINE. */
+		if ((segdw[j] + segdw[j + 1]) <= 8) {
+			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+				/* TSTAMP takes 4 each, no segs. */
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				vst1q_u64(lmt_addr + 6, cmd3[j]);
+
+				vst1q_u64(lmt_addr + 8, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 10, cmd2[j + 1]);
+				vst1q_u64(lmt_addr + 12, cmd1[j + 1]);
+				vst1q_u64(lmt_addr + 14, cmd3[j + 1]);
+
+				/* Mark mempool object as "put" since it is freed by NIX */
+				RTE_MEMPOOL_CHECK_COOKIES(mbufs[j]->pool, (void **)&mbufs[j], 1, 0);
+				RTE_MEMPOOL_CHECK_COOKIES(mbufs[j + 1]->pool,
+							  (void **)&mbufs[j + 1], 1, 0);
+			} else if (flags & NIX_TX_NEED_EXT_HDR) {
+				/* EXT header takes 3 each, space for 2 segs. */
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				off = segdw[j] - 3;
+				off <<= 1;
+				cn20k_nix_prepare_mseg_vec(mbufs[j + 1], lmt_addr + 12 + off,
+							   &cmd0[j + 1], &cmd1[j + 1], segdw[j + 1],
+							   flags);
+				vst1q_u64(lmt_addr + 6 + off, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 8 + off, cmd2[j + 1]);
+				vst1q_u64(lmt_addr + 10 + off, cmd1[j + 1]);
+			} else {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 4, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd1[j]);
+				off = segdw[j] - 2;
+				off <<= 1;
+				cn20k_nix_prepare_mseg_vec(mbufs[j + 1], lmt_addr + 8 + off,
+							   &cmd0[j + 1], &cmd1[j + 1], segdw[j + 1],
+							   flags);
+				vst1q_u64(lmt_addr + 4 + off, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 6 + off, cmd1[j + 1]);
+			}
+			*data128 |= ((__uint128_t)(segdw[j] + segdw[j + 1]) - 1) << *shift;
+			*shift += 3;
+			j += 2;
+		} else {
+			if ((flags & NIX_TX_NEED_EXT_HDR) && (flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				off = segdw[j] - 4;
+				off <<= 1;
+				vst1q_u64(lmt_addr + 6 + off, cmd3[j]);
+			} else if (flags & NIX_TX_NEED_EXT_HDR) {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+			} else {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 4, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd1[j]);
+			}
+			*data128 |= ((__uint128_t)(segdw[j]) - 1) << *shift;
+			*shift += 3;
+			j++;
+		}
+		lmt_used++;
+		lmt_addr += 16;
+	}
+
+	return lmt_used;
+}
+
 static __rte_always_inline void
 cn20k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, uint8_t *shift,
 		   __uint128_t *data128, uintptr_t *next)
@@ -1740,12 +2033,36 @@ cn20k_nix_xmit_store(struct cn20k_eth_txq *txq, struct rte_mbuf *mbuf, struct rt
 		     uint8_t segdw, uintptr_t laddr, uint64x2_t cmd0, uint64x2_t cmd1,
 		     uint64x2_t cmd2, uint64x2_t cmd3, const uint16_t flags)
 {
-	RTE_SET_USED(txq);
-	RTE_SET_USED(mbuf);
-	RTE_SET_USED(extm);
-	RTE_SET_USED(segdw);
+	uint8_t off;
 
-	if (flags & NIX_TX_NEED_EXT_HDR) {
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		cn20k_nix_prepare_mseg_vec_noff(txq, mbuf, extm, LMT_OFF(laddr, 0, 0), &cmd0, &cmd1,
+						&cmd2, &cmd3, flags);
+		return;
+	}
+	if (flags & NIX_TX_MULTI_SEG_F) {
+		if ((flags & NIX_TX_NEED_EXT_HDR) && (flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 48), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+			off = segdw - 4;
+			off <<= 4;
+			vst1q_u64(LMT_OFF(laddr, 0, 48 + off), cmd3);
+		} else if (flags & NIX_TX_NEED_EXT_HDR) {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 48), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+		} else {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 32), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd1);
+		}
+	} else if (flags & NIX_TX_NEED_EXT_HDR) {
 		/* Store the prepared send desc to LMT lines */
 		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
 			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
@@ -1814,6 +2131,12 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
 	}
 
+	/* Perform header writes before barrier for TSO */
+	if (flags & NIX_TX_OFFLOAD_TSO_F) {
+		for (i = 0; i < pkts; i++)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+	}
+
 	if (!(flags & NIX_TX_VWQE_F)) {
 		senddesc01_w0 = vld1q_dup_u64(&txq->send_hdr_w0);
 	} else {
@@ -1866,7 +2189,7 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 	/* Number of packets to prepare depends on offloads enabled. */
 	burst = left > cn20k_nix_pkts_per_vec_brst(flags) ? cn20k_nix_pkts_per_vec_brst(flags) :
 							    left;
-	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+	if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)) {
 		wd.data128 = 0;
 		shift = 16;
 	}
@@ -1885,6 +2208,54 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			break;
 		}
 
+		if (flags & NIX_TX_MULTI_SEG_F) {
+			uint8_t j;
+
+			for (j = 0; j < NIX_DESCS_PER_LOOP; j++) {
+				struct rte_mbuf *m = tx_pkts[j];
+
+				cn20k_nix_tx_mbuf_validate(m, flags);
+
+				/* Get dwords based on nb_segs. */
+				if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F &&
+				      flags & NIX_TX_MULTI_SEG_F))
+					segdw[j] = NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+				else
+					segdw[j] = cn20k_nix_mbuf_sg_dwords(m);
+
+				/* Add dwords based on offloads. */
+				segdw[j] += 1 + /* SEND HDR */
+					    !!(flags & NIX_TX_NEED_EXT_HDR) +
+					    !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+			}
+
+			/* Check if there are enough LMTLINES for this loop.
+			 * Consider previous line to be partial.
+			 */
+			if (lnum + 4 >= 32) {
+				uint8_t ldwords_con = 0, lneeded = 0;
+
+				if ((loff >> 4) + segdw[0] > 8) {
+					lneeded += 1;
+					ldwords_con = segdw[0];
+				} else {
+					ldwords_con = (loff >> 4) + segdw[0];
+				}
+
+				for (j = 1; j < NIX_DESCS_PER_LOOP; j++) {
+					ldwords_con += segdw[j];
+					if (ldwords_con > 8) {
+						lneeded += 1;
+						ldwords_con = segdw[j];
+					}
+				}
+				lneeded += 1;
+				if (lnum + lneeded > 32) {
+					burst = i;
+					break;
+				}
+			}
+		}
 		/* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */
 		senddesc01_w0 = vbicq_u64(senddesc01_w0, vdupq_n_u64(0x800FFFFFFFF));
 		sgdesc01_w0 = vbicq_u64(sgdesc01_w0, vdupq_n_u64(0xFFFFFFFF));
@@ -1907,6 +2278,12 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			sendmem23_w1 = sendmem01_w1;
 		}
 
+		if (flags & NIX_TX_OFFLOAD_TSO_F) {
+			/* Clear the LSO enable bit. */
+			sendext01_w0 = vbicq_u64(sendext01_w0, vdupq_n_u64(BIT_ULL(14)));
+			sendext23_w0 = sendext01_w0;
+		}
+
 		/* Move mbufs to iova */
 		mbuf0 = (uint64_t *)tx_pkts[0];
 		mbuf1 = (uint64_t *)tx_pkts[1];
@@ -2512,7 +2889,49 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1);
 		}
 
-		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+		if (flags & NIX_TX_OFFLOAD_TSO_F) {
+			const uint64_t lso_fmt = txq->lso_tun_fmt;
+			uint64_t sx_w0[NIX_DESCS_PER_LOOP];
+			uint64_t sd_w1[NIX_DESCS_PER_LOOP];
+
+			/* Extract SD W1 as we need to set L4 types. */
+			vst1q_u64(sd_w1, senddesc01_w1);
+			vst1q_u64(sd_w1 + 2, senddesc23_w1);
+
+			/* Extract SX W0 as we need to set LSO fields. */
+			vst1q_u64(sx_w0, sendext01_w0);
+			vst1q_u64(sx_w0 + 2, sendext23_w0);
+
+			/* Extract ol_flags. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* Prepare individual mbufs. */
+			cn20k_nix_prepare_tso(tx_pkts[0], (union nix_send_hdr_w1_u *)&sd_w1[0],
+					      (union nix_send_ext_w0_u *)&sx_w0[0],
+					      vgetq_lane_u64(xtmp128, 0), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[1], (union nix_send_hdr_w1_u *)&sd_w1[1],
+					      (union nix_send_ext_w0_u *)&sx_w0[1],
+					      vgetq_lane_u64(xtmp128, 1), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[2], (union nix_send_hdr_w1_u *)&sd_w1[2],
+					      (union nix_send_ext_w0_u *)&sx_w0[2],
+					      vgetq_lane_u64(ytmp128, 0), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[3], (union nix_send_hdr_w1_u *)&sd_w1[3],
+					      (union nix_send_ext_w0_u *)&sx_w0[3],
+					      vgetq_lane_u64(ytmp128, 1), flags, lso_fmt);
+
+			senddesc01_w1 = vld1q_u64(sd_w1);
+			senddesc23_w1 = vld1q_u64(sd_w1 + 2);
+
+			sendext01_w0 = vld1q_u64(sx_w0);
+			sendext23_w0 = vld1q_u64(sx_w0 + 2);
+		}
+
+		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_MULTI_SEG_F) &&
+		    !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
 			/* Set don't free bit if reference count > 1 */
 			cn20k_nix_prefree_seg_vec(tx_pkts, &extm, txq, &senddesc01_w0,
 						  &senddesc23_w0, &senddesc01_w1, &senddesc23_w1);
@@ -2626,6 +3045,15 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			cn20k_nix_xmit_store(txq, tx_pkts[3], &extm, segdw[3], next, cmd0[3],
 					     cmd1[3], cmd2[3], cmd3[3], flags);
 
+		} else if (flags & NIX_TX_MULTI_SEG_F) {
+			uint8_t j;
+
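+			/* segdw[4] = 8 is a sentinel so the pairing loop never merges the last packet with stale data */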
+			segdw[4] = 8;
+			j = cn20k_nix_prep_lmt_mseg_vector(txq, tx_pkts, &extm, cmd0, cmd1, cmd2,
+							   cmd3, segdw,
+							   (uint64_t *)LMT_OFF(laddr, lnum, 0),
+							   &wd.data128, &shift, flags);
+			lnum += j;
 		} else if (flags & NIX_TX_NEED_EXT_HDR) {
 			/* Store the prepared send desc to LMT lines */
 			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
@@ -2684,7 +3112,7 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		wd.data128 = wd.data128 | (((__uint128_t)(((loff >> 4) - 1) & 0x7) << shift));
 	}
 
-	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+	if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 		wd.data[0] >>= 16;
 
 	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
@@ -2704,13 +3132,13 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 
 	/* Trigger LMTST */
 	if (lnum > 16) {
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[0] & 0x7) << 4;
 		wd.data[0] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[0] <<= 16;
 
 		wd.data[0] |= (15ULL << 12);
@@ -2721,32 +3149,38 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		/* STEOR0 */
 		roc_lmt_submit_steorl(wd.data[0], pa);
 
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[1] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[1] & 0x7) << 4;
 		wd.data[1] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[1] <<= 16;
 
 		wd.data[1] |= ((uint64_t)(lnum - 17)) << 12;
 		wd.data[1] |= (uint64_t)(lmt_id + 16);
 
 		if (flags & NIX_TX_VWQE_F) {
-			cn20k_nix_vwqe_wait_fc(txq,
-					       burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			if (flags & NIX_TX_MULTI_SEG_F) {
+				if (burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1) > 0)
+					cn20k_nix_vwqe_wait_fc(txq,
+						burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			} else {
+				cn20k_nix_vwqe_wait_fc(txq,
+						burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			}
 		}
 		/* STEOR1 */
 		roc_lmt_submit_steorl(wd.data[1], pa);
 	} else if (lnum) {
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[0] & 0x7) << 4;
 		wd.data[0] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[0] <<= 16;
 
 		wd.data[0] |= ((uint64_t)(lnum - 1)) << 12;
@@ -2767,8 +3201,13 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 	if (left)
 		goto again;
 
-	if (unlikely(scalar))
-		pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	if (unlikely(scalar)) {
+		if (flags & NIX_TX_MULTI_SEG_F)
+			pkts += cn20k_nix_xmit_pkts_mseg(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+		else
+			pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	}
+
 	return pkts;
 }
 
@@ -3014,10 +3453,12 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[(sz) + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];                               \
+		/* For TSO inner checksum is a must */                                             \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_vector(tx_queue, NULL, tx_pkts, pkts, cmd,              \
+						  (flags) | NIX_TX_MULTI_SEG_F);                   \
 	}
 
 uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue,
-- 
2.34.1



* [PATCH 27/33] common/cnxk: add flush wait after write of inline ctx
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (25 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 26/33] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 28/33] common/cnxk: fix CPT HW word size for outbound SA Nithin Dabilpuram
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

Reading the CPT_LF_CTX_ERR CSR ensures that the preceding writes for
the FLUSH operation have completed and also indicates whether the
flush itself is complete.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/common/cnxk/roc_nix_inl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c
index a759052973..0e3305efd3 100644
--- a/drivers/common/cnxk/roc_nix_inl.c
+++ b/drivers/common/cnxk/roc_nix_inl.c
@@ -1750,6 +1750,7 @@ roc_nix_inl_ctx_write(struct roc_nix *roc_nix, void *sa_dptr, void *sa_cptr,
 	struct nix_inl_dev *inl_dev = NULL;
 	struct roc_cpt_lf *outb_lf = NULL;
 	union cpt_lf_ctx_flush flush;
+	union cpt_lf_ctx_err err;
 	bool get_inl_lf = true;
 	uintptr_t rbase;
 	struct nix *nix;
@@ -1791,6 +1792,13 @@ roc_nix_inl_ctx_write(struct roc_nix *roc_nix, void *sa_dptr, void *sa_cptr,
 		flush.s.cptr = ((uintptr_t)sa_cptr) >> 7;
 		plt_write64(flush.u, rbase + CPT_LF_CTX_FLUSH);
 
+		plt_atomic_thread_fence(__ATOMIC_ACQ_REL);
+
+		/* Read a CSR to ensure that the FLUSH operation is complete */
+		err.u = plt_read64(rbase + CPT_LF_CTX_ERR);
+
+		if (err.s.flush_st_flt)
+			plt_warn("CTX flush could not complete");
 		return 0;
 	}
 	plt_nix_dbg("Could not get CPT LF for CTX write");
-- 
2.34.1



* [PATCH 28/33] common/cnxk: fix CPT HW word size for outbound SA
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (26 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 27/33] common/cnxk: add flush wait after write of inline ctx Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 29/33] net/cnxk: add PMD APIs for IPsec SA base and flush Nithin Dabilpuram
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, stable

Fix the CPT HW word size initialized for the outbound SA to be
two words.

Fixes: 5ece02e736c3 ("common/cnxk: use common SA init API for default options")
Cc: stable@dpdk.org

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/common/cnxk/roc_ie_ot.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/common/cnxk/roc_ie_ot.c b/drivers/common/cnxk/roc_ie_ot.c
index 465b2bc1fb..1b436dba72 100644
--- a/drivers/common/cnxk/roc_ie_ot.c
+++ b/drivers/common/cnxk/roc_ie_ot.c
@@ -38,5 +38,6 @@ roc_ot_ipsec_outb_sa_init(struct roc_ot_ipsec_outb_sa *sa)
 	offset = offsetof(struct roc_ot_ipsec_outb_sa, ctx);
 	sa->w0.s.ctx_push_size = (offset / ROC_CTX_UNIT_8B) + 1;
 	sa->w0.s.ctx_size = ROC_IE_OT_CTX_ILEN;
+	sa->w0.s.ctx_hdr_size = ROC_IE_OT_SA_CTX_HDR_SIZE;
 	sa->w0.s.aop_valid = 1;
 }
-- 
2.34.1



* [PATCH 29/33] net/cnxk: add PMD APIs for IPsec SA base and flush
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (27 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 28/33] common/cnxk: fix CPT HW word size for outbound SA Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 30/33] net/cnxk: add PMD APIs to submit CPT instruction Nithin Dabilpuram
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Srujana Challa

From: Srujana Challa <schalla@marvell.com>

Introduces new PMD APIs for Inline IPsec, including a hardware SA
flush and retrieval of the base address of the inline device SA
table. This allows applications to directly manage IPsec SAs.
This patch also updates the rte_pmd_cnxk_hw_sa_read|write() APIs
to take a port ID instead of a device pointer as input.
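
As a usage illustration, here is a minimal sketch of driving the new
APIs from an application. The port ID, sa_idx, and indexing the table
as an array of union rte_pmd_cnxk_ipsec_hw_sa (i.e. assuming the table
stride equals sizeof(union rte_pmd_cnxk_ipsec_hw_sa)) are illustrative
assumptions; the prototypes are the ones added below.

  #include <stdbool.h>
  #include <stdint.h>
  #include <rte_pmd_cnxk.h>

  static int
  flush_one_inb_sa(uint16_t portid, uint32_t sa_idx)
  {
          union rte_pmd_cnxk_ipsec_hw_sa *base;

          /* Base of the inline inbound SA table for this port */
          base = rte_pmd_cnxk_hw_session_base_get(portid, true);
          if (base == NULL)
                  return -1;

          /* Flush the HW-cached copy of one SA back to memory;
           * &base[sa_idx] assumes a fixed per-entry stride (an
           * illustrative assumption, see above).
           */
          return rte_pmd_cnxk_sa_flush(portid, &base[sa_idx], true);
  }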

Signed-off-by: Srujana Challa <schalla@marvell.com>
---
 drivers/common/cnxk/roc_nix_inl.c  |  9 +++++
 drivers/net/cnxk/cnxk_ethdev_sec.c | 64 ++++++++++++++++++++----------
 drivers/net/cnxk/rte_pmd_cnxk.h    | 48 +++++++++++++++++++---
 drivers/net/cnxk/version.map       |  2 +
 4 files changed, 97 insertions(+), 26 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c
index 0e3305efd3..79116afe6d 100644
--- a/drivers/common/cnxk/roc_nix_inl.c
+++ b/drivers/common/cnxk/roc_nix_inl.c
@@ -1687,6 +1687,7 @@ roc_nix_inl_sa_sync(struct roc_nix *roc_nix, void *sa, bool inb,
 	struct roc_cpt_lf *outb_lf = NULL;
 	union cpt_lf_ctx_reload reload;
 	union cpt_lf_ctx_flush flush;
+	union cpt_lf_ctx_err err;
 	bool get_inl_lf = true;
 	uintptr_t rbase;
 	struct nix *nix;
@@ -1728,6 +1729,14 @@ roc_nix_inl_sa_sync(struct roc_nix *roc_nix, void *sa, bool inb,
 		case ROC_NIX_INL_SA_OP_FLUSH:
 			flush.s.cptr = ((uintptr_t)sa) >> 7;
 			plt_write64(flush.u, rbase + CPT_LF_CTX_FLUSH);
+			plt_atomic_thread_fence(__ATOMIC_ACQ_REL);
+			/* Read a CSR to ensure that the FLUSH operation is complete */
+			err.u = plt_read64(rbase + CPT_LF_CTX_ERR);
+
+			if (err.s.flush_st_flt) {
+				plt_warn("CTX flush could not complete");
+				return -EIO;
+			}
 			break;
 		case ROC_NIX_INL_SA_OP_RELOAD:
 			reload.s.cptr = ((uintptr_t)sa) >> 7;
diff --git a/drivers/net/cnxk/cnxk_ethdev_sec.c b/drivers/net/cnxk/cnxk_ethdev_sec.c
index cdd5656817..ec129b6584 100644
--- a/drivers/net/cnxk/cnxk_ethdev_sec.c
+++ b/drivers/net/cnxk/cnxk_ethdev_sec.c
@@ -297,47 +297,71 @@ cnxk_eth_sec_sess_get_by_sess(struct cnxk_eth_dev *dev,
 	return NULL;
 }
 
+union rte_pmd_cnxk_ipsec_hw_sa *
+rte_pmd_cnxk_hw_session_base_get(uint16_t portid, bool inb)
+{
+	struct rte_eth_dev *eth_dev = &rte_eth_devices[portid];
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	uintptr_t sa_base;
+
+	if (inb)
+		sa_base = roc_nix_inl_inb_sa_base_get(&dev->nix, dev->inb.inl_dev);
+	else
+		sa_base = roc_nix_inl_outb_sa_base_get(&dev->nix);
+
+	return (union rte_pmd_cnxk_ipsec_hw_sa *)sa_base;
+}
+
+int
+rte_pmd_cnxk_sa_flush(uint16_t portid, union rte_pmd_cnxk_ipsec_hw_sa *sess, bool inb)
+{
+	struct rte_eth_dev *eth_dev = &rte_eth_devices[portid];
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	return roc_nix_inl_sa_sync(&dev->nix, sess, inb, ROC_NIX_INL_SA_OP_FLUSH);
+}
+
 int
-rte_pmd_cnxk_hw_sa_read(void *device, void *__sess, union rte_pmd_cnxk_ipsec_hw_sa *data,
-			uint32_t len)
+rte_pmd_cnxk_hw_sa_read(uint16_t portid, void *sess, union rte_pmd_cnxk_ipsec_hw_sa *data,
+			uint32_t len, bool inb)
 {
-	struct rte_security_session *sess = __sess;
-	struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+	struct rte_eth_dev *eth_dev = &rte_eth_devices[portid];
 	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
 	struct cnxk_eth_sec_sess *eth_sec;
+	void *sa;
 	int rc;
 
 	eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess);
-	if (eth_sec == NULL)
-		return -EINVAL;
+	if (eth_sec)
+		sa = eth_sec->sa;
+	else
+		sa = sess;
 
-	rc = roc_nix_inl_sa_sync(&dev->nix, eth_sec->sa, eth_sec->inb, ROC_NIX_INL_SA_OP_FLUSH);
+	rc = roc_nix_inl_sa_sync(&dev->nix, sa, inb, ROC_NIX_INL_SA_OP_FLUSH);
 	if (rc)
 		return -EINVAL;
-	rte_delay_ms(1);
-	memcpy(data, eth_sec->sa, len);
+
+	memcpy(data, sa, len);
 
 	return 0;
 }
 
 int
-rte_pmd_cnxk_hw_sa_write(void *device, void *__sess, union rte_pmd_cnxk_ipsec_hw_sa *data,
-			 uint32_t len)
+rte_pmd_cnxk_hw_sa_write(uint16_t portid, void *sess, union rte_pmd_cnxk_ipsec_hw_sa *data,
+			 uint32_t len, bool inb)
 {
-	struct rte_security_session *sess = __sess;
-	struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+	struct rte_eth_dev *eth_dev = &rte_eth_devices[portid];
 	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
 	struct cnxk_eth_sec_sess *eth_sec;
-	int rc = -EINVAL;
+	void *sa;
 
 	eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess);
-	if (eth_sec == NULL)
-		return rc;
-	rc = roc_nix_inl_ctx_write(&dev->nix, data, eth_sec->sa, eth_sec->inb, len);
-	if (rc)
-		return rc;
+	if (eth_sec)
+		sa = eth_sec->sa;
+	else
+		sa = sess;
 
-	return 0;
+	return roc_nix_inl_ctx_write(&dev->nix, data, sa, inb, len);
 }
 
 union rte_pmd_cnxk_cpt_res_s *
diff --git a/drivers/net/cnxk/rte_pmd_cnxk.h b/drivers/net/cnxk/rte_pmd_cnxk.h
index 70f2f96fd4..ecd112e881 100644
--- a/drivers/net/cnxk/rte_pmd_cnxk.h
+++ b/drivers/net/cnxk/rte_pmd_cnxk.h
@@ -492,7 +492,7 @@ union rte_pmd_cnxk_cpt_res_s {
 /**
  * Read HW SA context from session.
  *
- * @param device
+ * @param portid
  *   Port identifier of Ethernet device.
  * @param sess
  *   Handle of the security session as void *.
@@ -500,17 +500,19 @@ union rte_pmd_cnxk_cpt_res_s {
  *   Destination pointer to copy SA context for application.
  * @param len
  *   Length of SA context to copy into data parameter.
+ * @param inb
+ *   Determines whether the specified SA is inbound (true) or outbound (false).
  *
  * @return
  *   0 on success, a negative errno value otherwise.
  */
 __rte_experimental
-int rte_pmd_cnxk_hw_sa_read(void *device, void *sess,
-			    union rte_pmd_cnxk_ipsec_hw_sa *data, uint32_t len);
+int rte_pmd_cnxk_hw_sa_read(uint16_t portid, void *sess, union rte_pmd_cnxk_ipsec_hw_sa *data,
+			    uint32_t len, bool inb);
 /**
  * Write HW SA context to session.
  *
- * @param device
+ * @param portid
  *   Port identifier of Ethernet device.
  * @param sess
  *   Handle of the security session as void *.
@@ -518,13 +520,15 @@ int rte_pmd_cnxk_hw_sa_read(void *device, void *sess,
  *   Source data pointer from application to copy SA context into session.
  * @param len
  *   Length of SA context to copy from data parameter.
+ * @param inb
+ *   Determines whether the specified SA is inbound (true) or outbound (false).
  *
  * @return
  *   0 on success, a negative errno value otherwise.
  */
 __rte_experimental
-int rte_pmd_cnxk_hw_sa_write(void *device, void *sess,
-			     union rte_pmd_cnxk_ipsec_hw_sa *data, uint32_t len);
+int rte_pmd_cnxk_hw_sa_write(uint16_t portid, void *sess, union rte_pmd_cnxk_ipsec_hw_sa *data,
+			     uint32_t len, bool inb);
 
 /**
  * Get pointer to CPT result info for inline inbound processed pkt.
@@ -542,4 +546,36 @@ int rte_pmd_cnxk_hw_sa_write(void *device, void *sess,
  */
 __rte_experimental
 union rte_pmd_cnxk_cpt_res_s *rte_pmd_cnxk_inl_ipsec_res(struct rte_mbuf *mbuf);
+
+/**
+ * Get pointer to the Inline Inbound or Outbound SA table base.
+ *
+ * @param portid
+ *   Port identifier of Ethernet device.
+ * @param inb
+ *   Determines the type of SA base to be returned.
+ *   When inb is true, the method returns the Inbound SA base.
+ *   When inb is false, the method returns the Outbound SA base.
+ *
+ * @return
+ *   Pointer to Inbound or Outbound SA base.
+ */
+__rte_experimental
+union rte_pmd_cnxk_ipsec_hw_sa *rte_pmd_cnxk_hw_session_base_get(uint16_t portid, bool inb);
+
+/**
+ * Executes a CPT flush on the specified session.
+ *
+ * @param portid
+ *   Port identifier of Ethernet device.
+ * @param sess
+ *   Handle of the session on which the CPT flush will be executed.
+ * @param inb
+ *   Determines the type of SA to be flushed, Inbound or Outbound.
+ *
+ * @return
+ *   0 upon success, a negative errno value otherwise.
+ */
+__rte_experimental
+int rte_pmd_cnxk_sa_flush(uint16_t portid, union rte_pmd_cnxk_ipsec_hw_sa *sess, bool inb);
 #endif /* _PMD_CNXK_H_ */
diff --git a/drivers/net/cnxk/version.map b/drivers/net/cnxk/version.map
index 1ad0616bdf..7e8703df5c 100644
--- a/drivers/net/cnxk/version.map
+++ b/drivers/net/cnxk/version.map
@@ -10,7 +10,9 @@ EXPERIMENTAL {
 	rte_pmd_cnxk_hw_sa_write;
 
 	# added in 23.11
+	rte_pmd_cnxk_hw_session_base_get;
 	rte_pmd_cnxk_inl_ipsec_res;
+	rte_pmd_cnxk_sa_flush;
 };
 
 INTERNAL {
-- 
2.34.1



* [PATCH 30/33] net/cnxk: add PMD APIs to submit CPT instruction
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (28 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 29/33] net/cnxk: add PMD APIs for IPsec SA base and flush Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 31/33] net/cnxk: add PMD API to retrieve CPT queue statistics Nithin Dabilpuram
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Srujana Challa

From: Srujana Challa <schalla@marvell.com>

Introduces new PMD APIs for submitting CPT instructions to the
Inline Device, allowing applications to submit instructions to it
directly.
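
As a usage illustration, a minimal sketch under the assumption that the
application has already prepared an array of cpt_inst_s as per the CPT
microcode documentation (the instruction contents themselves are not
shown):

  #include <stdint.h>
  #include <rte_pmd_cnxk.h>

  static uint16_t
  submit_cpt_insts(void *insts, uint16_t nb_inst)
  {
          struct rte_pmd_cnxk_inl_dev_q *q;

          /* Queue 0 of the inline device; NULL if no inline device
           * or no CPT LF is attached.
           */
          q = rte_pmd_cnxk_inl_dev_qptr_get();
          if (q == NULL)
                  return 0;

          /* Returns the number of instructions submitted; 0 when the
           * flow-control check fails.
           */
          return rte_pmd_cnxk_inl_dev_submit(q, insts, nb_inst);
  }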

Signed-off-by: Srujana Challa <schalla@marvell.com>
---
 drivers/common/cnxk/roc_nix_inl.h      | 12 ++++++-
 drivers/common/cnxk/roc_nix_inl_dev.c  | 32 ++++++++++++++++++
 drivers/common/cnxk/roc_nix_inl_priv.h |  2 ++
 drivers/common/cnxk/version.map        |  1 +
 drivers/net/cnxk/cn10k_ethdev_sec.c    | 42 ++++++++++++++++++++++++
 drivers/net/cnxk/cn9k_ethdev_sec.c     | 14 ++++++++
 drivers/net/cnxk/cnxk_ethdev.c         |  1 +
 drivers/net/cnxk/cnxk_ethdev.h         | 45 ++++++++++++++++++++++++++
 drivers/net/cnxk/cnxk_ethdev_sec.c     | 19 +++++++++++
 drivers/net/cnxk/rte_pmd_cnxk.h        | 35 ++++++++++++++++++++
 drivers/net/cnxk/version.map           |  2 ++
 11 files changed, 204 insertions(+), 1 deletion(-)

diff --git a/drivers/common/cnxk/roc_nix_inl.h b/drivers/common/cnxk/roc_nix_inl.h
index 1a4bf8808c..974834a0f3 100644
--- a/drivers/common/cnxk/roc_nix_inl.h
+++ b/drivers/common/cnxk/roc_nix_inl.h
@@ -99,10 +99,19 @@ struct roc_nix_inl_dev {
 	uint8_t rx_inj_ena; /* Rx Inject Enable */
 	/* End of input parameters */
 
-#define ROC_NIX_INL_MEM_SZ (1408)
+#define ROC_NIX_INL_MEM_SZ (2048)
 	uint8_t reserved[ROC_NIX_INL_MEM_SZ] __plt_cache_aligned;
 } __plt_cache_aligned;
 
+struct roc_nix_inl_dev_q {
+	uint32_t nb_desc;
+	uintptr_t rbase;
+	uintptr_t lmt_base;
+	uint64_t *fc_addr;
+	uint64_t io_addr;
+	int32_t fc_addr_sw;
+} __plt_cache_aligned;
+
 /* NIX Inline Device API */
 int __roc_api roc_nix_inl_dev_init(struct roc_nix_inl_dev *roc_inl_dev);
 int __roc_api roc_nix_inl_dev_fini(struct roc_nix_inl_dev *roc_inl_dev);
@@ -176,5 +185,6 @@ int __roc_api roc_nix_inl_ctx_write(struct roc_nix *roc_nix, void *sa_dptr,
 				    void *sa_cptr, bool inb, uint16_t sa_len);
 void __roc_api roc_nix_inl_outb_cpt_lfs_dump(struct roc_nix *roc_nix, FILE *file);
 uint64_t __roc_api roc_nix_inl_eng_caps_get(struct roc_nix *roc_nix);
+void *__roc_api roc_nix_inl_dev_qptr_get(uint8_t qid);
 
 #endif /* _ROC_NIX_INL_H_ */
diff --git a/drivers/common/cnxk/roc_nix_inl_dev.c b/drivers/common/cnxk/roc_nix_inl_dev.c
index e2bbe3a67b..84c69a44c5 100644
--- a/drivers/common/cnxk/roc_nix_inl_dev.c
+++ b/drivers/common/cnxk/roc_nix_inl_dev.c
@@ -168,6 +168,7 @@ nix_inl_nix_ipsec_cfg(struct nix_inl_dev *inl_dev, bool ena)
 static int
 nix_inl_cpt_setup(struct nix_inl_dev *inl_dev, bool inl_dev_sso)
 {
+	struct roc_nix_inl_dev_q *q_info;
 	struct dev *dev = &inl_dev->dev;
 	bool ctx_ilen_valid = false;
 	struct roc_cpt_lf *lf;
@@ -209,6 +210,13 @@ nix_inl_cpt_setup(struct nix_inl_dev *inl_dev, bool inl_dev_sso)
 			goto lf_free;
 		}
 
+		q_info = &inl_dev->q_info[i];
+		q_info->nb_desc = lf->nb_desc;
+		q_info->fc_addr = lf->fc_addr;
+		q_info->io_addr = lf->io_addr;
+		q_info->lmt_base = lf->lmt_base;
+		q_info->rbase = lf->rbase;
+
 		roc_cpt_iq_enable(lf);
 	}
 	return 0;
@@ -835,6 +843,30 @@ nix_inl_outb_poll_thread_setup(struct nix_inl_dev *inl_dev)
 	return rc;
 }
 
+void *
+roc_nix_inl_dev_qptr_get(uint8_t qid)
+{
+	struct idev_cfg *idev = idev_get_cfg();
+	struct nix_inl_dev *inl_dev = NULL;
+
+	if (idev)
+		inl_dev = idev->nix_inl_dev;
+
+	if (!inl_dev) {
+		plt_err("Inline Device could not be detected\n");
+		return NULL;
+	}
+	if (!inl_dev->attach_cptlf) {
+		plt_err("No CPT LFs are attached to Inline Device\n");
+		return NULL;
+	}
+	if (qid >= inl_dev->nb_cptlf) {
+		plt_err("Invalid qid: %u total queues: %d\n", qid, inl_dev->nb_cptlf);
+		return NULL;
+	}
+	return &inl_dev->q_info[qid];
+}
+
 int
 roc_nix_inl_dev_stats_get(struct roc_nix_stats *stats)
 {
diff --git a/drivers/common/cnxk/roc_nix_inl_priv.h b/drivers/common/cnxk/roc_nix_inl_priv.h
index 5afc7d6655..64b8b3977d 100644
--- a/drivers/common/cnxk/roc_nix_inl_priv.h
+++ b/drivers/common/cnxk/roc_nix_inl_priv.h
@@ -100,6 +100,8 @@ struct nix_inl_dev {
 	uint32_t curr_ipsec_idx;
 	uint32_t max_ipsec_rules;
 	uint32_t alloc_ipsec_rules;
+
+	struct roc_nix_inl_dev_q q_info[NIX_INL_CPT_LF];
 };
 
 int nix_inl_sso_register_irqs(struct nix_inl_dev *inl_dev);
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index f98738d07e..8832c75eef 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -267,6 +267,7 @@ INTERNAL {
 	roc_nix_inl_meta_pool_cb_register;
 	roc_nix_inl_custom_meta_pool_cb_register;
 	roc_nix_inb_mode_set;
+	roc_nix_inl_dev_qptr_get;
 	roc_nix_inl_outb_fini;
 	roc_nix_inl_outb_init;
 	roc_nix_inl_outb_lf_base_get;
diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c
index 074bb09822..f22f2ae12d 100644
--- a/drivers/net/cnxk/cn10k_ethdev_sec.c
+++ b/drivers/net/cnxk/cn10k_ethdev_sec.c
@@ -1305,6 +1305,45 @@ cn10k_eth_sec_rx_inject_config(void *device, uint16_t port_id, bool enable)
 	return 0;
 }
 
+#define CPT_LMTST_BURST 32
+static uint16_t
+cn10k_inl_dev_submit(struct roc_nix_inl_dev_q *q, void *inst, uint16_t nb_inst)
+{
+	uintptr_t lbase = q->lmt_base;
+	uint8_t lnum, shft, loff;
+	uint16_t left, burst;
+	rte_iova_t io_addr;
+	uint16_t lmt_id;
+
+	/* Check the flow control to avoid the queue overflow */
+	if (cnxk_nix_inl_fc_check(q->fc_addr, &q->fc_addr_sw, q->nb_desc, nb_inst))
+		return 0;
+
+	io_addr = q->io_addr;
+	ROC_LMT_CPT_BASE_ID_GET(lbase, lmt_id);
+
+	left = nb_inst;
+again:
+	burst = left > CPT_LMTST_BURST ? CPT_LMTST_BURST : left;
+
+	lnum = 0;
+	loff = 0;
+	shft = 16;
+	memcpy(PLT_PTR_CAST(lbase), inst, burst * sizeof(struct cpt_inst_s));
+	loff = (burst % 2) ? 1 : 0;
+	lnum = (burst / 2);
+	shft = shft + (lnum * 3);
+
+	left -= burst;
+	cn10k_nix_sec_steorl(io_addr, lmt_id, lnum, loff, shft);
+	rte_io_wmb();
+	if (left) {
+		inst = RTE_PTR_ADD(inst, burst * sizeof(struct cpt_inst_s));
+		goto again;
+	}
+	return nb_inst;
+}
+
 void
 cn10k_eth_sec_ops_override(void)
 {
@@ -1341,4 +1380,7 @@ cn10k_eth_sec_ops_override(void)
 	cnxk_eth_sec_ops.macsec_sa_stats_get = cnxk_eth_macsec_sa_stats_get;
 	cnxk_eth_sec_ops.rx_inject_configure = cn10k_eth_sec_rx_inject_config;
 	cnxk_eth_sec_ops.inb_pkt_rx_inject = cn10k_eth_sec_inb_rx_inject;
+
+	/* Update platform specific rte_pmd_cnxk ops */
+	cnxk_pmd_ops.inl_dev_submit = cn10k_inl_dev_submit;
 }
diff --git a/drivers/net/cnxk/cn9k_ethdev_sec.c b/drivers/net/cnxk/cn9k_ethdev_sec.c
index a0e0a73639..ae8d04be69 100644
--- a/drivers/net/cnxk/cn9k_ethdev_sec.c
+++ b/drivers/net/cnxk/cn9k_ethdev_sec.c
@@ -845,6 +845,17 @@ cn9k_eth_sec_capabilities_get(void *device __rte_unused)
 	return cn9k_eth_sec_capabilities;
 }
 
+static uint16_t
+cn9k_inl_dev_submit(struct roc_nix_inl_dev_q *q, void *inst, uint16_t nb_inst)
+{
+	/* Not supported */
+	PLT_SET_USED(q);
+	PLT_SET_USED(inst);
+	PLT_SET_USED(nb_inst);
+
+	return 0;
+}
+
 void
 cn9k_eth_sec_ops_override(void)
 {
@@ -859,4 +870,7 @@ cn9k_eth_sec_ops_override(void)
 	cnxk_eth_sec_ops.session_update = cn9k_eth_sec_session_update;
 	cnxk_eth_sec_ops.session_destroy = cn9k_eth_sec_session_destroy;
 	cnxk_eth_sec_ops.capabilities_get = cn9k_eth_sec_capabilities_get;
+
+	/* Update platform specific rte_pmd_cnxk ops */
+	cnxk_pmd_ops.inl_dev_submit = cn9k_inl_dev_submit;
 }
diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index dd065c8269..13b7e8a38c 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -135,6 +135,7 @@ nix_security_setup(struct cnxk_eth_dev *dev)
 			rc = -ENOMEM;
 			goto cleanup;
 		}
+		dev->inb.inl_dev_q = roc_nix_inl_dev_qptr_get(0);
 	}
 
 	if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY ||
diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h
index 5920488e1a..e9a8a14561 100644
--- a/drivers/net/cnxk/cnxk_ethdev.h
+++ b/drivers/net/cnxk/cnxk_ethdev.h
@@ -260,6 +260,9 @@ struct cnxk_eth_dev_sec_inb {
 
 	/* Disable custom meta aura */
 	bool custom_meta_aura_dis;
+
+	/* Inline device CPT queue info */
+	struct roc_nix_inl_dev_q *inl_dev_q;
 };
 
 /* Outbound security data */
@@ -499,6 +502,39 @@ cnxk_nix_tx_queue_sec_count(uint64_t *mem, uint16_t sqes_per_sqb_log2, uint64_t
 	return (val & 0xFFFF);
 }
 
+static inline int
+cnxk_nix_inl_fc_check(uint64_t *fc, int32_t *fc_sw, uint32_t nb_desc, uint16_t nb_inst)
+{
+	uint8_t retry_count = 32;
+	int32_t val, newval;
+
+	/* Check if there is any CPT instruction to submit */
+	if (!nb_inst)
+		return -EINVAL;
+
+retry:
+	val = rte_atomic_fetch_sub_explicit(fc_sw, nb_inst, __ATOMIC_RELAXED) - nb_inst;
+	if (likely(val >= 0))
+		return 0;
+
+	newval = (int64_t)nb_desc - rte_atomic_load_explicit(fc, __ATOMIC_RELAXED);
+	newval -= nb_inst;
+
+	if (!rte_atomic_compare_exchange_strong_explicit(fc_sw, &val, newval, __ATOMIC_RELEASE,
+							 __ATOMIC_RELAXED)) {
+		if (retry_count) {
+			retry_count--;
+			goto retry;
+		} else {
+			return -EAGAIN;
+		}
+	}
+	if (unlikely(newval < 0))
+		return -EAGAIN;
+
+	return 0;
+}
+
 /* Common ethdev ops */
 extern struct eth_dev_ops cnxk_eth_dev_ops;
 
@@ -511,6 +547,15 @@ extern struct rte_security_ops cnxk_eth_sec_ops;
 /* Common tm ops */
 extern struct rte_tm_ops cnxk_tm_ops;
 
+/* Platform specific rte pmd cnxk ops */
+typedef uint16_t (*cnxk_inl_dev_submit_cb_t)(struct roc_nix_inl_dev_q *q, void *inst,
+					     uint16_t nb_inst);
+
+struct cnxk_ethdev_pmd_ops {
+	cnxk_inl_dev_submit_cb_t inl_dev_submit;
+};
+extern struct cnxk_ethdev_pmd_ops cnxk_pmd_ops;
+
 /* Ops */
 int cnxk_nix_probe(struct rte_pci_driver *pci_drv,
 		   struct rte_pci_device *pci_dev);
diff --git a/drivers/net/cnxk/cnxk_ethdev_sec.c b/drivers/net/cnxk/cnxk_ethdev_sec.c
index ec129b6584..7e5103bf54 100644
--- a/drivers/net/cnxk/cnxk_ethdev_sec.c
+++ b/drivers/net/cnxk/cnxk_ethdev_sec.c
@@ -33,6 +33,8 @@ struct inl_cpt_channel {
 #define CNXK_NIX_INL_DEV_NAME_LEN                                              \
 	(sizeof(CNXK_NIX_INL_DEV_NAME) + PCI_PRI_STR_SIZE)
 
+struct cnxk_ethdev_pmd_ops cnxk_pmd_ops;
+
 static inline int
 bitmap_ctzll(uint64_t slab)
 {
@@ -297,6 +299,18 @@ cnxk_eth_sec_sess_get_by_sess(struct cnxk_eth_dev *dev,
 	return NULL;
 }
 
+uint16_t
+rte_pmd_cnxk_inl_dev_submit(struct rte_pmd_cnxk_inl_dev_q *qptr, void *inst, uint16_t nb_inst)
+{
+	return cnxk_pmd_ops.inl_dev_submit((struct roc_nix_inl_dev_q *)qptr, inst, nb_inst);
+}
+
+struct rte_pmd_cnxk_inl_dev_q *
+rte_pmd_cnxk_inl_dev_qptr_get(void)
+{
+	return roc_nix_inl_dev_qptr_get(0);
+}
+
 union rte_pmd_cnxk_ipsec_hw_sa *
 rte_pmd_cnxk_hw_session_base_get(uint16_t portid, bool inb)
 {
@@ -353,6 +367,7 @@ rte_pmd_cnxk_hw_sa_write(uint16_t portid, void *sess, union rte_pmd_cnxk_ipsec_h
 	struct rte_eth_dev *eth_dev = &rte_eth_devices[portid];
 	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
 	struct cnxk_eth_sec_sess *eth_sec;
+	struct roc_nix_inl_dev_q *q;
 	void *sa;
 
 	eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess);
@@ -361,6 +376,10 @@ rte_pmd_cnxk_hw_sa_write(uint16_t portid, void *sess, union rte_pmd_cnxk_ipsec_h
 	else
 		sa = sess;
 
+	q = dev->inb.inl_dev_q;
+	if (q && cnxk_nix_inl_fc_check(q->fc_addr, &q->fc_addr_sw, q->nb_desc, 1))
+		return -EAGAIN;
+
 	return roc_nix_inl_ctx_write(&dev->nix, data, sa, inb, len);
 }
 
diff --git a/drivers/net/cnxk/rte_pmd_cnxk.h b/drivers/net/cnxk/rte_pmd_cnxk.h
index ecd112e881..798547e731 100644
--- a/drivers/net/cnxk/rte_pmd_cnxk.h
+++ b/drivers/net/cnxk/rte_pmd_cnxk.h
@@ -489,6 +489,13 @@ union rte_pmd_cnxk_cpt_res_s {
 	uint64_t u64[2];
 };
 
+/** Forward structure declaration for inline device queue. Applications obtain a pointer
+ * to this structure using the ``rte_pmd_cnxk_inl_dev_qptr_get`` API and use it to submit
+ * CPT instructions (cpt_inst_s) to the inline device via the
+ * ``rte_pmd_cnxk_inl_dev_submit`` API.
+ */
+struct rte_pmd_cnxk_inl_dev_q;
+
 /**
  * Read HW SA context from session.
  *
@@ -578,4 +585,32 @@ union rte_pmd_cnxk_ipsec_hw_sa *rte_pmd_cnxk_hw_session_base_get(uint16_t portid
  */
 __rte_experimental
 int rte_pmd_cnxk_sa_flush(uint16_t portid, union rte_pmd_cnxk_ipsec_hw_sa *sess, bool inb);
+
+/**
+ * Get queue pointer of Inline Device.
+ *
+ * @return
+ *   - Pointer to queue structure that would be the input to submit API.
+ *   - NULL upon failure.
+ */
+__rte_experimental
+struct rte_pmd_cnxk_inl_dev_q *rte_pmd_cnxk_inl_dev_qptr_get(void);
+
+/**
+ * Submit CPT instruction(s) (cpt_inst_s) to Inline Device.
+ *
+ * @param qptr
+ *   Pointer obtained with ``rte_pmd_cnxk_inl_dev_qptr_get``.
+ * @param inst
+ *   Pointer to an array of ``cpt_inst_s`` prepared by the application.
+ * @param nb_inst
+ *   Number of instructions to be processed.
+ *
+ * @return
+ *   Number of instructions processed.
+ */
+__rte_experimental
+uint16_t rte_pmd_cnxk_inl_dev_submit(struct rte_pmd_cnxk_inl_dev_q *qptr, void *inst,
+				     uint16_t nb_inst);
+
 #endif /* _PMD_CNXK_H_ */
diff --git a/drivers/net/cnxk/version.map b/drivers/net/cnxk/version.map
index 7e8703df5c..58dcb1fac0 100644
--- a/drivers/net/cnxk/version.map
+++ b/drivers/net/cnxk/version.map
@@ -11,6 +11,8 @@ EXPERIMENTAL {
 
 	# added in 23.11
 	rte_pmd_cnxk_hw_session_base_get;
+	rte_pmd_cnxk_inl_dev_qptr_get;
+	rte_pmd_cnxk_inl_dev_submit;
 	rte_pmd_cnxk_inl_ipsec_res;
 	rte_pmd_cnxk_sa_flush;
 };
-- 
2.34.1



* [PATCH 31/33] net/cnxk: add PMD API to retrieve CPT queue statistics
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (29 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 30/33] net/cnxk: add PMD APIs to submit CPT instruction Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 32/33] net/cnxk: add option to enable custom inbound sa usage Nithin Dabilpuram
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Srujana Challa

From: Srujana Challa <schalla@marvell.com>

Introduces a new PMD API to obtain CPT queue statistics, including:
- CPT_LF_CTX_ENC_BYTE_CNT - Encrypted byte count on the given queue
- CPT_LF_CTX_ENC_PKT_CNT - Encrypted packet count on the given queue
- CPT_LF_CTX_DEC_BYTE_CNT - Decrypted byte count on the given queue
- CPT_LF_CTX_DEC_PKT_CNT - Decrypted packet count on the given queue

This API enables applications to access CPT queue statistics directly.
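
As a usage illustration, a minimal sketch that reads and prints the
counters of one inline device queue (the portid and qidx values are for
illustration):

  #include <inttypes.h>
  #include <stdio.h>
  #include <rte_pmd_cnxk.h>

  static void
  dump_cpt_q_stats(uint16_t portid, uint16_t qidx)
  {
          struct rte_pmd_cnxk_cpt_q_stats stats;

          if (rte_pmd_cnxk_cpt_q_stats_get(portid,
                                           RTE_PMD_CNXK_CPT_Q_STATS_INL_DEV,
                                           &stats, qidx) < 0)
                  return;

          printf("enc: %" PRIu64 " pkts / %" PRIu64 " bytes, "
                 "dec: %" PRIu64 " pkts / %" PRIu64 " bytes\n",
                 stats.enc_pkts, stats.enc_bytes,
                 stats.dec_pkts, stats.dec_bytes);
  }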

Signed-off-by: Srujana Challa <schalla@marvell.com>
---
 drivers/common/cnxk/roc_nix_inl.c  | 67 ++++++++++++++++++++++++++++++
 drivers/common/cnxk/roc_nix_inl.h  | 15 +++++++
 drivers/common/cnxk/version.map    |  1 +
 drivers/net/cnxk/cnxk_ethdev_sec.c | 11 +++++
 drivers/net/cnxk/rte_pmd_cnxk.h    | 43 +++++++++++++++++++
 drivers/net/cnxk/version.map       |  1 +
 6 files changed, 138 insertions(+)

diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c
index 79116afe6d..4d6d0cab5f 100644
--- a/drivers/common/cnxk/roc_nix_inl.c
+++ b/drivers/common/cnxk/roc_nix_inl.c
@@ -1814,6 +1814,73 @@ roc_nix_inl_ctx_write(struct roc_nix *roc_nix, void *sa_dptr, void *sa_cptr,
 	return -ENOTSUP;
 }
 
+static inline int
+nix_inl_dev_cpt_lf_stats_get(struct roc_nix *roc_nix, struct roc_nix_cpt_lf_stats *stats,
+			     uint16_t idx)
+{
+	struct idev_cfg *idev = idev_get_cfg();
+	struct nix_inl_dev *inl_dev = NULL;
+	struct roc_cpt_lf *lf = NULL;
+
+	PLT_SET_USED(roc_nix);
+	if (idev)
+		inl_dev = idev->nix_inl_dev;
+
+	if (inl_dev && inl_dev->attach_cptlf) {
+		if (idx >= inl_dev->nb_cptlf) {
+			plt_err("Invalid idx: %u total lfs: %d\n", idx, inl_dev->nb_cptlf);
+			return -EINVAL;
+		}
+		lf = &inl_dev->cpt_lf[idx];
+	} else {
+		plt_err("No CPT LF(s) are found for Inline Device\n");
+		return -EINVAL;
+	}
+	stats->enc_pkts = plt_read64(lf->rbase + CPT_LF_CTX_ENC_PKT_CNT);
+	stats->enc_bytes = plt_read64(lf->rbase + CPT_LF_CTX_ENC_BYTE_CNT);
+	stats->dec_pkts = plt_read64(lf->rbase + CPT_LF_CTX_DEC_PKT_CNT);
+	stats->dec_bytes = plt_read64(lf->rbase + CPT_LF_CTX_DEC_BYTE_CNT);
+
+	return 0;
+}
+
+static inline int
+nix_eth_dev_cpt_lf_stats_get(struct roc_nix *roc_nix, struct roc_nix_cpt_lf_stats *stats,
+			     uint16_t idx)
+{
+	struct roc_cpt_lf *lf;
+	struct nix *nix;
+
+	if (!roc_nix)
+		return -EINVAL;
+	nix = roc_nix_to_nix_priv(roc_nix);
+	if (idx >= nix->nb_cpt_lf) {
+		plt_err("Invalid idx: %u total lfs: %d\n", idx, nix->nb_cpt_lf);
+		return -EINVAL;
+	}
+	lf = &nix->cpt_lf_base[idx];
+	stats->enc_pkts = plt_read64(lf->rbase + CPT_LF_CTX_ENC_PKT_CNT);
+	stats->enc_bytes = plt_read64(lf->rbase + CPT_LF_CTX_ENC_BYTE_CNT);
+	stats->dec_pkts = plt_read64(lf->rbase + CPT_LF_CTX_DEC_PKT_CNT);
+	stats->dec_bytes = plt_read64(lf->rbase + CPT_LF_CTX_DEC_BYTE_CNT);
+
+	return 0;
+}
+
+int
+roc_nix_inl_cpt_lf_stats_get(struct roc_nix *roc_nix, enum roc_nix_cpt_lf_stats_type type,
+			     struct roc_nix_cpt_lf_stats *stats, uint16_t idx)
+{
+	switch (type) {
+	case ROC_NIX_CPT_LF_STATS_INL_DEV:
+		return nix_inl_dev_cpt_lf_stats_get(roc_nix, stats, idx);
+	case ROC_NIX_CPT_LF_STATS_ETHDEV:
+		return nix_eth_dev_cpt_lf_stats_get(roc_nix, stats, idx);
+	default:
+		return -EINVAL;
+	}
+}
+
 int
 roc_nix_inl_ts_pkind_set(struct roc_nix *roc_nix, bool ts_ena, bool inb_inl_dev)
 {
diff --git a/drivers/common/cnxk/roc_nix_inl.h b/drivers/common/cnxk/roc_nix_inl.h
index 974834a0f3..16cead7fa4 100644
--- a/drivers/common/cnxk/roc_nix_inl.h
+++ b/drivers/common/cnxk/roc_nix_inl.h
@@ -112,6 +112,13 @@ struct roc_nix_inl_dev_q {
 	int32_t fc_addr_sw;
 } __plt_cache_aligned;
 
+struct roc_nix_cpt_lf_stats {
+	uint64_t enc_pkts;
+	uint64_t enc_bytes;
+	uint64_t dec_pkts;
+	uint64_t dec_bytes;
+};
+
 /* NIX Inline Device API */
 int __roc_api roc_nix_inl_dev_init(struct roc_nix_inl_dev *roc_inl_dev);
 int __roc_api roc_nix_inl_dev_fini(struct roc_nix_inl_dev *roc_inl_dev);
@@ -187,4 +194,12 @@ void __roc_api roc_nix_inl_outb_cpt_lfs_dump(struct roc_nix *roc_nix, FILE *file
 uint64_t __roc_api roc_nix_inl_eng_caps_get(struct roc_nix *roc_nix);
 void *__roc_api roc_nix_inl_dev_qptr_get(uint8_t qid);
 
+enum roc_nix_cpt_lf_stats_type {
+	ROC_NIX_CPT_LF_STATS_INL_DEV,
+	ROC_NIX_CPT_LF_STATS_KERNEL,
+	ROC_NIX_CPT_LF_STATS_ETHDEV = 2,
+};
+int __roc_api roc_nix_inl_cpt_lf_stats_get(struct roc_nix *roc_nix,
+					   enum roc_nix_cpt_lf_stats_type type,
+					   struct roc_nix_cpt_lf_stats *stats, uint16_t idx);
 #endif /* _ROC_NIX_INL_H_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 8832c75eef..6f8a2e02da 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -267,6 +267,7 @@ INTERNAL {
 	roc_nix_inl_meta_pool_cb_register;
 	roc_nix_inl_custom_meta_pool_cb_register;
 	roc_nix_inb_mode_set;
+	roc_nix_inl_cpt_lf_stats_get;
 	roc_nix_inl_dev_qptr_get;
 	roc_nix_inl_outb_fini;
 	roc_nix_inl_outb_init;
diff --git a/drivers/net/cnxk/cnxk_ethdev_sec.c b/drivers/net/cnxk/cnxk_ethdev_sec.c
index 7e5103bf54..32b6946ac1 100644
--- a/drivers/net/cnxk/cnxk_ethdev_sec.c
+++ b/drivers/net/cnxk/cnxk_ethdev_sec.c
@@ -311,6 +311,17 @@ rte_pmd_cnxk_inl_dev_qptr_get(void)
 	return roc_nix_inl_dev_qptr_get(0);
 }
 
+int
+rte_pmd_cnxk_cpt_q_stats_get(uint16_t portid, enum rte_pmd_cnxk_cpt_q_stats_type type,
+			     struct rte_pmd_cnxk_cpt_q_stats *stats, uint16_t idx)
+{
+	struct rte_eth_dev *eth_dev = &rte_eth_devices[portid];
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	return roc_nix_inl_cpt_lf_stats_get(&dev->nix, (enum roc_nix_cpt_lf_stats_type)type,
+					    (struct roc_nix_cpt_lf_stats *)stats, idx);
+}
+
 union rte_pmd_cnxk_ipsec_hw_sa *
 rte_pmd_cnxk_hw_session_base_get(uint16_t portid, bool inb)
 {
diff --git a/drivers/net/cnxk/rte_pmd_cnxk.h b/drivers/net/cnxk/rte_pmd_cnxk.h
index 798547e731..dcb4f334fe 100644
--- a/drivers/net/cnxk/rte_pmd_cnxk.h
+++ b/drivers/net/cnxk/rte_pmd_cnxk.h
@@ -47,6 +47,30 @@ enum rte_pmd_cnxk_sec_action_alg {
 	RTE_PMD_CNXK_SEC_ACTION_ALG4,
 };
 
+/** CPT queue type for obtaining queue hardware statistics. */
+enum rte_pmd_cnxk_cpt_q_stats_type {
+	/** Type to get Inline Device LF(s) statistics */
+	RTE_PMD_CNXK_CPT_Q_STATS_INL_DEV,
+	/** Type to get Inline Inbound LF which is attached to kernel device
+	 * statistics.
+	 */
+	RTE_PMD_CNXK_CPT_Q_STATS_KERNEL,
+	/** Type to get CPT LF which is attached to ethdev statistics */
+	RTE_PMD_CNXK_CPT_Q_STATS_ETHDEV = 2,
+};
+
+/** CPT queue hardware statistics */
+struct rte_pmd_cnxk_cpt_q_stats {
+	/** Encrypted packet count */
+	uint64_t enc_pkts;
+	/** Encrypted byte count */
+	uint64_t enc_bytes;
+	/** Decrypted packet count */
+	uint64_t dec_pkts;
+	/** Decrypted byte count */
+	uint64_t dec_bytes;
+};
+
 struct rte_pmd_cnxk_sec_action {
 	/** Used as lookup result for ALG3 */
 	uint32_t sa_index;
@@ -613,4 +637,23 @@ __rte_experimental
 uint16_t rte_pmd_cnxk_inl_dev_submit(struct rte_pmd_cnxk_inl_dev_q *qptr, void *inst,
 				     uint16_t nb_inst);
 
+/**
+ * Retrieves the hardware statistics of a given port and stats type.
+ *
+ * @param portid
+ *   Port identifier of Ethernet device.
+ * @param type
+ *   The type of hardware statistics to retrieve, as defined in the
+ *   ``enum rte_pmd_cnxk_cpt_q_stats_type``.
+ * @param stats
+ *   Pointer where the retrieved statistics will be stored.
+ * @param idx
+ *   The index of the queue of a given type.
+ *
+ * @return
+ *   0 upon success, a negative errno value otherwise.
+ */
+__rte_experimental
+int rte_pmd_cnxk_cpt_q_stats_get(uint16_t portid, enum rte_pmd_cnxk_cpt_q_stats_type type,
+				 struct rte_pmd_cnxk_cpt_q_stats *stats, uint16_t idx);
 #endif /* _PMD_CNXK_H_ */
diff --git a/drivers/net/cnxk/version.map b/drivers/net/cnxk/version.map
index 58dcb1fac0..02a02edc25 100644
--- a/drivers/net/cnxk/version.map
+++ b/drivers/net/cnxk/version.map
@@ -10,6 +10,7 @@ EXPERIMENTAL {
 	rte_pmd_cnxk_hw_sa_write;
 
 	# added in 23.11
+	rte_pmd_cnxk_cpt_q_stats_get;
 	rte_pmd_cnxk_hw_session_base_get;
 	rte_pmd_cnxk_inl_dev_qptr_get;
 	rte_pmd_cnxk_inl_dev_submit;
-- 
2.34.1



* [PATCH 32/33] net/cnxk: add option to enable custom inbound sa usage
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (30 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 31/33] net/cnxk: add PMD API to retrieve CPT queue statistics Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-10  8:59 ` [PATCH 33/33] net/cnxk: add PMD API to retrieve the model string Nithin Dabilpuram
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Srujana Challa

From: Srujana Challa <schalla@marvell.com>

Introduces a device argument (custom_inb_sa) to enable the use of a
custom inbound SA. If an inline device is used, this device argument
is required for both the inline device and the eth device. With the
custom_inb_sa configuration, the application can do the
post-processing of inline IPsec inbound packets directly.
This patch also adds an RTE PMD API to configure the inline inbound
param1 and param2.
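
As a usage illustration, a minimal sketch of the call ordering the new
API expects (the param1/param2 values are placeholders; real values
come from the CPT microcode documentation):

  #include <rte_ethdev.h>
  #include <rte_pmd_cnxk.h>

  static int
  configure_custom_inb(uint16_t portid, const struct rte_eth_conf *conf)
  {
          struct rte_pmd_cnxk_ipsec_inb_cfg cfg = {
                  .param1 = 0, /* placeholder */
                  .param2 = 0, /* placeholder */
          };

          /* Must be called before rte_eth_dev_configure() */
          rte_pmd_cnxk_hw_inline_inb_cfg_set(portid, &cfg);

          return rte_eth_dev_configure(portid, 1, 1, conf);
  }

The port (and the inline device, if one is used) would additionally be
launched with the custom_inb_sa=1 devargs shown in the documentation
update below.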

Signed-off-by: Srujana Challa <schalla@marvell.com>
---
 doc/guides/nics/cnxk.rst               | 25 +++++++++++++++++++++
 drivers/common/cnxk/roc_nix.h          |  3 +++
 drivers/common/cnxk/roc_nix_inl.c      | 19 +++++++++++++---
 drivers/common/cnxk/roc_nix_inl.h      |  2 ++
 drivers/common/cnxk/roc_nix_inl_dev.c  |  3 +++
 drivers/common/cnxk/roc_nix_inl_priv.h |  1 +
 drivers/net/cnxk/cn10k_ethdev.c        |  2 +-
 drivers/net/cnxk/cn10k_ethdev_sec.c    |  5 +++++
 drivers/net/cnxk/cnxk_ethdev.c         |  3 +++
 drivers/net/cnxk/cnxk_ethdev_devargs.c |  4 ++++
 drivers/net/cnxk/cnxk_ethdev_sec.c     | 21 +++++++++++++++---
 drivers/net/cnxk/rte_pmd_cnxk.h        | 30 +++++++++++++++++++++++---
 drivers/net/cnxk/version.map           |  1 +
 13 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/doc/guides/nics/cnxk.rst b/doc/guides/nics/cnxk.rst
index ff380c10e9..85ca555115 100644
--- a/doc/guides/nics/cnxk.rst
+++ b/doc/guides/nics/cnxk.rst
@@ -457,6 +457,19 @@ Runtime Config Options
    With the above configuration, the driver would disable custom meta aura feature
    for the device ``0002:02:00.0``.
 
+- ``Enable custom sa for inbound inline IPsec`` (default ``0``)
+
+   Custom SA for inbound inline IPsec can be enabled by specifying the
+   ``custom_inb_sa`` ``devargs`` parameter. This option needs to be given
+   to both the ethdev and the inline device.
+
+   For example::
+
+      -a 0002:02:00.0,custom_inb_sa=1
+
+   With the above configuration, inline inbound IPsec post processing should be done
+   by the application.
+
 .. note::
 
    Above devarg parameters are configurable per device, user needs to pass the
@@ -655,6 +668,18 @@ Runtime Config Options for inline device
    With the above configuration, driver would enable packet inject from ARM cores
    to crypto to process and send back in Rx path.
 
+- ``Enable custom sa for inbound inline IPsec`` (default ``0``)
+
+   Custom SA for inbound inline IPsec can be enabled by specifying the
+   ``custom_inb_sa`` ``devargs`` parameter with both the inline device and ethdev.
+
+   For example::
+
+      -a 0002:1d:00.0,custom_inb_sa=1
+
+   With the above configuration, inline inbound IPsec post processing should be done
+   by the application.
+
 Port Representors
 -----------------
 
diff --git a/drivers/common/cnxk/roc_nix.h b/drivers/common/cnxk/roc_nix.h
index 25cf261348..f213823b9b 100644
--- a/drivers/common/cnxk/roc_nix.h
+++ b/drivers/common/cnxk/roc_nix.h
@@ -473,7 +473,10 @@ struct roc_nix {
 	bool force_rx_aura_bp;
 	bool custom_meta_aura_ena;
 	bool rx_inj_ena;
+	bool custom_inb_sa;
 	uint32_t root_sched_weight;
+	uint16_t inb_cfg_param1;
+	uint16_t inb_cfg_param2;
 	/* End of input parameters */
 	/* LMT line base for "Per Core Tx LMT line" mode*/
 	uintptr_t lmt_base;
diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c
index 4d6d0cab5f..ea199b763d 100644
--- a/drivers/common/cnxk/roc_nix_inl.c
+++ b/drivers/common/cnxk/roc_nix_inl.c
@@ -406,6 +406,8 @@ nix_inl_inb_sa_tbl_setup(struct roc_nix *roc_nix)
 	/* CN9K SA size is different */
 	if (roc_model_is_cn9k())
 		inb_sa_sz = ROC_NIX_INL_ON_IPSEC_INB_SA_SZ;
+	else if (roc_nix->custom_inb_sa)
+		inb_sa_sz = ROC_NIX_INL_INB_CUSTOM_SA_SZ;
 	else
 		inb_sa_sz = ROC_NIX_INL_OT_IPSEC_INB_SA_SZ;
 
@@ -910,6 +912,11 @@ roc_nix_inl_inb_init(struct roc_nix *roc_nix)
 		cfg.param1 = u.u16;
 		cfg.param2 = 0;
 		cfg.opcode = (ROC_IE_OT_MAJOR_OP_PROCESS_INBOUND_IPSEC | (1 << 6));
+
+		if (roc_nix->custom_inb_sa) {
+			cfg.param1 = roc_nix->inb_cfg_param1;
+			cfg.param2 = roc_nix->inb_cfg_param2;
+		}
 		rc = roc_nix_bpids_alloc(roc_nix, ROC_NIX_INTF_TYPE_CPT_NIX, 1, bpids);
 		if (rc > 0) {
 			nix->cpt_nixbpid = bpids[0];
@@ -1769,7 +1776,6 @@ roc_nix_inl_ctx_write(struct roc_nix *roc_nix, void *sa_dptr, void *sa_cptr,
 	if (roc_model_is_cn9k()) {
 		return 0;
 	}
-
 	if (idev)
 		inl_dev = idev->nix_inl_dev;
 
@@ -1777,6 +1783,11 @@ roc_nix_inl_ctx_write(struct roc_nix *roc_nix, void *sa_dptr, void *sa_cptr,
 		return -EINVAL;
 
 	if (roc_nix) {
+		if (inb && roc_nix->custom_inb_sa && sa_len > ROC_NIX_INL_INB_CUSTOM_SA_SZ) {
+			plt_nix_dbg("SA length: %u is more than allocated length: %u\n", sa_len,
+				    ROC_NIX_INL_INB_CUSTOM_SA_SZ);
+			return -EINVAL;
+		}
 		nix = roc_nix_to_nix_priv(roc_nix);
 		outb_lf = nix->cpt_lf_base;
 
@@ -1891,6 +1902,7 @@ roc_nix_inl_ts_pkind_set(struct roc_nix *roc_nix, bool ts_ena, bool inb_inl_dev)
 	uint16_t max_spi = 0;
 	uint32_t rq_refs = 0;
 	uint8_t pkind = 0;
+	size_t inb_sa_sz;
 	int i;
 
 	if (roc_model_is_cn9k())
@@ -1908,6 +1920,7 @@ roc_nix_inl_ts_pkind_set(struct roc_nix *roc_nix, bool ts_ena, bool inb_inl_dev)
 		if (!nix->inl_inb_ena)
 			return 0;
 		sa_base = nix->inb_sa_base;
+		inb_sa_sz = nix->inb_sa_sz;
 		max_spi = roc_nix->ipsec_in_max_spi;
 	}
 
@@ -1919,6 +1932,7 @@ roc_nix_inl_ts_pkind_set(struct roc_nix *roc_nix, bool ts_ena, bool inb_inl_dev)
 			inl_dev->ts_ena = ts_ena;
 			max_spi = inl_dev->ipsec_in_max_spi;
 			sa_base = inl_dev->inb_sa_base;
+			inb_sa_sz = inl_dev->inb_sa_sz;
 		} else if (inl_dev->ts_ena != ts_ena) {
 			if (inl_dev->ts_ena)
 				plt_err("Inline device is already configured with TS enable");
@@ -1937,8 +1951,7 @@ roc_nix_inl_ts_pkind_set(struct roc_nix *roc_nix, bool ts_ena, bool inb_inl_dev)
 		return 0;
 
 	for (i = 0; i < max_spi; i++) {
-		sa = ((uint8_t *)sa_base) +
-		     (i * ROC_NIX_INL_OT_IPSEC_INB_SA_SZ);
+		sa = ((uint8_t *)sa_base) + (i * inb_sa_sz);
 		((struct roc_ot_ipsec_inb_sa *)sa)->w0.s.pkind = pkind;
 	}
 	return 0;
diff --git a/drivers/common/cnxk/roc_nix_inl.h b/drivers/common/cnxk/roc_nix_inl.h
index 16cead7fa4..e26e3fe38c 100644
--- a/drivers/common/cnxk/roc_nix_inl.h
+++ b/drivers/common/cnxk/roc_nix_inl.h
@@ -33,6 +33,7 @@
 
 #define ROC_NIX_INL_MAX_SOFT_EXP_RNGS                                          \
 	(PLT_MAX_ETHPORTS * ROC_NIX_SOFT_EXP_PER_PORT_MAX_RINGS)
+#define ROC_NIX_INL_INB_CUSTOM_SA_SZ 512
 
 /* Reassembly configuration */
 #define ROC_NIX_INL_REAS_ACTIVE_LIMIT	  0xFFF
@@ -97,6 +98,7 @@ struct roc_nix_inl_dev {
 	uint32_t meta_buf_sz;
 	uint32_t max_ipsec_rules;
 	uint8_t rx_inj_ena; /* Rx Inject Enable */
+	uint8_t custom_inb_sa;
 	/* End of input parameters */
 
 #define ROC_NIX_INL_MEM_SZ (2048)
diff --git a/drivers/common/cnxk/roc_nix_inl_dev.c b/drivers/common/cnxk/roc_nix_inl_dev.c
index 84c69a44c5..753b60563a 100644
--- a/drivers/common/cnxk/roc_nix_inl_dev.c
+++ b/drivers/common/cnxk/roc_nix_inl_dev.c
@@ -420,6 +420,8 @@ nix_inl_nix_setup(struct nix_inl_dev *inl_dev)
 	/* CN9K SA is different */
 	if (roc_model_is_cn9k())
 		inb_sa_sz = ROC_NIX_INL_ON_IPSEC_INB_SA_SZ;
+	else if (inl_dev->custom_inb_sa)
+		inb_sa_sz = ROC_NIX_INL_INB_CUSTOM_SA_SZ;
 	else
 		inb_sa_sz = ROC_NIX_INL_OT_IPSEC_INB_SA_SZ;
 
@@ -942,6 +944,7 @@ roc_nix_inl_dev_init(struct roc_nix_inl_dev *roc_inl_dev)
 	inl_dev->nb_meta_bufs = roc_inl_dev->nb_meta_bufs;
 	inl_dev->meta_buf_sz = roc_inl_dev->meta_buf_sz;
 	inl_dev->soft_exp_poll_freq = roc_inl_dev->soft_exp_poll_freq;
+	inl_dev->custom_inb_sa = roc_inl_dev->custom_inb_sa;
 
 	if (roc_inl_dev->rx_inj_ena) {
 		inl_dev->rx_inj_ena = 1;
diff --git a/drivers/common/cnxk/roc_nix_inl_priv.h b/drivers/common/cnxk/roc_nix_inl_priv.h
index 64b8b3977d..e5494fd71a 100644
--- a/drivers/common/cnxk/roc_nix_inl_priv.h
+++ b/drivers/common/cnxk/roc_nix_inl_priv.h
@@ -94,6 +94,7 @@ struct nix_inl_dev {
 	uint32_t nb_meta_bufs;
 	uint32_t meta_buf_sz;
 	uint8_t rx_inj_ena; /* Rx Inject Enable */
+	uint8_t custom_inb_sa;
 
 	/* NPC */
 	int *ipsec_index;
diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c
index d335f3971b..bf9c97020a 100644
--- a/drivers/net/cnxk/cn10k_ethdev.c
+++ b/drivers/net/cnxk/cn10k_ethdev.c
@@ -36,7 +36,7 @@ nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
 	if (!dev->ptype_disable)
 		flags |= NIX_RX_OFFLOAD_PTYPE_F;
 
-	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY)
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY && !dev->nix.custom_inb_sa)
 		flags |= NIX_RX_OFFLOAD_SECURITY_F;
 
 	return flags;
diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c
index f22f2ae12d..84e570f5b9 100644
--- a/drivers/net/cnxk/cn10k_ethdev_sec.c
+++ b/drivers/net/cnxk/cn10k_ethdev_sec.c
@@ -754,6 +754,9 @@ cn10k_eth_sec_session_create(void *device,
 	else if (conf->protocol != RTE_SECURITY_PROTOCOL_IPSEC)
 		return -ENOTSUP;
 
+	if (nix->custom_inb_sa)
+		return -ENOTSUP;
+
 	if (rte_security_dynfield_register() < 0)
 		return -ENOTSUP;
 
@@ -1038,6 +1041,8 @@ cn10k_eth_sec_session_destroy(void *device, struct rte_security_session *sess)
 			return cnxk_eth_macsec_session_destroy(dev, sess);
 		return -ENOENT;
 	}
+	if (dev->nix.custom_inb_sa)
+		return -ENOTSUP;
 
 	lock = eth_sec->inb ? &dev->inb.lock : &dev->outb.lock;
 	rte_spinlock_lock(lock);
diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index 13b7e8a38c..c7723800ef 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -1269,6 +1269,9 @@ cnxk_nix_configure(struct rte_eth_dev *eth_dev)
 	dev->rx_offloads = rxmode->offloads;
 	dev->tx_offloads = txmode->offloads;
 
+	if (nix->custom_inb_sa)
+		dev->rx_offloads |= RTE_ETH_RX_OFFLOAD_SECURITY;
+
 	/* Prepare rx cfg */
 	rx_cfg = ROC_NIX_LF_RX_CFG_DIS_APAD;
 	if (dev->rx_offloads &
diff --git a/drivers/net/cnxk/cnxk_ethdev_devargs.c b/drivers/net/cnxk/cnxk_ethdev_devargs.c
index 3454295d7d..5bd50bb9a1 100644
--- a/drivers/net/cnxk/cnxk_ethdev_devargs.c
+++ b/drivers/net/cnxk/cnxk_ethdev_devargs.c
@@ -281,6 +281,7 @@ parse_val_u16(const char *key, const char *value, void *extra_args)
 #define CNXK_FLOW_AGING_POLL_FREQ	"aging_poll_freq"
 #define CNXK_NIX_RX_INJ_ENABLE	"rx_inj_ena"
 #define CNXK_CUSTOM_META_AURA_DIS "custom_meta_aura_dis"
+#define CNXK_CUSTOM_INB_SA	  "custom_inb_sa"
 
 int
 cnxk_ethdev_parse_devargs(struct rte_devargs *devargs, struct cnxk_eth_dev *dev)
@@ -304,6 +305,7 @@ cnxk_ethdev_parse_devargs(struct rte_devargs *devargs, struct cnxk_eth_dev *dev)
 	uint16_t scalar_enable = 0;
 	uint16_t tx_compl_ena = 0;
 	uint16_t custom_sa_act = 0;
+	uint8_t custom_inb_sa = 0;
 	struct rte_kvargs *kvlist;
 	uint32_t meta_buf_sz = 0;
 	uint16_t no_inl_dev = 0;
@@ -362,6 +364,7 @@ cnxk_ethdev_parse_devargs(struct rte_devargs *devargs, struct cnxk_eth_dev *dev)
 	rte_kvargs_process(kvlist, CNXK_NIX_RX_INJ_ENABLE, &parse_flag, &rx_inj_ena);
 	rte_kvargs_process(kvlist, CNXK_CUSTOM_META_AURA_DIS, &parse_flag,
 			   &custom_meta_aura_dis);
+	rte_kvargs_process(kvlist, CNXK_CUSTOM_INB_SA, &parse_flag, &custom_inb_sa);
 	rte_kvargs_free(kvlist);
 
 null_devargs:
@@ -381,6 +384,7 @@ cnxk_ethdev_parse_devargs(struct rte_devargs *devargs, struct cnxk_eth_dev *dev)
 	dev->nix.lock_rx_ctx = lock_rx_ctx;
 	dev->nix.custom_sa_action = custom_sa_act;
 	dev->nix.sqb_slack = sqb_slack;
+	dev->nix.custom_inb_sa = custom_inb_sa;
 
 	if (roc_feature_nix_has_own_meta_aura())
 		dev->nix.meta_buf_sz = meta_buf_sz;
diff --git a/drivers/net/cnxk/cnxk_ethdev_sec.c b/drivers/net/cnxk/cnxk_ethdev_sec.c
index 32b6946ac1..051588e65e 100644
--- a/drivers/net/cnxk/cnxk_ethdev_sec.c
+++ b/drivers/net/cnxk/cnxk_ethdev_sec.c
@@ -19,6 +19,7 @@
 #define CNXK_NIX_SOFT_EXP_POLL_FREQ   "soft_exp_poll_freq"
 #define CNXK_MAX_IPSEC_RULES	"max_ipsec_rules"
 #define CNXK_NIX_INL_RX_INJ_ENABLE	"rx_inj_ena"
+#define CNXK_NIX_CUSTOM_INB_SA	      "custom_inb_sa"
 
 /* Default soft expiry poll freq in usec */
 #define CNXK_NIX_SOFT_EXP_POLL_FREQ_DFLT 100
@@ -198,7 +199,7 @@ parse_max_ipsec_rules(const char *key, const char *value, void *extra_args)
 }
 
 static int
-parse_inl_rx_inj_ena(const char *key, const char *value, void *extra_args)
+parse_val_u8(const char *key, const char *value, void *extra_args)
 {
 	RTE_SET_USED(key);
 	uint32_t val;
@@ -412,6 +413,16 @@ rte_pmd_cnxk_inl_ipsec_res(struct rte_mbuf *mbuf)
 	return (void *)(wqe + 64 + desc_size);
 }
 
+void
+rte_pmd_cnxk_hw_inline_inb_cfg_set(uint16_t portid, struct rte_pmd_cnxk_ipsec_inb_cfg *cfg)
+{
+	struct rte_eth_dev *eth_dev = &rte_eth_devices[portid];
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	dev->nix.inb_cfg_param1 = cfg->param1;
+	dev->nix.inb_cfg_param2 = cfg->param2;
+}
+
 static unsigned int
 cnxk_eth_sec_session_get_size(void *device __rte_unused)
 {
@@ -481,6 +492,7 @@ nix_inl_parse_devargs(struct rte_devargs *devargs,
 	struct inl_cpt_channel cpt_channel;
 	uint32_t max_ipsec_rules = 0;
 	struct rte_kvargs *kvlist;
+	uint8_t custom_inb_sa = 0;
 	uint32_t nb_meta_bufs = 0;
 	uint32_t meta_buf_sz = 0;
 	uint8_t rx_inj_ena = 0;
@@ -510,7 +522,8 @@ nix_inl_parse_devargs(struct rte_devargs *devargs,
 	rte_kvargs_process(kvlist, CNXK_NIX_SOFT_EXP_POLL_FREQ,
 			   &parse_val_u32, &soft_exp_poll_freq);
 	rte_kvargs_process(kvlist, CNXK_MAX_IPSEC_RULES, &parse_max_ipsec_rules, &max_ipsec_rules);
-	rte_kvargs_process(kvlist, CNXK_NIX_INL_RX_INJ_ENABLE, &parse_inl_rx_inj_ena, &rx_inj_ena);
+	rte_kvargs_process(kvlist, CNXK_NIX_INL_RX_INJ_ENABLE, &parse_val_u8, &rx_inj_ena);
+	rte_kvargs_process(kvlist, CNXK_NIX_CUSTOM_INB_SA, &parse_val_u8, &custom_inb_sa);
 	rte_kvargs_free(kvlist);
 
 null_devargs:
@@ -526,6 +539,7 @@ nix_inl_parse_devargs(struct rte_devargs *devargs,
 	inl_dev->max_ipsec_rules = max_ipsec_rules;
 	if (roc_feature_nix_has_rx_inject())
 		inl_dev->rx_inj_ena = rx_inj_ena;
+	inl_dev->custom_inb_sa = custom_inb_sa;
 	return 0;
 exit:
 	return -EINVAL;
@@ -654,4 +668,5 @@ RTE_PMD_REGISTER_PARAM_STRING(cnxk_nix_inl,
 			      CNXK_NIX_INL_META_BUF_SZ "=<1-U32_MAX>"
 			      CNXK_NIX_SOFT_EXP_POLL_FREQ "=<0-U32_MAX>"
 			      CNXK_MAX_IPSEC_RULES "=<1-4095>"
-			      CNXK_NIX_INL_RX_INJ_ENABLE "=1");
+			      CNXK_NIX_INL_RX_INJ_ENABLE "=1"
+			      CNXK_NIX_CUSTOM_INB_SA "=1");
diff --git a/drivers/net/cnxk/rte_pmd_cnxk.h b/drivers/net/cnxk/rte_pmd_cnxk.h
index dcb4f334fe..e207f43c80 100644
--- a/drivers/net/cnxk/rte_pmd_cnxk.h
+++ b/drivers/net/cnxk/rte_pmd_cnxk.h
@@ -49,13 +49,13 @@ enum rte_pmd_cnxk_sec_action_alg {
 
 /** CPT queue type for obtaining queue hardware statistics. */
 enum rte_pmd_cnxk_cpt_q_stats_type {
-	/** Type to get Inline Device LF(s) statistics */
+	/** Type to get Inline Device queue(s) statistics */
 	RTE_PMD_CNXK_CPT_Q_STATS_INL_DEV,
-	/** Type to get Inline Inbound LF which is attached to kernel device
+	/** Type to get Inline Inbound queue which is attached to kernel device
 	 * statistics.
 	 */
 	RTE_PMD_CNXK_CPT_Q_STATS_KERNEL,
-	/** Type to get CPT LF which is attached to ethdev statistics */
+	/** Type to get CPT queue which is attached to ethdev statistics */
 	RTE_PMD_CNXK_CPT_Q_STATS_ETHDEV = 2,
 };
 
@@ -513,6 +513,18 @@ union rte_pmd_cnxk_cpt_res_s {
 	uint64_t u64[2];
 };
 
+/** Inline IPsec inbound queue configuration */
+struct rte_pmd_cnxk_ipsec_inb_cfg {
+	/** Param1 of PROCESS_INBOUND_IPSEC_PACKET as mentioned in the CPT
+	 * microcode document.
+	 */
+	uint16_t param1;
+	/** Param2 of PROCESS_INBOUND_IPSEC_PACKET as mentioned in the CPT
+	 * microcode document.
+	 */
+	uint16_t param2;
+};
+
 /** Forward structure declaration for inline device queue. Applications obtain a pointer
  * to this structure using the ``rte_pmd_cnxk_inl_dev_qptr_get`` API and use it to submit
  * CPT instructions (cpt_inst_s) to the inline device via the
@@ -656,4 +668,16 @@ uint16_t rte_pmd_cnxk_inl_dev_submit(struct rte_pmd_cnxk_inl_dev_q *qptr, void *
 __rte_experimental
 int rte_pmd_cnxk_cpt_q_stats_get(uint16_t portid, enum rte_pmd_cnxk_cpt_q_stats_type type,
 				 struct rte_pmd_cnxk_cpt_q_stats *stats, uint16_t idx);
+
+/**
+ * Set the configuration for hardware inline inbound IPsec processing. This API must be
+ * called before calling the ``rte_eth_dev_configure`` API.
+ *
+ * @param portid
+ *   Port identifier of Ethernet device.
+ * @param cfg
+ *   Pointer to the IPsec inbound configuration structure.
+ */
+__rte_experimental
+void rte_pmd_cnxk_hw_inline_inb_cfg_set(uint16_t portid, struct rte_pmd_cnxk_ipsec_inb_cfg *cfg);
 #endif /* _PMD_CNXK_H_ */
diff --git a/drivers/net/cnxk/version.map b/drivers/net/cnxk/version.map
index 02a02edc25..dd41e7bd56 100644
--- a/drivers/net/cnxk/version.map
+++ b/drivers/net/cnxk/version.map
@@ -11,6 +11,7 @@ EXPERIMENTAL {
 
 	# added in 23.11
 	rte_pmd_cnxk_cpt_q_stats_get;
+	rte_pmd_cnxk_hw_inline_inb_cfg_set;
 	rte_pmd_cnxk_hw_session_base_get;
 	rte_pmd_cnxk_inl_dev_qptr_get;
 	rte_pmd_cnxk_inl_dev_submit;
-- 
2.34.1
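
For context, a minimal sketch of how an application might drive the
inbound configuration API added above (the param1/param2 values here
are hypothetical placeholders; real values come from the CPT microcode
document, and ALLOW_EXPERIMENTAL_API is assumed since the symbol is
experimental):

	#include <rte_pmd_cnxk.h>

	static void
	inline_inb_setup(uint16_t portid)
	{
		struct rte_pmd_cnxk_ipsec_inb_cfg cfg = {
			.param1 = 0, /* hypothetical placeholder */
			.param2 = 0, /* hypothetical placeholder */
		};

		/* Must be called before rte_eth_dev_configure() */
		rte_pmd_cnxk_hw_inline_inb_cfg_set(portid, &cfg);
	}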


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH 33/33] net/cnxk: add PMD API to retrieve the model string
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (31 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 32/33] net/cnxk: add option to enable custom inbound sa usage Nithin Dabilpuram
@ 2024-09-10  8:59 ` Nithin Dabilpuram
  2024-09-23 15:44 ` [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Jerin Jacob
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-10  8:59 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Srujana Challa

From: Srujana Challa <schalla@marvell.com>

This patch adds a PMD API to retrieve the model string, allowing
applications to get the HW model string directly.
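
As an illustration, a minimal usage sketch (hypothetical application
code, assuming ALLOW_EXPERIMENTAL_API since the symbol is
experimental):

	#include <stdio.h>
	#include <rte_pmd_cnxk.h>

	static void
	print_model(void)
	{
		/* Returns the ROC model name, e.g. "cn10ka_a1" */
		const char *model = rte_pmd_cnxk_model_str_get();

		printf("Running on %s\n", model != NULL ? model : "unknown");
	}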

Signed-off-by: Srujana Challa <schalla@marvell.com>
---
 drivers/net/cnxk/cnxk_ethdev.c  | 7 +++++++
 drivers/net/cnxk/rte_pmd_cnxk.h | 9 +++++++++
 drivers/net/cnxk/version.map    | 1 +
 3 files changed, 17 insertions(+)

diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c
index c7723800ef..23dc2a26cc 100644
--- a/drivers/net/cnxk/cnxk_ethdev.c
+++ b/drivers/net/cnxk/cnxk_ethdev.c
@@ -4,11 +4,18 @@
 #include <cnxk_ethdev.h>
 
 #include <rte_eventdev.h>
+#include <rte_pmd_cnxk.h>
 
 #define CNXK_NIX_CQ_INL_CLAMP_MAX (64UL * 1024UL)
 
 #define NIX_TM_DFLT_RR_WT 71
 
+const char *
+rte_pmd_cnxk_model_str_get(void)
+{
+	return roc_model->name;
+}
+
 static inline uint64_t
 nix_get_rx_offload_capa(struct cnxk_eth_dev *dev)
 {
diff --git a/drivers/net/cnxk/rte_pmd_cnxk.h b/drivers/net/cnxk/rte_pmd_cnxk.h
index e207f43c80..a20b4f277d 100644
--- a/drivers/net/cnxk/rte_pmd_cnxk.h
+++ b/drivers/net/cnxk/rte_pmd_cnxk.h
@@ -680,4 +680,13 @@ int rte_pmd_cnxk_cpt_q_stats_get(uint16_t portid, enum rte_pmd_cnxk_cpt_q_stats_
  */
 __rte_experimental
 void rte_pmd_cnxk_hw_inline_inb_cfg_set(uint16_t portid, struct rte_pmd_cnxk_ipsec_inb_cfg *cfg);
+
+/**
+ * Retrieve the name of the HW model the driver is running on, as a string.
+ *
+ * @return
+ *   Model string, e.g. "cn10ka_a1"
+ */
+__rte_experimental
+const char *rte_pmd_cnxk_model_str_get(void);
 #endif /* _PMD_CNXK_H_ */
diff --git a/drivers/net/cnxk/version.map b/drivers/net/cnxk/version.map
index dd41e7bd56..099c518ecf 100644
--- a/drivers/net/cnxk/version.map
+++ b/drivers/net/cnxk/version.map
@@ -16,6 +16,7 @@ EXPERIMENTAL {
 	rte_pmd_cnxk_inl_dev_qptr_get;
 	rte_pmd_cnxk_inl_dev_submit;
 	rte_pmd_cnxk_inl_ipsec_res;
+	rte_pmd_cnxk_model_str_get;
 	rte_pmd_cnxk_sa_flush;
 };
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH 00/33] add Marvell cn20k SOC support for mempool and net
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (32 preceding siblings ...)
  2024-09-10  8:59 ` [PATCH 33/33] net/cnxk: add PMD API to retrieve the model string Nithin Dabilpuram
@ 2024-09-23 15:44 ` Jerin Jacob
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
  35 siblings, 0 replies; 75+ messages in thread
From: Jerin Jacob @ 2024-09-23 15:44 UTC (permalink / raw)
  To: Nithin Dabilpuram; +Cc: jerinj, dev

On Tue, Sep 10, 2024 at 2:54 PM Nithin Dabilpuram
<ndabilpuram@marvell.com> wrote:
>
> This series adds support for Marvell cn20k SOC for mempool and
> net PMD's.
>
> This series also adds few net/cnxk PMD updates to expose IPsec
> features supported by HW that are very custom in nature and
> some enhancements for cn10k.


# Please update the release notes for cn20k mempool driver support
# Please update the release notes for cn20k ethdev driver support
# Please split the non-cn20k driver patches into a separate series.
# Please fix the following build issue


total: 0 errors, 0 warnings
Applying: net/cnxk: add PMD APIs to submit CPT instruction


ccache clang -Idrivers/libtmp_rte_net_cnxk.a.p -Idrivers -I../drivers
-Idrivers/net/cnxk -I../drivers/net/cnxk -Ilib/ethdev -I../lib/ethdev
-I. -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include
-Ilib/eal/linux/include -I../lib/eal/linux/include
-Ilib/eal/x86/include -I../lib/eal/x86/include
-Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal
-Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics
-I../lib/metrics -Ilib/telemetry -I../lib/telemetry
-Ilib/net -I../lib/net -Ilib/mbuf -I../lib/mbuf
-Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/meter
-I../lib/meter -Idrivers/bus/pci -I../drivers/bus/pci
-I../drivers/bus/pci/linux -Ilib/pci -I../lib/pci
-Idrivers/bus/vdev -I../drivers/bus/vdev -Ilib/cryptodev -I../lib/cryptodev
-Ilib/rcu -I../lib/rcu -Ilib/eventdev -I../lib/eventdev -Ilib/hash
-I../lib/hash -Ilib/timer -I../lib/timer -Ilib/dmadev -I../lib/dmadev
-Ilib/security -I../lib/security
-Idrivers/common/cnxk -I../drivers/common/cnxk
-Idrivers/mempool/cnxk -I../drivers/mempool/cnxk
-fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch
-Wextra -Werror -std=c11 -O2 -g -include rte_config.h -Wcast-qual
-Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security
-Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare
-Wstrict-prototypes -Wundef -Wwrite-strings -Wno-address-of-packed-member
-Wno-missing-field-initializers -D_GNU_SOURCE -fPIC
-march=native -mrtm -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API
-Wno-format-truncation -flax-vector-conversions -Wno-strict-aliasing
-Wno-asm-operand-widths -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.cnxk -MD
-MQ drivers/libtmp_rte_net_cnxk.a.p/net_cnxk_cn20k_ethdev.c.o
-MF drivers/libtmp_rte_net_cnxk.a.p/net_cnxk_cn20k_ethdev.c.o.d
-o drivers/libtmp_rte_net_cnxk.a.p/net_cnxk_cn20k_ethdev.c.o
-c ../drivers/net/cnxk/cn20k_ethdev.c
In file included from ../drivers/net/cnxk/cn20k_ethdev.c:4:
In file included from ../drivers/net/cnxk/cn20k_ethdev.h:8:
../drivers/net/cnxk/cnxk_ethdev.h:516:8: error: address argument to
atomic operation must be a pointer to _Atomic type ('int32_t *' (aka
'int *') invalid)
  516 |         val = rte_atomic_fetch_sub_explicit(fc_sw, nb_inst,
__ATOMIC_RELAXED) - nb_inst;
      |               ^                             ~~~~~
../lib/eal/include/rte_stdatomic.h:95:2: note: expanded from macro
'rte_atomic_fetch_sub_explicit'
   95 |         atomic_fetch_sub_explicit(ptr, val, memorder)
      |         ^                         ~~~
/usr/lib/clang/18/include/stdatomic.h:154:35: note: expanded from
macro 'atomic_fetch_sub_explicit'
  154 | #define atomic_fetch_sub_explicit __c11_atomic_fetch_sub
      |                                   ^
In file included from ../drivers/net/cnxk/cn20k_ethdev.c:4:
In file included from ../drivers/net/cnxk/cn20k_ethdev.h:8:
../drivers/net/cnxk/cnxk_ethdev.h:520:30: error: address argument to
atomic operation must be a pointer to _Atomic type ('uint64_t *' (aka
'unsigned long *') invalid)
  520 |         newval = (int64_t)nb_desc -
rte_atomic_load_explicit(fc, __ATOMIC_RELAXED);
      |                                     ^                        ~~
../lib/eal/include/rte_stdatomic.h:73:2: note: expanded from macro
'rte_atomic_load_explicit'
   73 |         atomic_load_explicit(ptr, memorder)
      |         ^                    ~~~
/usr/lib/clang/18/include/stdatomic.h:139:30: note: expanded from
macro 'atomic_load_explicit'
  139 | #define atomic_load_explicit __c11_atomic_load
      |                              ^
In file included from ../drivers/net/cnxk/cn20k_ethdev.c:4:
In file included from ../drivers/net/cnxk/cn20k_ethdev.h:8:
../drivers/net/cnxk/cnxk_ethdev.h:523:7: error: address argument to
atomic operation must be a pointer to _Atomic type ('int32_t *' (aka
'int *') invalid)
  523 |         if
(!rte_atomic_compare_exchange_strong_explicit(fc_sw, &val, newval,
__ATOMIC_RELEASE,
      |              ^                                           ~~~~~
../lib/eal/include/rte_stdatomic.h:83:2: note: expanded from macro
'rte_atomic_compare_exchange_strong_explicit'
   83 |         atomic_compare_exchange_strong_explicit(ptr, expected,
desired, \
      |         ^                                       ~~~
/usr/lib/clang/18/include/stdatomic.h:145:49: note: expanded from
macro 'atomic_compare_exchange_strong_explicit'
  145 | #define atomic_compare_exchange_strong_explicit
__c11_atomic_compare_exchange_strong
      |                                                 ^
3 errors generated.

[for-main]dell[dpdk-next-net-mrvl] $ clang -v
clang version 18.1.8
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-pc-linux-gnu/14.2.1
Found candidate GCC installation:
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1
Selected GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
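
A sketch of the kind of change that usually resolves this class of
error, assuming the intent is to mark the counters with DPDK's
RTE_ATOMIC() specifier and use the rte_memory_order_* constants
(this is a guess at a fix, not something taken from the series):

	#include <stdint.h>
	#include <rte_stdatomic.h>

	/* Counters touched via the rte_atomic_* helpers need the _Atomic
	 * specifier when built with clang and C11 stdatomic.
	 */
	static inline int32_t
	fc_sub(RTE_ATOMIC(int32_t) *fc_sw, uint16_t nb_inst)
	{
		return rte_atomic_fetch_sub_explicit(fc_sw, nb_inst,
						     rte_memory_order_relaxed) -
		       nb_inst;
	}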

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 00/18] add Marvell cn20k SOC support for mempool and net
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (33 preceding siblings ...)
  2024-09-23 15:44 ` [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Jerin Jacob
@ 2024-09-26 16:01 ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 01/18] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
                     ` (18 more replies)
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
  35 siblings, 19 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj; +Cc: dev, Nithin Dabilpuram

This series adds support for Marvell cn20k SOC for mempool and
net PMD's.

This series also adds a few net/cnxk PMD updates to expose IPsec
features supported by HW that are very custom in nature, and some
enhancements for cn10k.


Ashwin Sekhar T K (4):
  mempool/cnxk: add cn20k PCI device ids
  common/cnxk: accommodate change in aura field width
  common/cnxk: use new NPA aq enq mbox for cn20k
  mempool/cnxk: initialize mempool ops for cn20k

Nithin Dabilpuram (9):
  net/cnxk: add cn20k base control path support
  net/cnxk: support Rx function select for cn20k
  net/cnxk: support Tx function select for cn20k
  net/cnxk: support Rx burst scalar for cn20k
  net/cnxk: support Rx burst vector for cn20k
  net/cnxk: support Tx burst scalar for cn20k
  net/cnxk: support Tx multi-seg in cn20k
  net/cnxk: support Tx burst vector for cn20k
  net/cnxk: support Tx multi-seg in vector for cn20k

Satha Rao (5):
  common/cnxk: add cn20k NIX register definitions
  common/cnxk: support NIX queue config for cn20k
  common/cnxk: support bandwidth profile for cn20k
  common/cnxk: support NIX debug for cn20k
  common/cnxk: add RSS support for cn20k

v2:
- Moved out 15 patches to a separate series
- Updated release notes

 doc/guides/rel_notes/release_24_11.rst        |    8 +
 drivers/common/cnxk/cnxk_telemetry_nix.c      |  260 +-
 drivers/common/cnxk/hw/nix.h                  |  524 ++-
 drivers/common/cnxk/hw/npa.h                  |  164 +-
 drivers/common/cnxk/hw/rvu.h                  |    7 +-
 drivers/common/cnxk/roc_mbox.h                |   84 +
 drivers/common/cnxk/roc_nix.c                 |   15 +-
 drivers/common/cnxk/roc_nix_bpf.c             |  528 ++-
 drivers/common/cnxk/roc_nix_debug.c           |  243 +-
 drivers/common/cnxk/roc_nix_fc.c              |  106 +-
 drivers/common/cnxk/roc_nix_inl.c             |    2 +
 drivers/common/cnxk/roc_nix_priv.h            |    4 +-
 drivers/common/cnxk/roc_nix_queue.c           |  638 ++-
 drivers/common/cnxk/roc_nix_rss.c             |   74 +-
 drivers/common/cnxk/roc_nix_stats.c           |   55 +-
 drivers/common/cnxk/roc_nix_tm.c              |   22 +-
 drivers/common/cnxk/roc_nix_tm_ops.c          |   29 +-
 drivers/common/cnxk/roc_npa.c                 |  100 +-
 drivers/common/cnxk/roc_npa.h                 |   24 +-
 drivers/common/cnxk/roc_npa_debug.c           |   17 +-
 drivers/mempool/cnxk/cnxk_mempool.c           |    2 +
 drivers/mempool/cnxk/cnxk_mempool_ops.c       |    2 +-
 drivers/net/cnxk/cn20k_ethdev.c               |  943 +++++
 drivers/net/cnxk/cn20k_ethdev.h               |   15 +
 drivers/net/cnxk/cn20k_rx.h                   | 1100 ++++++
 drivers/net/cnxk/cn20k_rx_select.c            |  160 +
 drivers/net/cnxk/cn20k_rxtx.h                 |  245 ++
 drivers/net/cnxk/cn20k_tx.h                   | 3471 +++++++++++++++++
 drivers/net/cnxk/cn20k_tx_select.c            |  122 +
 drivers/net/cnxk/cnxk_ethdev_dp.h             |    3 +
 drivers/net/cnxk/meson.build                  |   92 +-
 drivers/net/cnxk/rx/cn20k/rx_0_15.c           |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c       |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c  |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127.c        |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c   |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c    |   20 +
 .../net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c   |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111.c         |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c    |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c     |   20 +
 .../net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c    |   20 +
 drivers/net/cnxk/rx/cn20k/rx_all_offload.c    |   57 +
 drivers/net/cnxk/tx/cn20k/tx_0_15.c           |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c       |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c  |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127.c        |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c   |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c    |   18 +
 .../net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c   |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111.c         |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c    |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c     |   18 +
 .../net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c    |   18 +
 drivers/net/cnxk/tx/cn20k/tx_all_offload.c    |   39 +
 97 files changed, 9979 insertions(+), 392 deletions(-)
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.c
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.h
 create mode 100644 drivers/net/cnxk/cn20k_rx.h
 create mode 100644 drivers/net/cnxk/cn20k_rx_select.c
 create mode 100644 drivers/net/cnxk/cn20k_rxtx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx_select.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_all_offload.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_all_offload.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 01/18] mempool/cnxk: add cn20k PCI device ids
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 02/18] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
                     ` (17 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Ashwin Sekhar T K, Pavan Nikhilesh; +Cc: dev

From: Ashwin Sekhar T K <asekhar@marvell.com>

Add cn20k PCI device ids.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 doc/guides/rel_notes/release_24_11.rst | 4 ++++
 drivers/mempool/cnxk/cnxk_mempool.c    | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..3c666ddd10 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Updated Marvell cnxk mempool driver.**
+
+  * Added support for HW mempool in CN20K SoC.
+
 
 Removed Items
 -------------
diff --git a/drivers/mempool/cnxk/cnxk_mempool.c b/drivers/mempool/cnxk/cnxk_mempool.c
index 1181b6f265..6ff11d8004 100644
--- a/drivers/mempool/cnxk/cnxk_mempool.c
+++ b/drivers/mempool/cnxk/cnxk_mempool.c
@@ -161,6 +161,7 @@ npa_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 }
 
 static const struct rte_pci_id npa_pci_map[] = {
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KB, PCI_DEVID_CNXK_RVU_NPA_PF),
@@ -172,6 +173,7 @@ static const struct rte_pci_id npa_pci_map[] = {
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN9KD, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN9KE, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CNF9KA, PCI_DEVID_CNXK_RVU_NPA_PF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KB, PCI_DEVID_CNXK_RVU_NPA_VF),
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 02/18] common/cnxk: accommodate change in aura field width
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 01/18] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 03/18] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Ashwin Sekhar T K

From: Ashwin Sekhar T K <asekhar@marvell.com>

Aura field width has changed from 20 bits to 17 bits in
cn20k. Adjust the bit fields accordingly for register
reads/writes.
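
As a sketch of the pattern this patch applies throughout: the aura
id moves from bits [63:44] to bits [63:47] on cn20k, since the field
shrank from 20 to 17 bits. npa_aura_wdata() below is a hypothetical
helper, not part of the patch:

	static inline uint64_t
	npa_aura_wdata(uint64_t aura_handle)
	{
		/* cn20k keeps the aura id in bits [63:47]; older SoCs
		 * use bits [63:44].
		 */
		uint64_t shift = roc_model_is_cn20k() ? 47 : 44;

		return roc_npa_aura_handle_to_aura(aura_handle) << shift;
	}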

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/common/cnxk/roc_npa.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/common/cnxk/roc_npa.h b/drivers/common/cnxk/roc_npa.h
index 4ad5f044b5..fbf75b2fca 100644
--- a/drivers/common/cnxk/roc_npa.h
+++ b/drivers/common/cnxk/roc_npa.h
@@ -16,6 +16,7 @@
 #else
 #include "roc_io_generic.h"
 #endif
+#include "roc_model.h"
 #include "roc_npa_dp.h"
 
 #define ROC_AURA_OP_LIMIT_MASK (BIT_ULL(36) - 1)
@@ -68,11 +69,12 @@ roc_npa_aura_op_alloc(uint64_t aura_handle, const int drop)
 static inline uint64_t
 roc_npa_aura_op_cnt_get(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	int64_t *addr;
 	uint64_t reg;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_AURA_OP_CNT);
 	reg = roc_atomic64_add_nosync(wdata, addr);
@@ -87,11 +89,13 @@ static inline void
 roc_npa_aura_op_cnt_set(uint64_t aura_handle, const int sign, uint64_t count)
 {
 	uint64_t reg = count & (BIT_ULL(36) - 1);
+	uint64_t shift;
 
 	if (sign)
 		reg |= BIT_ULL(43); /* CNT_ADD */
 
-	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << 44);
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << shift);
 
 	plt_write64(reg, roc_npa_aura_handle_to_base(aura_handle) +
 				 NPA_LF_AURA_OP_CNT);
@@ -100,11 +104,12 @@ roc_npa_aura_op_cnt_set(uint64_t aura_handle, const int sign, uint64_t count)
 static inline uint64_t
 roc_npa_aura_op_limit_get(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	int64_t *addr;
 	uint64_t reg;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_AURA_OP_LIMIT);
 	reg = roc_atomic64_add_nosync(wdata, addr);
@@ -119,8 +124,10 @@ static inline void
 roc_npa_aura_op_limit_set(uint64_t aura_handle, uint64_t limit)
 {
 	uint64_t reg = limit & ROC_AURA_OP_LIMIT_MASK;
+	uint64_t shift;
 
-	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << 44);
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << shift);
 
 	plt_write64(reg, roc_npa_aura_handle_to_base(aura_handle) +
 				 NPA_LF_AURA_OP_LIMIT);
@@ -129,11 +136,12 @@ roc_npa_aura_op_limit_set(uint64_t aura_handle, uint64_t limit)
 static inline uint64_t
 roc_npa_aura_op_available(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	uint64_t reg;
 	int64_t *addr;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_POOL_OP_AVAILABLE);
 	reg = roc_atomic64_add_nosync(wdata, addr);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 03/18] common/cnxk: use new NPA aq enq mbox for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 01/18] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 02/18] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 04/18] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
                     ` (15 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Ashwin Sekhar T K

From: Ashwin Sekhar T K <asekhar@marvell.com>

A new mbox, npa_cn20k_aq_enq_req, has been added for
cn20k. Use this mbox for NPA configurations.

Note that the size of the new mbox request and response
remains the same as that of the older mboxes. The new
cn20k contexts npa_cn20k_aura_s/npa_cn20k_pool_s are
also the same size as the older npa_aura_s/npa_pool_s.
So we can typecast these structures into each other in
most cases; only the fields that have changed in
width/position need to be taken care of.
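
The resulting allocation pattern used across this patch looks like
the following sketch (surrounding code trimmed):

	struct npa_cn20k_aq_enq_req *req_cn20k;
	struct npa_aq_enq_req *req;

	if (roc_model_is_cn20k()) {
		req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
		/* Same size and common field layout, so typecast */
		req = (struct npa_aq_enq_req *)req_cn20k;
	} else {
		req = mbox_alloc_msg_npa_aq_enq(mbox);
	}
	if (req == NULL)
		return -ENOSPC;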

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/common/cnxk/hw/npa.h         | 164 ++++++++++++++++++++++++---
 drivers/common/cnxk/roc_mbox.h       |  32 ++++++
 drivers/common/cnxk/roc_nix_debug.c  |   9 +-
 drivers/common/cnxk/roc_nix_fc.c     |  54 ++++++---
 drivers/common/cnxk/roc_nix_tm_ops.c |  15 ++-
 drivers/common/cnxk/roc_npa.c        | 100 ++++++++++++++--
 drivers/common/cnxk/roc_npa_debug.c  |  17 ++-
 7 files changed, 339 insertions(+), 52 deletions(-)

diff --git a/drivers/common/cnxk/hw/npa.h b/drivers/common/cnxk/hw/npa.h
index 891a1b2b5f..4fd1f9a64b 100644
--- a/drivers/common/cnxk/hw/npa.h
+++ b/drivers/common/cnxk/hw/npa.h
@@ -216,10 +216,10 @@ struct npa_aura_op_wdata_s {
 	uint64_t drop : 1;
 };
 
-/* NPA aura context structure */
+/* NPA aura context structure [CN9K, CN10K] */
 struct npa_aura_s {
 	uint64_t pool_addr : 64; /* W0 */
-	uint64_t ena : 1;
+	uint64_t ena : 1; /* W1 */
 	uint64_t rsvd_66_65 : 2;
 	uint64_t pool_caching : 1;
 	uint64_t pool_way_mask : 16;
@@ -233,24 +233,24 @@ struct npa_aura_s {
 	uint64_t shift : 6;
 	uint64_t rsvd_119_118 : 2;
 	uint64_t avg_level : 8;
-	uint64_t count : 36;
+	uint64_t count : 36; /* W2 */
 	uint64_t rsvd_167_164 : 4;
 	uint64_t nix0_bpid : 9;
 	uint64_t rsvd_179_177 : 3;
 	uint64_t nix1_bpid : 9;
 	uint64_t rsvd_191_189 : 3;
-	uint64_t limit : 36;
+	uint64_t limit : 36; /* W3 */
 	uint64_t rsvd_231_228 : 4;
 	uint64_t bp : 8;
 	uint64_t rsvd_242_240 : 3;
-	uint64_t fc_be : 1; /* [CN10K, .) */
+	uint64_t fc_be : 1; /* [CN10K] */
 	uint64_t fc_ena : 1;
 	uint64_t fc_up_crossing : 1;
 	uint64_t fc_stype : 2;
 	uint64_t fc_hyst_bits : 4;
 	uint64_t rsvd_255_252 : 4;
 	uint64_t fc_addr : 64; /* W4 */
-	uint64_t pool_drop : 8;
+	uint64_t pool_drop : 8; /* W5 */
 	uint64_t update_time : 16;
 	uint64_t err_int : 8;
 	uint64_t err_int_ena : 8;
@@ -262,17 +262,17 @@ struct npa_aura_s {
 	uint64_t rsvd_371 : 1;
 	uint64_t err_qint_idx : 7;
 	uint64_t rsvd_383_379 : 5;
-	uint64_t thresh : 36;
+	uint64_t thresh : 36; /* W6 */
 	uint64_t rsvd_423_420 : 4;
-	uint64_t fc_msh_dst : 11; /* [CN10K, .) */
+	uint64_t fc_msh_dst : 11; /* [CN10K] */
 	uint64_t rsvd_447_435 : 13;
 	uint64_t rsvd_511_448 : 64; /* W7 */
 };
 
-/* NPA pool context structure */
+/* NPA pool context structure [CN9K, CN10K] */
 struct npa_pool_s {
 	uint64_t stack_base : 64; /* W0 */
-	uint64_t ena : 1;
+	uint64_t ena : 1; /* W1 */
 	uint64_t nat_align : 1;
 	uint64_t rsvd_67_66 : 2;
 	uint64_t stack_caching : 1;
@@ -282,11 +282,11 @@ struct npa_pool_s {
 	uint64_t rsvd_103_100 : 4;
 	uint64_t buf_size : 11;
 	uint64_t rsvd_127_115 : 13;
-	uint64_t stack_max_pages : 32;
+	uint64_t stack_max_pages : 32; /* W2 */
 	uint64_t stack_pages : 32;
-	uint64_t op_pc : 48;
+	uint64_t op_pc : 48; /* W3 */
 	uint64_t rsvd_255_240 : 16;
-	uint64_t stack_offset : 4;
+	uint64_t stack_offset : 4; /* W4 */
 	uint64_t rsvd_263_260 : 4;
 	uint64_t shift : 6;
 	uint64_t rsvd_271_270 : 2;
@@ -296,14 +296,14 @@ struct npa_pool_s {
 	uint64_t fc_stype : 2;
 	uint64_t fc_hyst_bits : 4;
 	uint64_t fc_up_crossing : 1;
-	uint64_t fc_be : 1; /* [CN10K, .) */
+	uint64_t fc_be : 1; /* [CN10K] */
 	uint64_t rsvd_299_298 : 2;
 	uint64_t update_time : 16;
 	uint64_t rsvd_319_316 : 4;
 	uint64_t fc_addr : 64;	 /* W5 */
 	uint64_t ptr_start : 64; /* W6 */
 	uint64_t ptr_end : 64;	 /* W7 */
-	uint64_t rsvd_535_512 : 24;
+	uint64_t rsvd_535_512 : 24; /* W8 */
 	uint64_t err_int : 8;
 	uint64_t err_int_ena : 8;
 	uint64_t thresh_int : 1;
@@ -314,9 +314,9 @@ struct npa_pool_s {
 	uint64_t rsvd_563 : 1;
 	uint64_t err_qint_idx : 7;
 	uint64_t rsvd_575_571 : 5;
-	uint64_t thresh : 36;
+	uint64_t thresh : 36; /* W9 */
 	uint64_t rsvd_615_612 : 4;
-	uint64_t fc_msh_dst : 11; /* [CN10K, .) */
+	uint64_t fc_msh_dst : 11; /* [CN10K] */
 	uint64_t rsvd_639_627 : 13;
 	uint64_t rsvd_703_640 : 64;  /* W10 */
 	uint64_t rsvd_767_704 : 64;  /* W11 */
@@ -326,6 +326,136 @@ struct npa_pool_s {
 	uint64_t rsvd_1023_960 : 64; /* W15 */
 };
 
+/* NPA aura context structure [CN20K] */
+struct npa_cn20k_aura_s {
+	uint64_t pool_addr : 64; /* W0 */
+	uint64_t ena : 1;   /* W1 */
+	uint64_t rsvd_66_65 : 2;
+	uint64_t pool_caching : 1;
+	uint64_t rsvd_68 : 16;
+	uint64_t avg_con : 9;
+	uint64_t rsvd_93 : 1;
+	uint64_t pool_drop_ena : 1;
+	uint64_t aura_drop_ena : 1;
+	uint64_t bp_ena : 1;
+	uint64_t rsvd_103_97 : 7;
+	uint64_t aura_drop : 8;
+	uint64_t shift : 6;
+	uint64_t rsvd_119_118 : 2;
+	uint64_t avg_level : 8;
+	uint64_t count : 36; /* W2 */
+	uint64_t rsvd_167_164 : 4;
+	uint64_t bpid : 12;
+	uint64_t rsvd_191_180 : 12;
+	uint64_t limit : 36; /* W3 */
+	uint64_t rsvd_231_228 : 4;
+	uint64_t bp : 7;
+	uint64_t rsvd_243_239 : 5;
+	uint64_t fc_ena : 1;
+	uint64_t fc_up_crossing : 1;
+	uint64_t fc_stype : 2;
+	uint64_t fc_hyst_bits : 4;
+	uint64_t rsvd_255_252 : 4;
+	uint64_t fc_addr : 64;  /* W4 */
+	uint64_t pool_drop : 8; /* W5 */
+	uint64_t update_time : 16;
+	uint64_t err_int : 8;
+	uint64_t err_int_ena : 8;
+	uint64_t thresh_int : 1;
+	uint64_t thresh_int_ena : 1;
+	uint64_t thresh_up : 1;
+	uint64_t rsvd_363 : 1;
+	uint64_t thresh_qint_idx : 7;
+	uint64_t rsvd_371 : 1;
+	uint64_t err_qint_idx : 7;
+	uint64_t rsvd_383_379 : 5;
+	uint64_t thresh : 36; /* W6*/
+	uint64_t rsvd_423_420 : 4;
+	uint64_t fc_msh_dst : 11;
+	uint64_t rsvd_438_435 : 4;
+	uint64_t op_dpc_ena : 1;
+	uint64_t op_dpc_set : 6;
+	uint64_t stream_ctx : 1;
+	uint64_t unified_ctx : 1;
+	uint64_t rsvd_511_448 : 64; /* W7 */
+};
+
+/* NPA pool context structure [CN20K] */
+struct npa_cn20k_pool_s {
+	uint64_t stack_base : 64; /* W0 */
+	uint64_t ena : 1; /* W1 */
+	uint64_t nat_align : 1;
+	uint64_t rsvd_67_66 : 2;
+	uint64_t stack_caching : 1;
+	uint64_t rsvd_87_69 : 19;
+	uint64_t buf_offset : 12;
+	uint64_t rsvd_103_100 : 4;
+	uint64_t buf_size : 12;
+	uint64_t rsvd_119_116 : 4;
+	uint64_t ref_cnt_prof : 3;
+	uint64_t rsvd_127_123 : 5;
+	uint64_t stack_max_pages : 32; /* W2 */
+	uint64_t stack_pages : 32;
+	uint64_t bp_0 : 7; /* W3 */
+	uint64_t bp_1 : 7;
+	uint64_t bp_2 : 7;
+	uint64_t bp_3 : 7;
+	uint64_t bp_4 : 7;
+	uint64_t bp_5 : 7;
+	uint64_t bp_6 : 7;
+	uint64_t bp_7 : 7;
+	uint64_t bp_ena_0 : 1;
+	uint64_t bp_ena_1 : 1;
+	uint64_t bp_ena_2 : 1;
+	uint64_t bp_ena_3 : 1;
+	uint64_t bp_ena_4 : 1;
+	uint64_t bp_ena_5 : 1;
+	uint64_t bp_ena_6 : 1;
+	uint64_t bp_ena_7 : 1;
+	uint64_t stack_offset : 4; /* W4 */
+	uint64_t rsvd_263_260 : 4;
+	uint64_t shift : 6;
+	uint64_t rsvd_271_270 : 2;
+	uint64_t avg_level : 8;
+	uint64_t avg_con : 9;
+	uint64_t fc_ena : 1;
+	uint64_t fc_stype : 2;
+	uint64_t fc_hyst_bits : 4;
+	uint64_t fc_up_crossing : 1;
+	uint64_t rsvd_299_297 : 3;
+	uint64_t update_time : 16;
+	uint64_t rsvd_319_316 : 4;
+	uint64_t fc_addr : 64;   /* W5 */
+	uint64_t ptr_start : 64; /* W6 */
+	uint64_t ptr_end : 64;   /* W7 */
+	uint64_t bpid_0 : 12; /* W8 */
+	uint64_t rsvd_535_524 : 12;
+	uint64_t err_int : 8;
+	uint64_t err_int_ena : 8;
+	uint64_t thresh_int : 1;
+	uint64_t thresh_int_ena : 1;
+	uint64_t thresh_up : 1;
+	uint64_t rsvd_555 : 1;
+	uint64_t thresh_qint_idx : 7;
+	uint64_t rsvd_563 : 1;
+	uint64_t err_qint_idx : 7;
+	uint64_t rsvd_575_571 : 5;
+	uint64_t thresh : 36; /* W9 */
+	uint64_t rsvd_615_612 : 4;
+	uint64_t fc_msh_dst : 11;
+	uint64_t rsvd_630_627 : 4;
+	uint64_t op_dpc_ena : 1;
+	uint64_t op_dpc_set : 6;
+	uint64_t stream_ctx : 1;
+	uint64_t rsvd_639 : 1;
+	uint64_t rsvd_703_640 : 64;  /* W10 */
+	uint64_t rsvd_767_704 : 64;  /* W11 */
+	uint64_t rsvd_831_768 : 64;  /* W12 */
+	uint64_t rsvd_895_832 : 64;  /* W13 */
+	uint64_t rsvd_959_896 : 64;  /* W14 */
+	uint64_t rsvd_1023_960 : 64; /* W15 */
+};
+
 /* NPA queue interrupt context hardware structure */
 struct npa_qint_hw_s {
 	uint32_t count : 22;
diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index f1a3371ef9..9a9dcbdbda 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -119,6 +119,8 @@ struct mbox_msghdr {
 	M(NPA_AQ_ENQ, 0x402, npa_aq_enq, npa_aq_enq_req, npa_aq_enq_rsp)       \
 	M(NPA_HWCTX_DISABLE, 0x403, npa_hwctx_disable, hwctx_disable_req,      \
 	  msg_rsp)                                                             \
+	M(NPA_CN20K_AQ_ENQ, 0x404, npa_cn20k_aq_enq, npa_cn20k_aq_enq_req,     \
+	  npa_cn20k_aq_enq_rsp)                                                \
 	/* SSO/SSOW mbox IDs (range 0x600 - 0x7FF) */                          \
 	M(SSO_LF_ALLOC, 0x600, sso_lf_alloc, sso_lf_alloc_req,                 \
 	  sso_lf_alloc_rsp)                                                    \
@@ -1325,6 +1327,36 @@ struct npa_aq_enq_rsp {
 	};
 };
 
+struct npa_cn20k_aq_enq_req {
+	struct mbox_msghdr hdr;
+	uint32_t __io aura_id;
+	uint8_t __io ctype;
+	uint8_t __io op;
+	union {
+		/* Valid when op == WRITE/INIT and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura;
+		/* Valid when op == WRITE/INIT and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool;
+	};
+	/* Mask data when op == WRITE (1=write, 0=don't write) */
+	union {
+		/* Valid when op == WRITE and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura_mask;
+		/* Valid when op == WRITE and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool_mask;
+	};
+};
+
+struct npa_cn20k_aq_enq_rsp {
+	struct mbox_msghdr hdr;
+	union {
+		/* Valid when op == READ and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura;
+		/* Valid when op == READ and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool;
+	};
+};
+
 /* Disable all contexts of type 'ctype' */
 struct hwctx_disable_req {
 	struct mbox_msghdr hdr;
diff --git a/drivers/common/cnxk/roc_nix_debug.c b/drivers/common/cnxk/roc_nix_debug.c
index 26546f9297..2e91470c09 100644
--- a/drivers/common/cnxk/roc_nix_debug.c
+++ b/drivers/common/cnxk/roc_nix_debug.c
@@ -690,6 +690,7 @@ int
 roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct npa_cn20k_aq_enq_req *npa_aq_cn20k;
 	int rc = -1, q, rq = nix->nb_rx_queues;
 	struct npa_aq_enq_rsp *npa_rsp;
 	struct npa_aq_enq_req *npa_aq;
@@ -772,8 +773,12 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			continue;
 		}
 
-		/* Dump SQB Aura minimal info */
-		npa_aq = mbox_alloc_msg_npa_aq_enq(mbox_get(npa_lf->mbox));
+		if (roc_model_is_cn20k()) {
+			npa_aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox_get(npa_lf->mbox));
+			npa_aq = (struct npa_aq_enq_req *)npa_aq_cn20k; /* Common fields */
+		} else {
+			npa_aq = mbox_alloc_msg_npa_aq_enq(mbox_get(npa_lf->mbox));
+		}
 		if (npa_aq == NULL) {
 			rc = -ENOSPC;
 			mbox_put(npa_lf->mbox);
diff --git a/drivers/common/cnxk/roc_nix_fc.c b/drivers/common/cnxk/roc_nix_fc.c
index 12bfb9816b..2f72e67993 100644
--- a/drivers/common/cnxk/roc_nix_fc.c
+++ b/drivers/common/cnxk/roc_nix_fc.c
@@ -158,6 +158,8 @@ static int
 nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct npa_cn20k_aq_enq_req *npa_req_cn20k;
+	struct npa_cn20k_aq_enq_rsp *npa_rsp_cn20k;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
 	struct nix_aq_enq_rsp *rsp;
@@ -195,24 +197,44 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 	if (rc)
 		goto exit;
 
-	npa_req = mbox_alloc_msg_npa_aq_enq(mbox);
-	if (!npa_req) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn20k()) {
+		npa_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		if (!npa_req_cn20k) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		npa_req_cn20k->aura_id = rsp->rq.lpb_aura;
+		npa_req_cn20k->ctype = NPA_AQ_CTYPE_AURA;
+		npa_req_cn20k->op = NPA_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&npa_rsp_cn20k);
+		if (rc)
+			goto exit;
+
+		fc_cfg->cq_cfg.cq_drop = npa_rsp_cn20k->aura.bp;
+		fc_cfg->cq_cfg.enable = npa_rsp_cn20k->aura.bp_ena;
+		fc_cfg->type = ROC_NIX_FC_RQ_CFG;
+	} else {
+		npa_req = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (!npa_req) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		npa_req->aura_id = rsp->rq.lpb_aura;
+		npa_req->ctype = NPA_AQ_CTYPE_AURA;
+		npa_req->op = NPA_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&npa_rsp);
+		if (rc)
+			goto exit;
+
+		fc_cfg->cq_cfg.cq_drop = npa_rsp->aura.bp;
+		fc_cfg->cq_cfg.enable = npa_rsp->aura.bp_ena;
+		fc_cfg->type = ROC_NIX_FC_RQ_CFG;
 	}
 
-	npa_req->aura_id = rsp->rq.lpb_aura;
-	npa_req->ctype = NPA_AQ_CTYPE_AURA;
-	npa_req->op = NPA_AQ_INSTOP_READ;
-
-	rc = mbox_process_msg(mbox, (void *)&npa_rsp);
-	if (rc)
-		goto exit;
-
-	fc_cfg->cq_cfg.cq_drop = npa_rsp->aura.bp;
-	fc_cfg->cq_cfg.enable = npa_rsp->aura.bp_ena;
-	fc_cfg->type = ROC_NIX_FC_RQ_CFG;
-
 exit:
 	mbox_put(mbox);
 	return rc;
diff --git a/drivers/common/cnxk/roc_nix_tm_ops.c b/drivers/common/cnxk/roc_nix_tm_ops.c
index 9f3870a311..8144675f89 100644
--- a/drivers/common/cnxk/roc_nix_tm_ops.c
+++ b/drivers/common/cnxk/roc_nix_tm_ops.c
@@ -8,6 +8,7 @@
 int
 roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 {
+	struct npa_cn20k_aq_enq_req *req_cn20k;
 	struct npa_aq_enq_req *req;
 	struct npa_aq_enq_rsp *rsp;
 	uint64_t aura_handle;
@@ -25,7 +26,12 @@ roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 	mbox = mbox_get(lf->mbox);
 	/* Set/clear sqb aura fc_ena */
 	aura_handle = sq->aura_handle;
-	req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		req = (struct npa_aq_enq_req *)req_cn20k;
+	} else {
+		req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (req == NULL)
 		goto exit;
 
@@ -52,7 +58,12 @@ roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 
 	/* Read back npa aura ctx */
 	if (enable) {
-		req = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			req = (struct npa_aq_enq_req *)req_cn20k;
+		} else {
+			req = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (req == NULL) {
 			rc = -ENOSPC;
 			goto exit;
diff --git a/drivers/common/cnxk/roc_npa.c b/drivers/common/cnxk/roc_npa.c
index 6c14c49901..934d7361a9 100644
--- a/drivers/common/cnxk/roc_npa.c
+++ b/drivers/common/cnxk/roc_npa.c
@@ -76,6 +76,7 @@ static int
 npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura,
 		   struct npa_pool_s *pool)
 {
+	struct npa_cn20k_aq_enq_req *aura_init_req_cn20k, *pool_init_req_cn20k;
 	struct npa_aq_enq_req *aura_init_req, *pool_init_req;
 	struct npa_aq_enq_rsp *aura_init_rsp, *pool_init_rsp;
 	struct mbox_dev *mdev = &m_box->dev[0];
@@ -83,7 +84,12 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 	struct mbox *mbox;
 
 	mbox = mbox_get(m_box);
-	aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_init_req = (struct npa_aq_enq_req *)aura_init_req_cn20k;
+	} else {
+		aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_init_req == NULL)
 		goto exit;
 	aura_init_req->aura_id = aura_id;
@@ -91,6 +97,11 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 	aura_init_req->op = NPA_AQ_INSTOP_INIT;
 	mbox_memcpy(&aura_init_req->aura, aura, sizeof(*aura));
 
+	if (roc_model_is_cn20k()) {
+		pool_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		pool_init_req = (struct npa_aq_enq_req *)pool_init_req_cn20k;
+	} else {
+		pool_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
-	pool_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
 	if (pool_init_req == NULL)
 		goto exit;
@@ -121,13 +133,19 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 static int
 npa_aura_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura)
 {
+	struct npa_cn20k_aq_enq_req *aura_init_req_cn20k;
 	struct npa_aq_enq_req *aura_init_req;
 	struct npa_aq_enq_rsp *aura_init_rsp;
 	struct mbox *mbox;
 	int rc = -ENOSPC;
 
 	mbox = mbox_get(m_box);
-	aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_init_req = (struct npa_aq_enq_req *)aura_init_req_cn20k;
+	} else {
+		aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_init_req == NULL)
 		goto exit;
 	aura_init_req->aura_id = aura_id;
@@ -151,6 +169,7 @@ npa_aura_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura)
 static int
 npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k, *pool_req_cn20k;
 	struct npa_aq_enq_req *aura_req, *pool_req;
 	struct npa_aq_enq_rsp *aura_rsp, *pool_rsp;
 	struct mbox_dev *mdev = &m_box->dev[0];
@@ -168,7 +187,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	} while (ptr);
 
 	mbox = mbox_get(m_box);
-	pool_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		pool_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		pool_req = (struct npa_aq_enq_req *)pool_req_cn20k;
+	} else {
+		pool_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (pool_req == NULL)
 		goto exit;
 	pool_req->aura_id = aura_id;
@@ -177,7 +201,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	pool_req->pool.ena = 0;
 	pool_req->pool_mask.ena = ~pool_req->pool_mask.ena;
 
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL)
 		goto exit;
 	aura_req->aura_id = aura_id;
@@ -185,8 +214,18 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	aura_req->op = NPA_AQ_INSTOP_WRITE;
 	aura_req->aura.ena = 0;
 	aura_req->aura_mask.ena = ~aura_req->aura_mask.ena;
-	aura_req->aura.bp_ena = 0;
-	aura_req->aura_mask.bp_ena = ~aura_req->aura_mask.bp_ena;
+	if (roc_model_is_cn20k()) {
+		__io struct npa_cn20k_aura_s *aura_cn20k, *aura_mask_cn20k;
+
+		/* The bit positions/width of bp_ena has changed in cn20k */
+		aura_cn20k = (__io struct npa_cn20k_aura_s *)&aura_req->aura;
+		aura_cn20k->bp_ena = 0;
+		aura_mask_cn20k = (__io struct npa_cn20k_aura_s *)&aura_req->aura_mask;
+		aura_mask_cn20k->bp_ena = ~aura_mask_cn20k->bp_ena;
+	} else {
+		aura_req->aura.bp_ena = 0;
+		aura_req->aura_mask.bp_ena = ~aura_req->aura_mask.bp_ena;
+	}
 
 	rc = mbox_process(mbox);
 	if (rc < 0)
@@ -204,6 +243,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 		goto exit;
 	}
 
+	if (roc_model_is_cn20k()) {
+		/* In cn20k, NPA does not use NDC */
+		rc = 0;
+		goto exit;
+	}
+
 	/* Sync NDC-NPA for LF */
 	ndc_req = mbox_alloc_msg_ndc_sync_op(mbox);
 	if (ndc_req == NULL) {
@@ -226,6 +271,7 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 static int
 npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_aq_enq_rsp *aura_rsp;
 	struct ndc_sync_op *ndc_req;
@@ -236,7 +282,12 @@ npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 	plt_delay_us(10);
 
 	mbox = mbox_get(m_box);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL)
 		goto exit;
 	aura_req->aura_id = aura_id;
@@ -254,6 +305,12 @@ npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 		goto exit;
 	}
 
+	if (roc_model_is_cn20k()) {
+		/* In cn20k, NPA does not use NDC */
+		rc = 0;
+		goto exit;
+	}
+
 	/* Sync NDC-NPA for LF */
 	ndc_req = mbox_alloc_msg_ndc_sync_op(mbox);
 	if (ndc_req == NULL) {
@@ -335,6 +392,7 @@ roc_npa_pool_op_pc_reset(uint64_t aura_handle)
 int
 roc_npa_aura_drop_set(uint64_t aura_handle, uint64_t limit, bool ena)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_lf *lf;
 	struct mbox *mbox;
@@ -344,7 +402,12 @@ roc_npa_aura_drop_set(uint64_t aura_handle, uint64_t limit, bool ena)
 	if (lf == NULL)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
 	mbox = mbox_get(lf->mbox);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL) {
 		rc = -ENOMEM;
 		goto exit;
@@ -723,6 +786,7 @@ roc_npa_aura_create(uint64_t *aura_handle, uint32_t block_count,
 int
 roc_npa_aura_limit_modify(uint64_t aura_handle, uint16_t aura_limit)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_lf *lf;
 	struct mbox *mbox;
@@ -733,7 +797,12 @@ roc_npa_aura_limit_modify(uint64_t aura_handle, uint16_t aura_limit)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
 
 	mbox = mbox_get(lf->mbox);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL) {
 		rc = -ENOMEM;
 		goto exit;
@@ -834,12 +903,13 @@ int
 roc_npa_pool_range_update_check(uint64_t aura_handle)
 {
 	uint64_t aura_id = roc_npa_aura_handle_to_aura(aura_handle);
-	struct npa_lf *lf;
-	struct npa_aura_lim *lim;
+	struct npa_cn20k_aq_enq_req *req_cn20k;
 	__io struct npa_pool_s *pool;
 	struct npa_aq_enq_req *req;
 	struct npa_aq_enq_rsp *rsp;
+	struct npa_aura_lim *lim;
 	struct mbox *mbox;
+	struct npa_lf *lf;
 	int rc;
 
 	lf = idev_npa_obj_get();
@@ -849,7 +919,12 @@ roc_npa_pool_range_update_check(uint64_t aura_handle)
 	lim = lf->aura_lim;
 
 	mbox = mbox_get(lf->mbox);
-	req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		req = (struct npa_aq_enq_req *)req_cn20k;
+	} else {
+		req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (req == NULL) {
 		rc = -ENOSPC;
 		goto exit;
@@ -903,6 +978,7 @@ int
 roc_npa_aura_bp_configure(uint64_t aura_handle, uint16_t bpid, uint8_t bp_intf, uint8_t bp_thresh,
 			  bool enable)
 {
+	/* TODO: Add support for CN20K */
 	uint32_t aura_id = roc_npa_aura_handle_to_aura(aura_handle);
 	struct npa_lf *lf = idev_npa_obj_get();
 	struct npa_aq_enq_req *req;
diff --git a/drivers/common/cnxk/roc_npa_debug.c b/drivers/common/cnxk/roc_npa_debug.c
index 173d32cd9b..9a16f481a8 100644
--- a/drivers/common/cnxk/roc_npa_debug.c
+++ b/drivers/common/cnxk/roc_npa_debug.c
@@ -89,8 +89,9 @@ npa_aura_dump(__io struct npa_aura_s *aura)
 int
 roc_npa_ctx_dump(void)
 {
-	struct npa_aq_enq_req *aq;
+	struct npa_cn20k_aq_enq_req *aq_cn20k;
 	struct npa_aq_enq_rsp *rsp;
+	struct npa_aq_enq_req *aq;
 	struct mbox *mbox;
 	struct npa_lf *lf;
 	uint32_t q;
@@ -106,7 +107,12 @@ roc_npa_ctx_dump(void)
 		if (plt_bitmap_get(lf->npa_bmp, q))
 			continue;
 
-		aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			aq = (struct npa_aq_enq_req *)aq_cn20k;
+		} else {
+			aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (aq == NULL) {
 			rc = -ENOSPC;
 			goto exit;
@@ -129,7 +135,12 @@ roc_npa_ctx_dump(void)
 		if (plt_bitmap_get(lf->npa_bmp, q))
 			continue;
 
-		aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			aq = (struct npa_aq_enq_req *)aq_cn20k;
+		} else {
+			aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (aq == NULL) {
 			rc = -ENOSPC;
 			goto exit;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 04/18] mempool/cnxk: initialize mempool ops for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (2 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 03/18] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 05/18] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
                     ` (14 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Ashwin Sekhar T K, Pavan Nikhilesh; +Cc: dev

From: Ashwin Sekhar T K <asekhar@marvell.com>

Initialize mempool ops for cn20k.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/mempool/cnxk/cnxk_mempool_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mempool/cnxk/cnxk_mempool_ops.c b/drivers/mempool/cnxk/cnxk_mempool_ops.c
index a1aeaee746..bb35e2d1d2 100644
--- a/drivers/mempool/cnxk/cnxk_mempool_ops.c
+++ b/drivers/mempool/cnxk/cnxk_mempool_ops.c
@@ -192,7 +192,7 @@ cnxk_mempool_plt_init(void)
 
 	if (roc_model_is_cn9k()) {
 		rte_mbuf_set_platform_mempool_ops("cn9k_mempool_ops");
-	} else if (roc_model_is_cn10k()) {
+	} else if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		rte_mbuf_set_platform_mempool_ops("cn10k_mempool_ops");
 		rc = cn10k_mempool_plt_init();
 	}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 05/18] common/cnxk: add cn20k NIX register definitions
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (3 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 04/18] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 06/18] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
                     ` (13 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add cn20k NIX register definitions.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/common/cnxk/hw/nix.h   | 524 +++++++++++++++++++++++++++++----
 drivers/common/cnxk/hw/rvu.h   |   7 +-
 drivers/common/cnxk/roc_mbox.h |  52 ++++
 drivers/common/cnxk/roc_nix.c  |  15 +-
 4 files changed, 533 insertions(+), 65 deletions(-)

diff --git a/drivers/common/cnxk/hw/nix.h b/drivers/common/cnxk/hw/nix.h
index 1720eb3815..dd629a2080 100644
--- a/drivers/common/cnxk/hw/nix.h
+++ b/drivers/common/cnxk/hw/nix.h
@@ -32,7 +32,7 @@
 #define NIX_AF_RX_CFG			(0xd0ull)
 #define NIX_AF_AVG_DELAY		(0xe0ull)
 #define NIX_AF_CINT_DELAY		(0xf0ull)
-#define NIX_AF_VWQE_TIMER		(0xf8ull) /* [CN10K, .) */
+#define NIX_AF_VWQE_TIMER		(0xf8ull) /* [CN10K, CN20K) */
 #define NIX_AF_RX_MCAST_BASE		(0x100ull)
 #define NIX_AF_RX_MCAST_CFG		(0x110ull)
 #define NIX_AF_RX_MCAST_BUF_BASE	(0x120ull)
@@ -82,9 +82,11 @@
 #define NIX_AF_RX_DEF_IIP6_DSCP		(0x2f0ull) /* [CN10K, .) */
 #define NIX_AF_RX_DEF_OIP6_DSCP		(0x2f8ull) /* [CN10K, .) */
 #define NIX_AF_RX_IPSEC_GEN_CFG		(0x300ull)
-#define NIX_AF_RX_IPSEC_VWQE_GEN_CFG	(0x310ull) /* [CN10K, .) */
-#define NIX_AF_RX_CPTX_INST_QSEL(a)	(0x320ull | (uint64_t)(a) << 3)
-#define NIX_AF_RX_CPTX_CREDIT(a)	(0x360ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_IPSEC_VWQE_GEN_CFG	(0x310ull) /* [CN10K, CN20K) */
+#define NIX_AF_RX_CPTX_INST_QSEL(a)	(0x340ull | (uint64_t)(a) << 16) /* [CN20K, .) */
+#define NIX_AF_RX_CPTX_CREDIT(a)	(0x380ull | (uint64_t)(a) << 16) /* [CN20K, .) */
+#define NIX_AF_CN9K_RX_CPTX_INST_QSEL(a)(0x320ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
+#define NIX_AF_CN9K_RX_CPTX_CREDIT(a)	(0x360ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
 #define NIX_AF_NDC_RX_SYNC		(0x3e0ull)
 #define NIX_AF_NDC_TX_SYNC		(0x3f0ull)
 #define NIX_AF_AQ_CFG			(0x400ull)
@@ -100,12 +102,14 @@
 #define NIX_AF_RX_LINKX_CFG(a)		(0x540ull | (uint64_t)(a) << 16)
 #define NIX_AF_RX_SW_SYNC		(0x550ull)
 #define NIX_AF_RX_LINKX_WRR_CFG(a)	(0x560ull | (uint64_t)(a) << 16)
+#define NIX_AF_RQM_ECO                  (0x5a0ull)
 #define NIX_AF_SEB_CFG			(0x5f0ull) /* [CN10K, .) */
 #define NIX_AF_EXPR_TX_FIFO_STATUS	(0x640ull) /* [CN9K, CN10K) */
 #define NIX_AF_NORM_TX_FIFO_STATUS	(0x648ull)
 #define NIX_AF_SDP_TX_FIFO_STATUS	(0x650ull)
 #define NIX_AF_TX_NPC_CAPTURE_CONFIG	(0x660ull)
 #define NIX_AF_TX_NPC_CAPTURE_INFO	(0x668ull)
+#define NIX_AF_SEB_COALESCE_DBGX(a)             (0x670ull | (uint64_t)(a) << 3)
 #define NIX_AF_TX_NPC_CAPTURE_RESPX(a)	(0x680ull | (uint64_t)(a) << 3)
 #define NIX_AF_SEB_ACTIVE_CYCLES_PCX(a) (0x6c0ull | (uint64_t)(a) << 3)
 #define NIX_AF_SMQX_CFG(a)		(0x700ull | (uint64_t)(a) << 16)
@@ -115,6 +119,7 @@
 #define NIX_AF_SMQX_NXT_HEAD(a)		(0x740ull | (uint64_t)(a) << 16)
 #define NIX_AF_SQM_ACTIVE_CYCLES_PC	(0x770ull)
 #define NIX_AF_SQM_SCLK_CNT		(0x780ull) /* [CN10K, .) */
+#define NIX_AF_DWRR_MTUX(a)             (0x790ull | (uint64_t)(a) << 16)
 #define NIX_AF_DWRR_SDP_MTU		(0x790ull) /* [CN10K, .) */
 #define NIX_AF_DWRR_RPM_MTU		(0x7a0ull) /* [CN10K, .) */
 #define NIX_AF_PSE_CHANNEL_LEVEL	(0x800ull)
@@ -131,6 +136,7 @@
 #define NIX_AF_TX_LINKX_HW_XOFF(a)	(0xa30ull | (uint64_t)(a) << 16)
 #define NIX_AF_SDP_LINK_CREDIT		(0xa40ull)
 #define NIX_AF_SDP_LINK_CDT_ADJ		(0xa50ull) /* [CN10K, .) */
+#define NIX_AF_LINK_CDT_ADJ_ERR		(0xaa0ull) /* [CN10K, .) */
 /* [CN9K, CN10K) */
 #define NIX_AF_SDP_SW_XOFFX(a)	    (0xa60ull | (uint64_t)(a) << 3)
 #define NIX_AF_SDP_HW_XOFFX(a)	    (0xac0ull | (uint64_t)(a) << 3)
@@ -226,7 +232,7 @@
 #define NIX_AF_TL4X_CIR(a)		 (0x1220ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_PIR(a)		 (0x1230ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_SCHED_STATE(a)	 (0x1240ull | (uint64_t)(a) << 16)
-#define NIX_AF_TL4X_SHAPE_STATE(a)	 (0x1250ull | (uint64_t)(a) << 16)
+#define NIX_AF_TL4X_SHAPE_STATE_PIR(a)	 (0x1250ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_SW_XOFF(a)		 (0x1270ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_TOPOLOGY(a)		 (0x1280ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_PARENT(a)		 (0x1288ull | (uint64_t)(a) << 16)
@@ -272,6 +278,18 @@
 #define NIX_AF_CINT_TIMERX(a)	    (0x1a40ull | (uint64_t)(a) << 18)
 #define NIX_AF_LSO_FORMATX_FIELDX(a, b)                                        \
 	(0x1b00ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+/* [CN10K, .) */
+#define NIX_AF_SPI_TO_SA_KEYX_WAYX(a, b)    (0x1c00ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+#define NIX_AF_SPI_TO_SA_VALUEX_WAYX(a, b)  (0x1c40ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+#define NIX_AF_SPI_TO_SA_CFG		    (0x1c80ull)
+#define NIX_AF_SPI_TO_SA_CFG1		    (0x1c88ull)
+#define NIX_AF_SPI_TO_SA_HASH_KEY	    (0x1c90ull)
+#define NIX_AF_SPI_TO_SA_HASH_VALUE	    (0x1ca0ull)
+/* [CN20K, .) */
+#define NIX_AF_RX_IPSEC_VLAN_CFGX(a)	    (0x1d00ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_IPSEC_QMAPX_DSCPX(a, b)   (0x1e00ull | (uint64_t)(a) << 6 | (uint64_t)(b) << 3)
+#define NIX_AF_RX_SSO_GRPX_BP_CFG(a)	    (0x2000ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_SSO_GRPX_BP_LEVEL(a)	    (0x3000ull | (uint64_t)(a) << 3)
 #define NIX_AF_LFX_CFG(a) (0x4000ull | (uint64_t)(a) << 17)
 /* [CN10K, .) */
 #define NIX_AF_LINKX_CFG(a)		 (0x4010ull | (uint64_t)(a) << 17)
@@ -348,6 +366,7 @@
 #define NIX_LF_TX_STATX(a)	 (0x300ull | (uint64_t)(a) << 3)
 #define NIX_LF_RX_STATX(a)	 (0x400ull | (uint64_t)(a) << 3)
 #define NIX_LF_OP_SENDX(a)	 (0x800ull | (uint64_t)(a) << 3)
+#define NIX_LF_PTP_CLOCK	 (0x8f8ull) /* [CN20K, .) */
 #define NIX_LF_RQ_OP_INT	 (0x900ull)
 #define NIX_LF_RQ_OP_OCTS	 (0x910ull)
 #define NIX_LF_RQ_OP_PKTS	 (0x920ull)
@@ -355,7 +374,7 @@
 #define NIX_LF_RQ_OP_DROP_PKTS	 (0x940ull)
 #define NIX_LF_RQ_OP_RE_PKTS	 (0x950ull)
 #define NIX_LF_OP_IPSEC_DYNO_CNT (0x980ull)
-#define NIX_LF_OP_VWQE_FLUSH	 (0x9a0ull) /* [CN10K, .) */
+#define NIX_LF_OP_VWQE_FLUSH	 (0x9a0ull) /* [CN10K, CN20K) */
 #define NIX_LF_PL_OP_BAND_PROF	 (0x9c0ull) /* [CN10K, .) */
 #define NIX_LF_SQ_OP_INT	 (0xa00ull)
 #define NIX_LF_SQ_OP_OCTS	 (0xa10ull)
@@ -368,6 +387,9 @@
 #define NIX_LF_CQ_OP_INT	 (0xb00ull)
 #define NIX_LF_CQ_OP_DOOR	 (0xb30ull)
 #define NIX_LF_CQ_OP_STATUS	 (0xb40ull)
+#define NIX_LF_SSO_BP_OP_DOOR	 (0xb50ull) /* [CN20K, .) */
+#define NIX_LF_SSO_BP_OP_LEVEL	 (0xb58ull) /* [CN20K, .) */
+#define NIX_LF_SSO_BP_OP_INT	 (0xb60ull) /* [CN20K, .) */
 #define NIX_LF_QINTX_CNT(a)	 (0xc00ull | (uint64_t)(a) << 12)
 #define NIX_LF_QINTX_INT(a)	 (0xc10ull | (uint64_t)(a) << 12)
 #define NIX_LF_QINTX_ENA_W1S(a)	 (0xc20ull | (uint64_t)(a) << 12)
@@ -389,6 +411,8 @@
 
 /* Enum offsets */
 
+#define NIX_SSOERRINT_DOOR_ERR	(0x0ull) /* [CN20K, .) */
+
 #define NIX_STAT_LF_TX_TX_UCAST (0x0ull)
 #define NIX_STAT_LF_TX_TX_BCAST (0x1ull)
 #define NIX_STAT_LF_TX_TX_MCAST (0x2ull)
@@ -572,6 +596,7 @@
 #define NIX_SEND_STATUS_NPC_VTAG_SIZE_ERR  (0x26ull)
 #define NIX_SEND_STATUS_SEND_MEM_FAULT	   (0x27ull)
 #define NIX_SEND_STATUS_SEND_STATS_ERR	   (0x28ull)
+#define NIX_SEND_STATUS_SEND_HDR_DROP	   (0x29ull) /* [CN20K, .) */
 
 #define NIX_SENDSTATSALG_NOP			     (0x0ull)
 #define NIX_SENDSTATSALG_ADD_PKT_CNT		     (0x1ull)
@@ -606,6 +631,7 @@
 #define NIX_SUBDC_WORK		(0x7ull)
 #define NIX_SUBDC_SG2		(0x8ull) /* [CN10K, .) */
 #define NIX_SUBDC_AGE_AND_STATS (0x9ull) /* [CN10K, .) */
+#define NIX_SUBDC_COMPID	(0xaull) /* [CN20K, .) */
 #define NIX_SUBDC_SOD		(0xfull)
 
 #define NIX_STYPE_STF (0x0ull)
@@ -644,6 +670,18 @@
 #define NIX_LSOALG_ADD_PAYLEN (0x2ull)
 #define NIX_LSOALG_ADD_OFFSET (0x3ull)
 #define NIX_LSOALG_TCP_FLAGS  (0x4ull)
+#define NIX_LSOALG_ALT_FLAGS  (0x5ull) /* [CN20K, .) */
+
+#define NIX_METER_CFG_RFC_2698 (0x0ull) /* [CN20K, .) */
+#define NIX_METER_CFG_RFC_2697 (0x1ull) /* [CN20K, .) */
+#define NIX_METER_CFG_RFC_4115 (0x2ull) /* [CN20K, .) */
+
+#define NIX_NDC_RX_PORT_AQ	(0x0ull)
+#define NIX_NDC_RX_PORT_C	(0x1ull)
+#define NIX_NDC_RX_PORT_CINT	(0x2ull)
+#define NIX_NDC_RX_PORT_MC	(0x3ull)
+#define NIX_NDC_RX_PORT_PKT	(0x4ull)
+#define NIX_NDC_RX_PORT_RQ	(0x5ull)
 
 #define NIX_MNQERR_SQ_CTX_FAULT	    (0x0ull)
 #define NIX_MNQERR_SQ_CTX_POISON    (0x1ull)
@@ -732,12 +770,14 @@
 #define NIX_RX_PERRCODE_IL4_PORT       (0x23ull)
 
 #define NIX_SA_ALG_NON_MS     (0x0ull) /* [CN10K, .) */
-#define NIX_SA_ALG_MS_CISCO   (0x1ull) /* [CN10K, .) */
-#define NIX_SA_ALG_MS_VIPTELA (0x2ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_31_28   (0x1ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_27_25   (0x2ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_28_25   (0x3ull) /* [CN10K, .) */
 
 #define NIX_SENDCRCALG_CRC32  (0x0ull)
 #define NIX_SENDCRCALG_CRC32C (0x1ull)
 #define NIX_SENDCRCALG_ONES16 (0x2ull)
+#define NIX_SENDCRCALG_INVCRC (0x3ull) /* [CN10K, .) */
 
 #define NIX_SENDL3TYPE_NONE	 (0x0ull)
 #define NIX_SENDL3TYPE_IP4	 (0x2ull)
@@ -761,7 +801,7 @@
 #define NIX_XQE_TYPE_RX_IPSECS (0x2ull)
 #define NIX_XQE_TYPE_RX_IPSECH (0x3ull)
 #define NIX_XQE_TYPE_RX_IPSECD (0x4ull)
-#define NIX_XQE_TYPE_RX_VWQE   (0x5ull) /* [CN10K, .) */
+#define NIX_XQE_TYPE_RX_VWQE   (0x5ull) /* [CN10K, CN20K) */
 #define NIX_XQE_TYPE_RES_6     (0x6ull)
 #define NIX_XQE_TYPE_RES_7     (0x7ull)
 #define NIX_XQE_TYPE_SEND      (0x8ull)
@@ -825,6 +865,11 @@
 #define NIX_AQ_CTYPE_DYNO      (0x5ull)
 #define NIX_AQ_CTYPE_BAND_PROF (0x6ull) /* [CN10K, .) */
 
+#define NIX_CQERRINT_DOOR_ERR  (0x0ull)
+#define NIX_CQERRINT_WR_FULL   (0x1ull)
+#define NIX_CQERRINT_CQE_FAULT (0x2ull)
+#define NIX_CQERRINT_CPT_DROP  (0x3ull) /* [CN10KB, .) */
+
 #define NIX_COLORRESULT_GREEN	 (0x0ull)
 #define NIX_COLORRESULT_YELLOW	 (0x1ull)
 #define NIX_COLORRESULT_RED_SEND (0x2ull)
@@ -846,11 +891,6 @@
 #define NIX_CHAN_RPMX_LMACX_CHX(a, b, c)                                       \
 	(0x800ull | ((uint64_t)(a) << 8) | ((uint64_t)(b) << 4) | (uint64_t)(c))
 
-/* The mask is to extract lower 10-bits of channel number
- * which CPT will pass to X2P.
- */
-#define NIX_CHAN_CPT_X2P_MASK (0x3ffull)
-
 #define NIX_INTF_SDP  (0x4ull)
 #define NIX_INTF_CGX0 (0x0ull) /* [CN9K, CN10K) */
 #define NIX_INTF_CGX1 (0x1ull) /* [CN9K, CN10K) */
@@ -861,11 +901,6 @@
 #define NIX_INTF_LBK0 (0x3ull)
 #define NIX_INTF_CPT0 (0x5ull) /* [CN10K, .) */
 
-#define NIX_CQERRINT_DOOR_ERR  (0x0ull)
-#define NIX_CQERRINT_WR_FULL   (0x1ull)
-#define NIX_CQERRINT_CQE_FAULT (0x2ull)
-#define NIX_CQERRINT_CPT_DROP  (0x3ull) /* [CN10KB, .) */
-
 #define NIX_LINK_SDP (0xdull) /* [CN10K, .) */
 #define NIX_LINK_CPT (0xeull) /* [CN10K, .) */
 #define NIX_LINK_MC  (0xfull) /* [CN10K, .) */
@@ -894,7 +929,7 @@ struct nix_age_and_send_stats_s {
 	uint64_t threshold : 29;
 	uint64_t latency_drop : 1;
 	uint64_t aging : 1;
-	uint64_t wmem : 1;
+	uint64_t coas_en : 1;
 	uint64_t ooffset : 12;
 	uint64_t ioffset : 12;
 	uint64_t sel : 1;
@@ -907,8 +942,8 @@ struct nix_age_and_send_stats_s {
 struct nix_aq_inst_s {
 	uint64_t op : 4;
 	uint64_t ctype : 4;
-	uint64_t lf : 7;
-	uint64_t rsvd_23_15 : 9;
+	uint64_t lf : 9;
+	uint64_t rsvd_23_17 : 7;
 	uint64_t cindex : 20;
 	uint64_t rsvd_62_44 : 19;
 	uint64_t doneint : 1;
@@ -927,7 +962,7 @@ struct nix_aq_res_s {
 
 /* NIX bandwidth profile structure */
 struct nix_band_prof_s {
-	uint64_t pc_mode : 2;
+	uint64_t pc_mode : 2; /* W0 */
 	uint64_t icolor : 2;
 	uint64_t tnl_ena : 1;
 	uint64_t rsvd_7_5 : 3;
@@ -942,7 +977,7 @@ struct nix_band_prof_s {
 	uint64_t peir_mantissa : 8;
 	uint64_t pebs_mantissa : 8;
 	uint64_t cir_mantissa : 8;
-	uint64_t cbs_mantissa : 8;
+	uint64_t cbs_mantissa : 8; /* W1 */
 	uint64_t lmode : 1;
 	uint64_t l_sellect : 3;
 	uint64_t rdiv : 4;
@@ -953,37 +988,37 @@ struct nix_band_prof_s {
 	uint64_t yc_action : 2;
 	uint64_t rc_action : 2;
 	uint64_t meter_algo : 2;
-	uint64_t band_prof_id : 7;
-	uint64_t rsvd_118_111 : 8;
+	uint64_t band_prof_id : 11;
+	uint64_t rsvd_118_115 : 4;
 	uint64_t hl_en : 1;
 	uint64_t rsvd_127_120 : 8;
-	uint64_t ts : 48;
+	uint64_t ts : 48; /* W2 */
 	uint64_t rsvd_191_176 : 16;
-	uint64_t pe_accum : 32;
+	uint64_t pe_accum : 32; /* W3 */
 	uint64_t c_accum : 32;
-	uint64_t green_pkt_pass : 48;
+	uint64_t green_pkt_pass : 48; /* W4 */
 	uint64_t rsvd_319_304 : 16;
-	uint64_t yellow_pkt_pass : 48;
+	uint64_t yellow_pkt_pass : 48; /* W5 */
 	uint64_t rsvd_383_368 : 16;
-	uint64_t red_pkt_pass : 48;
+	uint64_t red_pkt_pass : 48; /* W6 */
 	uint64_t rsvd_447_432 : 16;
-	uint64_t green_octs_pass : 48;
+	uint64_t green_octs_pass : 48; /* W7 */
 	uint64_t rsvd_511_496 : 16;
-	uint64_t yellow_octs_pass : 48;
+	uint64_t yellow_octs_pass : 48; /* W8 */
 	uint64_t rsvd_575_560 : 16;
-	uint64_t red_octs_pass : 48;
+	uint64_t red_octs_pass : 48; /* W9 */
 	uint64_t rsvd_639_624 : 16;
-	uint64_t green_pkt_drop : 48;
+	uint64_t green_pkt_drop : 48; /* W10 */
 	uint64_t rsvd_703_688 : 16;
-	uint64_t yellow_pkt_drop : 48;
+	uint64_t yellow_pkt_drop : 48; /* W11 */
 	uint64_t rsvd_767_752 : 16;
-	uint64_t red_pkt_drop : 48;
+	uint64_t red_pkt_drop : 48; /* W12 */
 	uint64_t rsvd_831_816 : 16;
-	uint64_t green_octs_drop : 48;
+	uint64_t green_octs_drop : 48; /* W13 */
 	uint64_t rsvd_895_880 : 16;
-	uint64_t yellow_octs_drop : 48;
+	uint64_t yellow_octs_drop : 48; /* W14 */
 	uint64_t rsvd_959_944 : 16;
-	uint64_t red_octs_drop : 48;
+	uint64_t red_octs_drop : 48; /* W15 */
 	uint64_t rsvd_1023_1008 : 16;
 };
 
@@ -1005,11 +1040,55 @@ struct nix_cint_hw_s {
 struct nix_cqe_hdr_s {
 	uint64_t tag : 32;
 	uint64_t q : 20;
-	uint64_t rsvd_57_52 : 6;
+	uint64_t long_send_comp : 1;
+	uint64_t rsvd_57_53 : 5;
 	uint64_t node : 2;
 	uint64_t cqe_type : 4;
 };
 
+/* [CN20K, .) NIX Completion queue context structure */
+struct nix_cn20k_cq_ctx_s {
+	uint64_t base : 64; /* W0 */
+	uint64_t lbp_ena : 1; /* W1 */
+	uint64_t lbpid_low : 3;
+	uint64_t bp_ena : 1;
+	uint64_t lbpid_med : 3;
+	uint64_t bpid : 9;
+	uint64_t lbpid_high : 3;
+	uint64_t qint_idx : 7;
+	uint64_t cq_err : 1;
+	uint64_t cint_idx : 7;
+	uint64_t avg_con : 9;
+	uint64_t wrptr : 20;
+	uint64_t tail : 20; /* W2 */
+	uint64_t head : 20;
+	uint64_t avg_level : 8;
+	uint64_t update_time : 16;
+	uint64_t bp : 8; /* W3 */
+	uint64_t drop : 8;
+	uint64_t drop_ena : 1;
+	uint64_t ena : 1;
+	uint64_t cpt_drop_err_en  : 1;
+	uint64_t reserved_211_211 : 1;
+	uint64_t msh_dst : 11;
+	uint64_t msh_valid : 1;
+	uint64_t stash_thresh : 4;
+	uint64_t lbp_frac : 4;
+	uint64_t caching : 1;
+	uint64_t stashing : 1;
+	uint64_t reserved_234_235 : 2;
+	uint64_t qsize : 4;
+	uint64_t cq_err_int : 8;
+	uint64_t cq_err_int_ena   : 8;
+	uint64_t bpid_ext : 2; /* W4 */
+	uint64_t reserved_258_259 : 2;
+	uint64_t lbpid_ext : 2;
+	uint64_t reserved_262_319 : 58;
+	uint64_t reserved_320_383 : 64; /* W5 */
+	uint64_t reserved_384_447 : 64; /* W6 */
+	uint64_t reserved_448_511 : 64; /* W7 */
+};
+
 /* NIX completion queue context structure */
 struct nix_cq_ctx_s {
 	uint64_t base : 64; /* W0 */
@@ -1083,6 +1162,184 @@ struct nix_qint_hw_s {
 	uint32_t ena : 1;
 };
 
+/* [CN20K, .) NIX receive queue context structure */
+struct nix_cn20k_rq_ctx_hw_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t sso_ena : 1;
+	uint64_t ipsech_ena : 1;
+	uint64_t ena_wqwd : 1;
+	uint64_t cq : 20;
+	uint64_t rsvd_34_24 : 11;
+	uint64_t port_il4_dis : 1;
+	uint64_t port_ol4_dis : 1;
+	uint64_t lenerr_dis : 1;
+	uint64_t csum_il4_dis : 1;
+	uint64_t csum_ol4_dis : 1;
+	uint64_t len_il4_dis : 1;
+	uint64_t len_il3_dis : 1;
+	uint64_t len_ol4_dis : 1;
+	uint64_t len_ol3_dis : 1;
+	uint64_t wqe_aura : 20;
+	uint64_t spb_aura : 20; /* W1 */
+	uint64_t lpb_aura : 20;
+	uint64_t sso_grp : 10;
+	uint64_t sso_tt : 2;
+	uint64_t pb_caching : 2;
+	uint64_t wqe_caching : 1;
+	uint64_t xqe_drop_ena : 1;
+	uint64_t spb_drop_ena : 1;
+	uint64_t lpb_drop_ena : 1;
+	uint64_t pb_stashing : 1;
+	uint64_t ipsecd_drop_en : 1;
+	uint64_t chi_ena : 1;
+	uint64_t rsvd_127_125 : 3;
+	uint64_t band_prof_id_l : 10; /* W2 */
+	uint64_t sso_drop_ena : 1;
+	uint64_t policer_ena : 1;
+	uint64_t spb_sizem1 : 6;
+	uint64_t wqe_skip : 2;
+	uint64_t spb_high_sizem1 : 3;
+	uint64_t spb_ena : 1;
+	uint64_t lpb_sizem1 : 12;
+	uint64_t first_skip : 7;
+	uint64_t sso_bp_ena : 1;
+	uint64_t later_skip : 6;
+	uint64_t xqe_imm_size : 6;
+	uint64_t band_prof_id_h : 4;
+	uint64_t rsvd_189_188 : 2;
+	uint64_t xqe_imm_copy : 1;
+	uint64_t xqe_hdr_split : 1;
+	uint64_t xqe_drop : 8; /* W3 */
+	uint64_t xqe_pass : 8;
+	uint64_t wqe_pool_drop : 8;
+	uint64_t wqe_pool_pass : 8;
+	uint64_t spb_aura_drop : 8;
+	uint64_t spb_aura_pass : 8;
+	uint64_t spb_pool_drop : 8;
+	uint64_t spb_pool_pass : 8;
+	uint64_t lpb_aura_drop : 8; /* W4 */
+	uint64_t lpb_aura_pass : 8;
+	uint64_t lpb_pool_drop : 8;
+	uint64_t lpb_pool_pass : 8;
+	uint64_t rsvd_319_288 : 32;
+	uint64_t ltag : 24; /* W5 */
+	uint64_t good_utag : 8;
+	uint64_t bad_utag : 8;
+	uint64_t flow_tagw : 6;
+	uint64_t rsvd_366  : 1;
+	uint64_t rsvd_367  : 1;
+	uint64_t rsvd_375_368 : 8;
+	uint64_t rsvd_379_376 : 4;
+	uint64_t rsvd_381_380 : 2;
+	uint64_t rsvd_383_382 : 2;
+	uint64_t octs : 48; /* W6 */
+	uint64_t rsvd_447_432 : 16;
+	uint64_t pkts : 48; /* W7 */
+	uint64_t rsvd_511_496 : 16;
+	uint64_t drop_octs : 48; /* W8 */
+	uint64_t rsvd_575_560 : 16;
+	uint64_t drop_pkts : 48; /* W9 */
+	uint64_t rsvd_639_624 : 16;
+	uint64_t re_pkts : 48; /* W10 */
+	uint64_t rsvd_702_688 : 15;
+	uint64_t ena_copy : 1;
+	uint64_t rsvd_739_704 : 36; /* W11 */
+	uint64_t rq_int : 8;
+	uint64_t rq_int_ena : 8;
+	uint64_t qint_idx : 7;
+	uint64_t rsvd_767_763 : 5;
+	uint64_t rsvd_831_768 : 64;  /* W12 */
+	uint64_t rsvd_895_832 : 64;  /* W13 */
+	uint64_t rsvd_959_896 : 64;  /* W14 */
+	uint64_t rsvd_1023_960 : 64; /* W15 */
+};
+
+/* [CN20K, .) NIX Receive queue context structure */
+struct nix_cn20k_rq_ctx_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t sso_ena : 1;
+	uint64_t ipsech_ena : 1;
+	uint64_t ena_wqwd : 1;
+	uint64_t cq : 20;
+	uint64_t reserved_24_34 : 11;
+	uint64_t port_il4_dis : 1;
+	uint64_t port_ol4_dis : 1;
+	uint64_t lenerr_dis : 1;
+	uint64_t csum_il4_dis : 1;
+	uint64_t csum_ol4_dis : 1;
+	uint64_t len_il4_dis : 1;
+	uint64_t len_il3_dis : 1;
+	uint64_t len_ol4_dis : 1;
+	uint64_t len_ol3_dis : 1;
+	uint64_t wqe_aura : 20;
+	uint64_t spb_aura : 20; /* W1 */
+	uint64_t lpb_aura : 20;
+	uint64_t sso_grp : 10;
+	uint64_t sso_tt : 2;
+	uint64_t pb_caching : 2;
+	uint64_t wqe_caching : 1;
+	uint64_t xqe_drop_ena : 1;
+	uint64_t spb_drop_ena : 1;
+	uint64_t lpb_drop_ena : 1;
+	uint64_t pb_stashing : 1;
+	uint64_t ipsecd_drop_en : 1;
+	uint64_t chi_ena : 1;
+	uint64_t reserved_125_127 : 3;
+	uint64_t band_prof_id_l : 10; /* W2 */
+	uint64_t sso_fc_ena : 1;
+	uint64_t policer_ena : 1;
+	uint64_t spb_sizem1 : 6;
+	uint64_t wqe_skip : 2;
+	uint64_t spb_high_sizem1 : 3;
+	uint64_t spb_ena : 1;
+	uint64_t lpb_sizem1 : 12;
+	uint64_t first_skip : 7;
+	uint64_t sso_bp_ena : 1;
+	uint64_t later_skip : 6;
+	uint64_t xqe_imm_size : 6;
+	uint64_t band_prof_id_h : 4;
+	uint64_t reserved_188_189 : 2;
+	uint64_t xqe_imm_copy : 1;
+	uint64_t xqe_hdr_split : 1;
+	uint64_t xqe_drop : 8; /* W3 */
+	uint64_t xqe_pass : 8;
+	uint64_t wqe_pool_drop : 8;
+	uint64_t wqe_pool_pass : 8;
+	uint64_t spb_aura_drop : 8;
+	uint64_t spb_aura_pass : 8;
+	uint64_t spb_pool_drop : 8;
+	uint64_t spb_pool_pass : 8;
+	uint64_t lpb_aura_drop : 8; /* W4 */
+	uint64_t lpb_aura_pass : 8;
+	uint64_t lpb_pool_drop : 8;
+	uint64_t lpb_pool_pass : 8;
+	uint64_t reserved_288_291 : 4;
+	uint64_t rq_int : 8;
+	uint64_t rq_int_ena : 8;
+	uint64_t qint_idx : 7;
+	uint64_t reserved_315_319 : 5;
+	uint64_t ltag : 24; /* W5 */
+	uint64_t good_utag : 8;
+	uint64_t bad_utag : 8;
+	uint64_t flow_tagw : 6;
+	uint64_t reserved_366_383 : 18;
+	uint64_t octs : 48; /* W6 */
+	uint64_t reserved_432_447 : 16;
+	uint64_t pkts : 48; /* W7 */
+	uint64_t reserved_496_511 : 16;
+	uint64_t drop_octs : 48; /* W8 */
+	uint64_t reserved_560_575 : 16;
+	uint64_t drop_pkts : 48; /* W9 */
+	uint64_t reserved_624_639 : 16;
+	uint64_t re_pkts : 48; /* W10 */
+	uint64_t reserved_688_703 : 16;
+	uint64_t reserved_704_767 : 64; /* W11 */
+	uint64_t reserved_768_831 : 64; /* W12 */
+	uint64_t reserved_832_895 : 64; /* W13 */
+	uint64_t reserved_896_959 : 64; /* W14 */
+	uint64_t reserved_960_1023 : 64; /* W15 */
+};
+
 /* [CN10K, .) NIX receive queue context structure */
 struct nix_cn10k_rq_ctx_hw_s {
 	uint64_t ena : 1;
@@ -1493,13 +1750,13 @@ union nix_rx_parse_u {
 		uint64_t lhptr : 8;
 		uint64_t vtag0_ptr : 8;
 		uint64_t vtag1_ptr : 8;
-		uint64_t flow_key_alg : 5;
-		uint64_t rsvd_341 : 1;
+		uint64_t flow_key_alg : 6;
 		uint64_t rsvd_349_342 : 8;
 		uint64_t rsvd_353_350 : 4;
 		uint64_t rsvd_359_354 : 6;
 		uint64_t color : 2;
-		uint64_t rsvd_381_362 : 20;
+		uint64_t mcs_mdata    : 14;
+		uint64_t rsvd_381_376 : 6;
 		uint64_t rsvd_382 : 1;
 		uint64_t rsvd_383 : 1;
 		uint64_t rsvd_447_384 : 64; /* W6 */
@@ -1652,7 +1909,9 @@ union nix_send_ext_w1_u {
 		uint64_t vlan0_ins_ena : 1;
 		uint64_t vlan1_ins_ena : 1;
 		uint64_t init_color : 2;
-		uint64_t rsvd_127_116 : 12;
+		uint64_t flow_id       : 7;
+		uint64_t flow_override : 1;
+		uint64_t rsvd_127_124 : 4;
 	};
 	struct {
 		uint64_t vlan0_ins_ptr : 8;
@@ -1675,7 +1934,7 @@ union nix_send_hdr_w0_u {
 	uint64_t u;
 	struct {
 		uint64_t total : 18;
-		uint64_t rsvd_18 : 1;
+		uint64_t cpt_error : 1;
 		uint64_t df : 1;
 		uint64_t aura : 20;
 		uint64_t sizem1 : 3;
@@ -1718,7 +1977,8 @@ struct nix_send_jump_s {
 	uint64_t rsvd_13_7 : 7;
 	uint64_t ld_type : 2;
 	uint64_t aura : 20;
-	uint64_t rsvd_58_36 : 23;
+	uint64_t refcnt_en  : 1;
+	uint64_t rsvd_58_37 : 22;
 	uint64_t f : 1;
 	uint64_t subdc : 4;
 	uint64_t addr : 64; /* W1 */
@@ -1729,7 +1989,10 @@ union nix_send_mem_w0_u {
 	uint64_t u;
 	struct {
 		uint64_t offset : 16;
-		uint64_t rsvd_51_16 : 36;
+		uint64_t base_ns     : 32;
+		uint64_t step_type   : 1;
+		uint64_t rsvd_50_49  : 2;
+		uint64_t coas_en     : 1;
 		uint64_t per_lso_seg : 1;
 		uint64_t wmem : 1;
 		uint64_t dsz : 2;
@@ -1760,7 +2023,8 @@ union nix_send_sg2_s {
 		uint64_t i1 : 1;
 		uint64_t fabs : 1;
 		uint64_t foff : 8;
-		uint64_t rsvd_57_46 : 12;
+		uint64_t refcnt_en1 : 1;
+		uint64_t rsvd_57_47 : 11;
 		uint64_t ld_type : 2;
 		uint64_t subdc : 4;
 	};
@@ -1773,7 +2037,10 @@ union nix_send_sg_s {
 		uint64_t seg2_size : 16;
 		uint64_t seg3_size : 16;
 		uint64_t segs : 2;
-		uint64_t rsvd_54_50 : 5;
+		uint64_t rsvd_51_50 : 2;
+		uint64_t refcnt_en1 : 1;
+		uint64_t refcnt_en2 : 1;
+		uint64_t refcnt_en3 : 1;
 		uint64_t i1 : 1;
 		uint64_t i2 : 1;
 		uint64_t i3 : 1;
@@ -1792,6 +2059,133 @@ struct nix_send_work_s {
 	uint64_t addr : 64; /* W1 */
 };
 
+/* [CN20K, .) NIX sq context hardware structure */
+struct nix_cn20k_sq_ctx_hw_s {
+	uint64_t ena : 1;
+	uint64_t substream : 20;
+	uint64_t max_sqe_size : 2;
+	uint64_t sqe_way_mask : 16;
+	uint64_t sqb_aura : 20;
+	uint64_t gbl_rsvd1 : 5;
+	uint64_t cq_id : 20; /* W1 */
+	uint64_t cq_ena : 1;
+	uint64_t qint_idx : 6;
+	uint64_t gbl_rsvd2 : 1;
+	uint64_t sq_int : 8;
+	uint64_t sq_int_ena : 8;
+	uint64_t xoff : 1;
+	uint64_t sqe_stype : 2;
+	uint64_t gbl_rsvd : 17;
+	uint64_t head_sqb : 64; /* W2 */
+	uint64_t head_offset : 6; /* W3 */
+	uint64_t sqb_dequeue_count : 16;
+	uint64_t default_chan : 12;
+	uint64_t sdp_mcast : 1;
+	uint64_t sso_ena : 1;
+	uint64_t dse_rsvd1 : 28;
+	uint64_t sqb_enqueue_count : 16; /* W4 */
+	uint64_t tail_offset : 6;
+	uint64_t lmt_dis : 1;
+	uint64_t smq_rr_weight : 14;
+	uint64_t dnq_rsvd1 : 27;
+	uint64_t tail_sqb : 64; /* W5 */
+	uint64_t next_sqb : 64; /* W6 */
+	uint64_t smq : 11; /* W7 */
+	uint64_t smq_pend : 1;
+	uint64_t smq_next_sq : 20;
+	uint64_t smq_next_sq_vld : 1;
+	uint64_t mnq_dis : 1;
+	uint64_t scm1_rsvd2 : 30;
+	uint64_t smenq_sqb : 64; /* W8 */
+	uint64_t smenq_offset : 6; /* W9 */
+	uint64_t cq_limit : 8;
+	uint64_t smq_rr_count : 32;
+	uint64_t scm_lso_rem : 18;
+	uint64_t smq_lso_segnum : 8; /* W10 */
+	uint64_t vfi_lso_total : 18;
+	uint64_t vfi_lso_sizem1 : 3;
+	uint64_t vfi_lso_sb : 8;
+	uint64_t vfi_lso_mps : 14;
+	uint64_t vfi_lso_vlan0_ins_ena : 1;
+	uint64_t vfi_lso_vlan1_ins_ena : 1;
+	uint64_t vfi_lso_vld : 1;
+	uint64_t smenq_next_sqb_vld : 1;
+	uint64_t scm_dq_rsvd1 : 9;
+	uint64_t smenq_next_sqb : 64; /* W11 */
+	uint64_t age_drop_octs : 32; /* W12 */
+	uint64_t age_drop_pkts : 32;
+	uint64_t drop_pkts : 48; /* W13 */
+	uint64_t drop_octs_lsw : 16;
+	uint64_t drop_octs_msw : 32; /* W14 */
+	uint64_t pkts_lsw : 32;
+	uint64_t pkts_msw : 16; /* W15 */
+	uint64_t octs : 48;
+};
+
+/* [CN20K, .) NIX Send queue context structure */
+struct nix_cn20k_sq_ctx_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t qint_idx : 6;
+	uint64_t substream : 20;
+	uint64_t sdp_mcast :  1;
+	uint64_t cq : 20;
+	uint64_t sqe_way_mask : 16;
+	uint64_t smq : 11; /* W1 */
+	uint64_t cq_ena : 1;
+	uint64_t xoff : 1;
+	uint64_t sso_ena : 1;
+	uint64_t smq_rr_weight : 14;
+	uint64_t default_chan : 12;
+	uint64_t sqb_count : 16;
+	uint64_t reserved_120_120 : 1;
+	uint64_t smq_rr_count_lb : 7;
+	uint64_t smq_rr_count_ub : 25; /* W2 */
+	uint64_t sqb_aura : 20;
+	uint64_t sq_int : 8;
+	uint64_t sq_int_ena : 8;
+	uint64_t sqe_stype : 2;
+	uint64_t reserved_191_191 : 1;
+	uint64_t max_sqe_size : 2; /* W3 */
+	uint64_t cq_limit : 8;
+	uint64_t lmt_dis : 1;
+	uint64_t mnq_dis : 1;
+	uint64_t smq_next_sq : 20;
+	uint64_t smq_lso_segnum :  8;
+	uint64_t tail_offset :  6;
+	uint64_t smenq_offset :  6;
+	uint64_t head_offset :  6;
+	uint64_t smenq_next_sqb_vld :  1;
+	uint64_t smq_pend :  1;
+	uint64_t smq_next_sq_vld :  1;
+	uint64_t reserved_253_255 :  3;
+	uint64_t next_sqb : 64; /* W4 */
+	uint64_t tail_sqb : 64; /* W5 */
+	uint64_t smenq_sqb : 64; /* W6 */
+	uint64_t smenq_next_sqb : 64; /* W7 */
+	uint64_t head_sqb : 64; /* W8 */
+	uint64_t reserved_576_583 : 8; /* W9 */
+	uint64_t vfi_lso_total : 18;
+	uint64_t vfi_lso_sizem1 : 3;
+	uint64_t vfi_lso_sb : 8;
+	uint64_t vfi_lso_mps : 14;
+	uint64_t vfi_lso_vlan0_ins_ena : 1;
+	uint64_t vfi_lso_vlan1_ins_ena : 1;
+	uint64_t vfi_lso_vld : 1;
+	uint64_t reserved_630_639 : 10;
+	uint64_t scm_lso_rem : 18; /* W10 */
+	uint64_t reserved_658_703 : 46;
+	uint64_t octs : 48; /* W11 */
+	uint64_t reserved_752_767 : 16;
+	uint64_t pkts : 48; /* W12 */
+	uint64_t reserved_816_831 : 16;
+	uint64_t aged_drop_octs : 32; /* W13 */
+	uint64_t aged_drop_pkts : 32;
+	uint64_t drop_octs : 48; /* W14 */
+	uint64_t reserved_944_959 : 16;
+	uint64_t drop_pkts : 48; /* W15 */
+	uint64_t reserved_1008_1023 : 16;
+};
+
 /* [CN10K, .) NIX sq context hardware structure */
 struct nix_cn10k_sq_ctx_hw_s {
 	uint64_t ena : 1;
@@ -2234,17 +2628,24 @@ struct nix_lso_format {
 #define NIX_CN9K_TM_RR_QUANTUM_MAX (BIT_ULL(24) - 1)
 #define NIX_TM_RR_WEIGHT_MAX	   (BIT_ULL(14) - 1)
 
-/* [CN9K, CN10K) */
-#define NIX_CN9K_TXSCH_LVL_SMQ_MAX 512
-
-/* [CN10K, .) */
-#define NIX_TXSCH_LVL_SMQ_MAX 832
-
 /* [CN9K, .) */
-#define NIX_TXSCH_LVL_TL4_MAX 512
-#define NIX_TXSCH_LVL_TL3_MAX 256
-#define NIX_TXSCH_LVL_TL2_MAX 256
 #define NIX_TXSCH_LVL_TL1_MAX 28
+#define NIX_TXSCH_LVL_TL2_MAX 256
+
+/* CN9K */
+#define NIX_CN9K_TXSCH_LVL_TL3_MAX 256
+#define NIX_CN9K_TXSCH_LVL_TL4_MAX 512
+#define NIX_CN9K_TXSCH_LVL_SMQ_MAX 512
+
+/* CN10K */
+#define NIX_CN10K_TXSCH_LVL_TL3_MAX 256
+#define NIX_CN10K_TXSCH_LVL_TL4_MAX 512
+#define NIX_CN10K_TXSCH_LVL_SMQ_MAX 832
+
+/* [CN20K, .) */
+#define NIX_TXSCH_LVL_TL3_MAX 512
+#define NIX_TXSCH_LVL_TL4_MAX 1280
+#define NIX_TXSCH_LVL_SMQ_MAX 2048
 
 #define NIX_CQ_OP_STAT_OP_ERR 63
 #define NIX_CQ_OP_STAT_CQ_ERR 46
@@ -2265,4 +2666,9 @@ struct nix_lso_format {
 #define NIX_SENDSTAT_IOFFSET_MASK 0xFFF
 #define NIX_SENDSTAT_OOFFSET_MASK 0xFFF
 
+/* The mask is to extract lower 10-bits of channel number
+ * which CPT will pass to X2P.
+ */
+#define NIX_CHAN_CPT_X2P_MASK (0x3ffull)
+
 #endif /* __NIX_HW_H__ */
diff --git a/drivers/common/cnxk/hw/rvu.h b/drivers/common/cnxk/hw/rvu.h
index ee6cf30c5d..ed2ba996e0 100644
--- a/drivers/common/cnxk/hw/rvu.h
+++ b/drivers/common/cnxk/hw/rvu.h
@@ -67,7 +67,9 @@
 #define RVU_PF_VFX_PFVF_MBOXX(a, b)                                            \
 	(0x0ull | (uint64_t)(a) << 12 | (uint64_t)(b) << 3)
 #define RVU_PF_VF_BAR4_ADDR		 (0x10ull)
-#define RVU_PF_BLOCK_ADDRX_DISC(a)	 (0x200ull | (uint64_t)(a) << 3)
+
+#define RVU_PF_DISC			 (0x0ull)  /* [CN20K, .) */
+#define RVU_PF_BLOCK_ADDRX_DISC(a)	 (0x200ull | (uint64_t)(a) << 3)  /* [CN9K, CN20K) */
 #define RVU_PF_VFME_STATUSX(a)		 (0x800ull | (uint64_t)(a) << 3)
 #define RVU_PF_VFTRPENDX(a)		 (0x820ull | (uint64_t)(a) << 3)
 #define RVU_PF_VFTRPEND_W1SX(a)		 (0x840ull | (uint64_t)(a) << 3)
@@ -91,7 +93,8 @@
 #define RVU_PF_MSIX_VECX_ADDR(a)	 (0x80000ull | (uint64_t)(a) << 4)
 #define RVU_PF_MSIX_VECX_CTL(a)		 (0x80008ull | (uint64_t)(a) << 4)
 #define RVU_PF_MSIX_PBAX(a)		 (0xf0000ull | (uint64_t)(a) << 3)
-#define RVU_VF_VFPF_MBOXX(a)		 (0x0ull | (uint64_t)(a) << 3)
+#define RVU_VF_DISC			 (0x0ull)  /* [CN20K, .) */
+#define RVU_VF_VFPF_MBOXX(a)		 (0x0ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
 #define RVU_VF_INT			 (0x20ull)
 #define RVU_VF_INT_W1S			 (0x28ull)
 #define RVU_VF_INT_ENA_W1S		 (0x30ull)
diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index 9a9dcbdbda..dd65946e9e 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -309,6 +309,7 @@ struct mbox_msghdr {
 	M(NIX_MCAST_GRP_UPDATE, 0x802d, nix_mcast_grp_update, nix_mcast_grp_update_req,            \
 	  nix_mcast_grp_update_rsp)                                                                \
 	M(NIX_GET_LF_STATS,    0x802e, nix_get_lf_stats, nix_get_lf_stats_req, nix_lf_stats_rsp)   \
+	M(NIX_CN20K_AQ_ENQ, 0x802f, nix_cn20k_aq_enq, nix_cn20k_aq_enq_req, nix_cn20k_aq_enq_rsp)  \
 	/* MCS mbox IDs (range 0xa000 - 0xbFFF) */                                                 \
 	M(MCS_ALLOC_RESOURCES, 0xa000, mcs_alloc_resources, mcs_alloc_rsrc_req,                    \
 	  mcs_alloc_rsrc_rsp)                                                                      \
@@ -1442,6 +1443,57 @@ struct nix_lf_free_req {
 	uint64_t __io flags;
 };
 
+/* CN20x NIX AQ enqueue msg */
+struct nix_cn20k_aq_enq_req {
+	struct mbox_msghdr hdr;
+	uint32_t __io qidx;
+	uint8_t __io ctype;
+	uint8_t __io op;
+	union {
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_RQ */
+		__io struct nix_cn20k_rq_ctx_s rq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_SQ */
+		__io struct nix_cn20k_sq_ctx_s sq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_CQ */
+		__io struct nix_cn20k_cq_ctx_s cq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_RSS */
+		__io struct nix_rsse_s rss;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_MCE */
+		__io struct nix_rx_mce_s mce;
+		/* Valid when op == WRITE/INIT and
+		 * ctype == NIX_AQ_CTYPE_BAND_PROF
+		 */
+		__io struct nix_band_prof_s prof;
+	};
+	/* Mask data when op == WRITE (1=write, 0=don't write) */
+	union {
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_RQ */
+		__io struct nix_cn20k_rq_ctx_s rq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_SQ */
+		__io struct nix_cn20k_sq_ctx_s sq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_CQ */
+		__io struct nix_cn20k_cq_ctx_s cq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_RSS */
+		__io struct nix_rsse_s rss_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_MCE */
+		__io struct nix_rx_mce_s mce_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_BAND_PROF */
+		__io struct nix_band_prof_s prof_mask;
+	};
+};
+
+struct nix_cn20k_aq_enq_rsp {
+	struct mbox_msghdr hdr;
+	union {
+		__io struct nix_cn20k_rq_ctx_s rq;
+		__io struct nix_cn20k_sq_ctx_s sq;
+		__io struct nix_cn20k_cq_ctx_s cq;
+		__io struct nix_rsse_s rss;
+		__io struct nix_rx_mce_s mce;
+		__io struct nix_band_prof_s prof;
+	};
+};
+
 /* CN10x NIX AQ enqueue msg */
 struct nix_cn10k_aq_enq_req {
 	struct mbox_msghdr hdr;
diff --git a/drivers/common/cnxk/roc_nix.c b/drivers/common/cnxk/roc_nix.c
index 041621dfaa..e4d7e11121 100644
--- a/drivers/common/cnxk/roc_nix.c
+++ b/drivers/common/cnxk/roc_nix.c
@@ -398,15 +398,22 @@ sdp_lbk_id_update(struct plt_pci_device *pci_dev, struct nix *nix)
 uint64_t
 nix_get_blkaddr(struct dev *dev)
 {
+	uint64_t blkaddr;
 	uint64_t reg;
 
 	/* Reading the discovery register to know which NIX is the LF
 	 * attached to.
 	 */
-	reg = plt_read64(dev->bar2 +
-			 RVU_PF_BLOCK_ADDRX_DISC(RVU_BLOCK_ADDR_NIX0));
-
-	return reg & 0x1FFULL ? RVU_BLOCK_ADDR_NIX0 : RVU_BLOCK_ADDR_NIX1;
+	if (roc_model_is_cn9k() || roc_model_is_cn10k()) {
+		reg = plt_read64(dev->bar2 + RVU_PF_BLOCK_ADDRX_DISC(RVU_BLOCK_ADDR_NIX0));
+		blkaddr = reg & 0x1FFULL ? RVU_BLOCK_ADDR_NIX0 : RVU_BLOCK_ADDR_NIX1;
+	} else {
+		reg = plt_read64(dev->bar2 + RVU_PF_DISC);
+		blkaddr = reg & BIT_ULL(RVU_BLOCK_ADDR_NIX0) ? RVU_BLOCK_ADDR_NIX0 :
+			RVU_BLOCK_ADDR_NIX1;
+		blkaddr = RVU_BLOCK_ADDR_NIX0;
+	}
+	return blkaddr;
 }
 
 int
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 06/18] common/cnxk: support NIX queue config for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (4 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 05/18] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 07/18] common/cnxk: support bandwidth profile " Nithin Dabilpuram
                     ` (12 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to set up NIX RQ, SQ and CQ for cn20k.
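
Every cn20k queue-context operation in this patch goes through the
NIX_CN20K_AQ_ENQ mailbox introduced earlier in the series. The hunks below
all repeat one pattern, sketched here from the calls used in this patch
(error handling trimmed; the helper itself is hypothetical):

/* Hypothetical helper illustrating the cn20k AQ mailbox pattern. */
static int
cn20k_rq_ctx_read(struct dev *dev, uint32_t qid,
		  struct nix_cn20k_aq_enq_rsp **rsp)
{
	struct mbox *mbox = mbox_get(dev->mbox);
	struct nix_cn20k_aq_enq_req *aq;
	int rc;

	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
	if (!aq) {
		mbox_put(mbox);
		return -ENOSPC;
	}

	aq->qidx = qid;               /* queue index */
	aq->ctype = NIX_AQ_CTYPE_RQ;  /* context type: RQ, SQ or CQ */
	aq->op = NIX_AQ_INSTOP_READ;  /* READ here; INIT/WRITE to configure */

	/* For WRITE ops, the matching aq->rq_mask bits select which
	 * context fields are actually updated.
	 */
	rc = mbox_process_msg(mbox, (void *)rsp);
	mbox_put(mbox);
	return rc;
}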

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/roc_nix_fc.c     |  52 ++-
 drivers/common/cnxk/roc_nix_inl.c    |   2 +
 drivers/common/cnxk/roc_nix_priv.h   |   1 +
 drivers/common/cnxk/roc_nix_queue.c  | 532 ++++++++++++++++++++++++++-
 drivers/common/cnxk/roc_nix_stats.c  |  55 ++-
 drivers/common/cnxk/roc_nix_tm.c     |  22 +-
 drivers/common/cnxk/roc_nix_tm_ops.c |  14 +-
 7 files changed, 650 insertions(+), 28 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_fc.c b/drivers/common/cnxk/roc_nix_fc.c
index 2f72e67993..0676363c58 100644
--- a/drivers/common/cnxk/roc_nix_fc.c
+++ b/drivers/common/cnxk/roc_nix_fc.c
@@ -127,7 +127,7 @@ nix_fc_cq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 		aq->qidx = fc_cfg->cq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_READ;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -136,6 +136,18 @@ nix_fc_cq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			goto exit;
 		}
 
+		aq->qidx = fc_cfg->cq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
 		aq->qidx = fc_cfg->cq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_READ;
@@ -179,7 +191,7 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 		aq->qidx = fc_cfg->rq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -188,6 +200,18 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			goto exit;
 		}
 
+		aq->qidx = fc_cfg->rq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
 		aq->qidx = fc_cfg->rq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
@@ -270,7 +294,7 @@ nix_fc_cq_config_set(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 
 		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
 		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -290,6 +314,28 @@ nix_fc_cq_config_set(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			aq->cq_mask.bp = ~(aq->cq_mask.bp);
 		}
 
+		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
+		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = fc_cfg->cq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		if (fc_cfg->cq_cfg.enable) {
+			aq->cq.bpid = nix->bpid[fc_cfg->cq_cfg.tc];
+			aq->cq_mask.bpid = ~(aq->cq_mask.bpid);
+			aq->cq.bp = fc_cfg->cq_cfg.cq_drop;
+			aq->cq_mask.bp = ~(aq->cq_mask.bp);
+		}
+
 		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
 		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
 	}
diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c
index a984ac56d9..a759052973 100644
--- a/drivers/common/cnxk/roc_nix_inl.c
+++ b/drivers/common/cnxk/roc_nix_inl.c
@@ -1385,6 +1385,8 @@ roc_nix_inl_dev_rq_get(struct roc_nix_rq *rq, bool enable)
 	mbox = mbox_get(dev->mbox);
 	if (roc_model_is_cn9k())
 		rc = nix_rq_cn9k_cfg(dev, inl_rq, inl_dev->qints, false, enable);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, inl_rq, inl_dev->qints, false, enable);
 	else
 		rc = nix_rq_cfg(dev, inl_rq, inl_dev->qints, false, enable);
 	if (rc) {
diff --git a/drivers/common/cnxk/roc_nix_priv.h b/drivers/common/cnxk/roc_nix_priv.h
index 275ffc8ea3..ade42c1878 100644
--- a/drivers/common/cnxk/roc_nix_priv.h
+++ b/drivers/common/cnxk/roc_nix_priv.h
@@ -409,6 +409,7 @@ int nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 
 int nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 		    bool cfg, bool ena);
+int nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena);
 int nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg,
 	       bool ena);
 int nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable);
diff --git a/drivers/common/cnxk/roc_nix_queue.c b/drivers/common/cnxk/roc_nix_queue.c
index f5441e0e6b..bb1b70424f 100644
--- a/drivers/common/cnxk/roc_nix_queue.c
+++ b/drivers/common/cnxk/roc_nix_queue.c
@@ -69,7 +69,7 @@ nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable)
 
 		aq->rq.ena = enable;
 		aq->rq_mask.ena = ~(aq->rq_mask.ena);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -82,6 +82,21 @@ nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable)
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
 
+		aq->rq.ena = enable;
+		aq->rq_mask.ena = ~(aq->rq_mask.ena);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
 		aq->rq.ena = enable;
 		aq->rq_mask.ena = ~(aq->rq_mask.ena);
 	}
@@ -150,7 +165,7 @@ roc_nix_rq_is_sso_enable(struct roc_nix *roc_nix, uint32_t qid)
 			goto exit;
 
 		sso_enable = rsp->rq.sso_ena;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -164,6 +179,25 @@ roc_nix_rq_is_sso_enable(struct roc_nix *roc_nix, uint32_t qid)
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
 
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+
+		sso_enable = rsp->rq.sso_ena;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+
 		rc = mbox_process_msg(mbox, (void *)&rsp);
 		if (rc)
 			goto exit;
@@ -222,7 +256,7 @@ nix_rq_aura_buf_type_update(struct roc_nix_rq *rq, bool set)
 		if (rsp->rq.spb_ena)
 			spb_aura = roc_npa_aura_handle_gen(rsp->rq.spb_aura, aura_base);
 		mbox_put(mbox);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -249,6 +283,32 @@ nix_rq_aura_buf_type_update(struct roc_nix_rq *rq, bool set)
 		if (rsp->rq.vwqe_ena)
 			vwqe_aura = roc_npa_aura_handle_gen(rsp->rq.wqe_aura, aura_base);
 
+		mbox_put(mbox);
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (!aq) {
+			mbox_put(mbox);
+			return -ENOSPC;
+		}
+
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc) {
+			mbox_put(mbox);
+			return rc;
+		}
+
+		/* Get aura handle from aura */
+		lpb_aura = roc_npa_aura_handle_gen(rsp->rq.lpb_aura, aura_base);
+		if (rsp->rq.spb_ena)
+			spb_aura = roc_npa_aura_handle_gen(rsp->rq.spb_aura, aura_base);
+
 		mbox_put(mbox);
 	}
 
@@ -443,8 +503,7 @@ nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 }
 
 int
-nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg,
-	   bool ena)
+nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
 {
 	struct nix_cn10k_aq_enq_req *aq;
 	struct mbox *mbox = dev->mbox;
@@ -667,6 +726,171 @@ nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
 	return rc;
 }
 
+int
+nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
+{
+	struct nix_cn20k_aq_enq_req *aq;
+	struct mbox *mbox = dev->mbox;
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+	if (!aq)
+		return -ENOSPC;
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = cfg ? NIX_AQ_INSTOP_WRITE : NIX_AQ_INSTOP_INIT;
+
+	if (rq->sso_ena) {
+		/* SSO mode */
+		aq->rq.sso_ena = 1;
+		aq->rq.sso_tt = rq->tt;
+		aq->rq.sso_grp = rq->hwgrp;
+		aq->rq.ena_wqwd = 1;
+		aq->rq.wqe_skip = rq->wqe_skip;
+		aq->rq.wqe_caching = 1;
+
+		aq->rq.good_utag = rq->tag_mask >> 24;
+		aq->rq.bad_utag = rq->tag_mask >> 24;
+		aq->rq.ltag = rq->tag_mask & BITMASK_ULL(24, 0);
+
+		if (rq->vwqe_ena)
+			aq->rq.wqe_aura = roc_npa_aura_handle_to_aura(rq->vwqe_aura_handle);
+	} else {
+		/* CQ mode */
+		aq->rq.sso_ena = 0;
+		aq->rq.good_utag = rq->tag_mask >> 24;
+		aq->rq.bad_utag = rq->tag_mask >> 24;
+		aq->rq.ltag = rq->tag_mask & BITMASK_ULL(24, 0);
+		aq->rq.cq = rq->cqid;
+	}
+
+	if (rq->ipsech_ena) {
+		aq->rq.ipsech_ena = 1;
+		aq->rq.ipsecd_drop_en = 1;
+		aq->rq.ena_wqwd = 1;
+		aq->rq.wqe_skip = rq->wqe_skip;
+		aq->rq.wqe_caching = 1;
+	}
+
+	aq->rq.lpb_aura = roc_npa_aura_handle_to_aura(rq->aura_handle);
+
+	/* Sizes must be aligned to 8 bytes */
+	if (rq->first_skip & 0x7 || rq->later_skip & 0x7 || rq->lpb_size & 0x7)
+		return -EINVAL;
+
+	/* Expressed in number of dwords */
+	aq->rq.first_skip = rq->first_skip / 8;
+	aq->rq.later_skip = rq->later_skip / 8;
+	aq->rq.flow_tagw = rq->flow_tag_width; /* 32-bits */
+	aq->rq.lpb_sizem1 = rq->lpb_size / 8;
+	aq->rq.lpb_sizem1 -= 1; /* Expressed in size minus one */
+	aq->rq.ena = ena;
+
+	if (rq->spb_ena) {
+		uint32_t spb_sizem1;
+
+		aq->rq.spb_ena = 1;
+		aq->rq.spb_aura =
+			roc_npa_aura_handle_to_aura(rq->spb_aura_handle);
+
+		if (rq->spb_size & 0x7 ||
+		    rq->spb_size > NIX_RQ_CN10K_SPB_MAX_SIZE)
+			return -EINVAL;
+
+		spb_sizem1 = rq->spb_size / 8; /* Expressed in no. of dwords */
+		spb_sizem1 -= 1;	       /* Expressed in size minus one */
+		aq->rq.spb_sizem1 = spb_sizem1 & 0x3F;
+		aq->rq.spb_high_sizem1 = (spb_sizem1 >> 6) & 0x7;
+	} else {
+		aq->rq.spb_ena = 0;
+	}
+
+	aq->rq.pb_caching = 0x2; /* First cache aligned block to LLC */
+	aq->rq.xqe_imm_size = 0; /* No pkt data copy to CQE */
+	aq->rq.rq_int_ena = 0;
+	/* Many to one reduction */
+	aq->rq.qint_idx = rq->qid % qints;
+	aq->rq.xqe_drop_ena = 0;
+	aq->rq.lpb_drop_ena = rq->lpb_drop_ena;
+	aq->rq.spb_drop_ena = rq->spb_drop_ena;
+
+	/* If RED enabled, then fill enable for all cases */
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.wqe_pool_pass = rq->red_pass;
+		aq->rq.xqe_pass = rq->red_pass;
+
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq.wqe_pool_drop = rq->red_drop;
+		aq->rq.xqe_drop = rq->red_drop;
+	}
+
+	if (cfg) {
+		if (rq->sso_ena) {
+			/* SSO mode */
+			aq->rq_mask.sso_ena = ~aq->rq_mask.sso_ena;
+			aq->rq_mask.sso_tt = ~aq->rq_mask.sso_tt;
+			aq->rq_mask.sso_grp = ~aq->rq_mask.sso_grp;
+			aq->rq_mask.ena_wqwd = ~aq->rq_mask.ena_wqwd;
+			aq->rq_mask.wqe_skip = ~aq->rq_mask.wqe_skip;
+			aq->rq_mask.wqe_caching = ~aq->rq_mask.wqe_caching;
+			aq->rq_mask.good_utag = ~aq->rq_mask.good_utag;
+			aq->rq_mask.bad_utag = ~aq->rq_mask.bad_utag;
+			aq->rq_mask.ltag = ~aq->rq_mask.ltag;
+			if (rq->vwqe_ena)
+				aq->rq_mask.wqe_aura = ~aq->rq_mask.wqe_aura;
+		} else {
+			/* CQ mode */
+			aq->rq_mask.sso_ena = ~aq->rq_mask.sso_ena;
+			aq->rq_mask.good_utag = ~aq->rq_mask.good_utag;
+			aq->rq_mask.bad_utag = ~aq->rq_mask.bad_utag;
+			aq->rq_mask.ltag = ~aq->rq_mask.ltag;
+			aq->rq_mask.cq = ~aq->rq_mask.cq;
+		}
+
+		if (rq->ipsech_ena)
+			aq->rq_mask.ipsech_ena = ~aq->rq_mask.ipsech_ena;
+
+		if (rq->spb_ena) {
+			aq->rq_mask.spb_aura = ~aq->rq_mask.spb_aura;
+			aq->rq_mask.spb_sizem1 = ~aq->rq_mask.spb_sizem1;
+			aq->rq_mask.spb_high_sizem1 =
+				~aq->rq_mask.spb_high_sizem1;
+		}
+
+		aq->rq_mask.spb_ena = ~aq->rq_mask.spb_ena;
+		aq->rq_mask.lpb_aura = ~aq->rq_mask.lpb_aura;
+		aq->rq_mask.first_skip = ~aq->rq_mask.first_skip;
+		aq->rq_mask.later_skip = ~aq->rq_mask.later_skip;
+		aq->rq_mask.flow_tagw = ~aq->rq_mask.flow_tagw;
+		aq->rq_mask.lpb_sizem1 = ~aq->rq_mask.lpb_sizem1;
+		aq->rq_mask.ena = ~aq->rq_mask.ena;
+		aq->rq_mask.pb_caching = ~aq->rq_mask.pb_caching;
+		aq->rq_mask.xqe_imm_size = ~aq->rq_mask.xqe_imm_size;
+		aq->rq_mask.rq_int_ena = ~aq->rq_mask.rq_int_ena;
+		aq->rq_mask.qint_idx = ~aq->rq_mask.qint_idx;
+		aq->rq_mask.xqe_drop_ena = ~aq->rq_mask.xqe_drop_ena;
+		aq->rq_mask.lpb_drop_ena = ~aq->rq_mask.lpb_drop_ena;
+		aq->rq_mask.spb_drop_ena = ~aq->rq_mask.spb_drop_ena;
+
+		if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+			aq->rq_mask.spb_pool_pass = ~aq->rq_mask.spb_pool_pass;
+			aq->rq_mask.lpb_pool_pass = ~aq->rq_mask.lpb_pool_pass;
+			aq->rq_mask.wqe_pool_pass = ~aq->rq_mask.wqe_pool_pass;
+			aq->rq_mask.xqe_pass = ~aq->rq_mask.xqe_pass;
+
+			aq->rq_mask.spb_pool_drop = ~aq->rq_mask.spb_pool_drop;
+			aq->rq_mask.lpb_pool_drop = ~aq->rq_mask.lpb_pool_drop;
+			aq->rq_mask.wqe_pool_drop = ~aq->rq_mask.wqe_pool_drop;
+			aq->rq_mask.xqe_drop = ~aq->rq_mask.xqe_drop;
+		}
+	}
+
+	return 0;
+}
+
 int
 roc_nix_rq_init(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 {
@@ -691,6 +915,8 @@ roc_nix_rq_init(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cfg(dev, rq, nix->qints, false, ena);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, rq, nix->qints, false, ena);
 	else
 		rc = nix_rq_cfg(dev, rq, nix->qints, false, ena);
 
@@ -745,6 +971,8 @@ roc_nix_rq_modify(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 	mbox = mbox_get(m_box);
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cfg(dev, rq, nix->qints, true, ena);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, rq, nix->qints, true, ena);
 	else
 		rc = nix_rq_cfg(dev, rq, nix->qints, true, ena);
 
@@ -817,12 +1045,121 @@ roc_nix_rq_fini(struct roc_nix_rq *rq)
 	return 0;
 }
 
+static inline int
+roc_nix_cn20k_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
+{
+	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct mbox *mbox = (&nix->dev)->mbox;
+	volatile struct nix_cn20k_cq_ctx_s *cq_ctx;
+	uint16_t drop_thresh = NIX_CQ_THRESH_LEVEL;
+	uint16_t cpt_lbpid = nix->cpt_lbpid;
+	struct nix_cn20k_aq_enq_req *aq;
+	enum nix_q_size qsize;
+	size_t desc_sz;
+	int rc;
+
+	if (cq == NULL)
+		return NIX_ERR_PARAM;
+
+	qsize = nix_qsize_clampup(cq->nb_desc);
+	cq->nb_desc = nix_qsize_to_val(qsize);
+	cq->qmask = cq->nb_desc - 1;
+	cq->door = nix->base + NIX_LF_CQ_OP_DOOR;
+	cq->status = (int64_t *)(nix->base + NIX_LF_CQ_OP_STATUS);
+	cq->wdata = (uint64_t)cq->qid << 32;
+	cq->roc_nix = roc_nix;
+
+	/* CQE of W16 */
+	desc_sz = cq->nb_desc * NIX_CQ_ENTRY_SZ;
+	cq->desc_base = plt_zmalloc(desc_sz, NIX_CQ_ALIGN);
+	if (cq->desc_base == NULL) {
+		rc = NIX_ERR_NO_MEM;
+		goto fail;
+	}
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = cq->qid;
+	aq->ctype = NIX_AQ_CTYPE_CQ;
+	aq->op = NIX_AQ_INSTOP_INIT;
+	cq_ctx = &aq->cq;
+
+	cq_ctx->ena = 1;
+	cq_ctx->caching = 1;
+	cq_ctx->qsize = qsize;
+	cq_ctx->base = (uint64_t)cq->desc_base;
+	cq_ctx->avg_level = 0xff;
+	cq_ctx->cq_err_int_ena = BIT(NIX_CQERRINT_CQE_FAULT);
+	cq_ctx->cq_err_int_ena |= BIT(NIX_CQERRINT_DOOR_ERR);
+	if (roc_feature_nix_has_late_bp() && roc_nix_inl_inb_is_enabled(roc_nix)) {
+		cq_ctx->cq_err_int_ena |= BIT(NIX_CQERRINT_CPT_DROP);
+		cq_ctx->cpt_drop_err_en = 1;
+		/* Enable Late BP only when non zero CPT BPID */
+		if (cpt_lbpid) {
+			cq_ctx->lbp_ena = 1;
+			cq_ctx->lbpid_low = cpt_lbpid & 0x7;
+			cq_ctx->lbpid_med = (cpt_lbpid >> 3) & 0x7;
+			cq_ctx->lbpid_high = (cpt_lbpid >> 6) & 0x7;
+			cq_ctx->lbp_frac = NIX_CQ_LPB_THRESH_FRAC;
+		}
+		drop_thresh = NIX_CQ_SEC_THRESH_LEVEL;
+	}
+
+	/* Many to one reduction */
+	cq_ctx->qint_idx = cq->qid % nix->qints;
+	/* Map CQ0 [RQ0] to CINT0 and so on till max 64 irqs */
+	cq_ctx->cint_idx = cq->qid;
+
+	if (roc_errata_nix_has_cq_min_size_4k()) {
+		const float rx_cq_skid = NIX_CQ_FULL_ERRATA_SKID;
+		uint16_t min_rx_drop;
+
+		min_rx_drop = ceil(rx_cq_skid / (float)cq->nb_desc);
+		cq_ctx->drop = min_rx_drop;
+		cq_ctx->drop_ena = 1;
+		cq->drop_thresh = min_rx_drop;
+	} else {
+		cq->drop_thresh = drop_thresh;
+		/* Drop processing or red drop cannot be enabled due to
+		 * packets coming for second pass from CPT.
+		 */
+		if (!roc_nix_inl_inb_is_enabled(roc_nix)) {
+			cq_ctx->drop = cq->drop_thresh;
+			cq_ctx->drop_ena = 1;
+		}
+	}
+	cq_ctx->bp = cq->drop_thresh;
+
+	if (roc_feature_nix_has_cqe_stash()) {
+		if (cq_ctx->caching) {
+			cq_ctx->stashing = 1;
+			cq_ctx->stash_thresh = cq->stash_thresh;
+		}
+	}
+
+	rc = mbox_process(mbox);
+	mbox_put(mbox);
+	if (rc)
+		goto free_mem;
+
+	return nix_tel_node_add_cq(cq);
+
+free_mem:
+	plt_free(cq->desc_base);
+fail:
+	return rc;
+}
+
 int
 roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct mbox *mbox = (&nix->dev)->mbox;
-	volatile struct nix_cq_ctx_s *cq_ctx;
+	volatile struct nix_cq_ctx_s *cq_ctx = NULL;
 	uint16_t drop_thresh = NIX_CQ_THRESH_LEVEL;
 	uint16_t cpt_lbpid = nix->cpt_lbpid;
 	enum nix_q_size qsize;
@@ -832,6 +1169,9 @@ roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 	if (cq == NULL)
 		return NIX_ERR_PARAM;
 
+	if (roc_model_is_cn20k())
+		return roc_nix_cn20k_cq_init(roc_nix, cq);
+
 	qsize = nix_qsize_clampup(cq->nb_desc);
 	cq->nb_desc = nix_qsize_to_val(qsize);
 	cq->qmask = cq->nb_desc - 1;
@@ -861,7 +1201,7 @@ roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_INIT;
 		cq_ctx = &aq->cq;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
@@ -972,7 +1312,7 @@ roc_nix_cq_fini(struct roc_nix_cq *cq)
 		aq->cq.bp_ena = 0;
 		aq->cq_mask.ena = ~aq->cq_mask.ena;
 		aq->cq_mask.bp_ena = ~aq->cq_mask.bp_ena;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -981,6 +1321,26 @@ roc_nix_cq_fini(struct roc_nix_cq *cq)
 			return -ENOSPC;
 		}
 
+		aq->qidx = cq->qid;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->cq.ena = 0;
+		aq->cq.bp_ena = 0;
+		aq->cq_mask.ena = ~aq->cq_mask.ena;
+		aq->cq_mask.bp_ena = ~aq->cq_mask.bp_ena;
+		if (roc_feature_nix_has_late_bp() && roc_nix_inl_inb_is_enabled(cq->roc_nix)) {
+			aq->cq.lbp_ena = 0;
+			aq->cq_mask.lbp_ena = ~aq->cq_mask.lbp_ena;
+		}
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			mbox_put(mbox);
+			return -ENOSPC;
+		}
+
 		aq->qidx = cq->qid;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
@@ -1227,14 +1587,152 @@ sq_cn9k_fini(struct nix *nix, struct roc_nix_sq *sq)
 	return 0;
 }
 
+static int
+sq_cn10k_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t rr_quantum, uint16_t smq)
+{
+	struct roc_nix *roc_nix = nix_priv_to_roc_nix(nix);
+	struct mbox *mbox = (&nix->dev)->mbox;
+	struct nix_cn10k_aq_enq_req *aq;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq)
+		return -ENOSPC;
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_INIT;
+	aq->sq.max_sqe_size = sq->max_sqe_sz;
+	aq->sq.smq = smq;
+	aq->sq.smq_rr_weight = rr_quantum;
+	if (roc_nix_is_sdp(roc_nix))
+		aq->sq.default_chan = nix->tx_chan_base + (sq->qid % nix->tx_chan_cnt);
+	else
+		aq->sq.default_chan = nix->tx_chan_base;
+	aq->sq.sqe_stype = NIX_STYPE_STF;
+	aq->sq.ena = 1;
+	aq->sq.sso_ena = !!sq->sso_ena;
+	aq->sq.cq_ena = !!sq->cq_ena;
+	aq->sq.cq = sq->cqid;
+	aq->sq.cq_limit = sq->cq_drop_thresh;
+	if (aq->sq.max_sqe_size == NIX_MAXSQESZ_W8)
+		aq->sq.sqe_stype = NIX_STYPE_STP;
+	aq->sq.sqb_aura = roc_npa_aura_handle_to_aura(sq->aura_handle);
+	aq->sq.sq_int_ena = BIT(NIX_SQINT_LMT_ERR);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_SQB_ALLOC_FAIL);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_SEND_ERR);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_MNQ_ERR);
+
+	/* Many to one reduction */
+	aq->sq.qint_idx = sq->qid % nix->qints;
+	if (roc_errata_nix_assign_incorrect_qint()) {
+		/* Assign QINT 0 to all the SQs: an errata exists where NIXTX
+		 * can send an incorrect QINT_IDX when reporting a queue
+		 * interrupt (QINT), and software might miss the interrupt.
+		 */
+		aq->sq.qint_idx = 0;
+	}
+	return 0;
+}
+
+static int
+sq_cn10k_fini(struct nix *nix, struct roc_nix_sq *sq)
+{
+	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
+	struct nix_cn10k_aq_enq_rsp *rsp;
+	struct nix_cn10k_aq_enq_req *aq;
+	uint16_t sqes_per_sqb;
+	void *sqb_buf;
+	int rc, count;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_READ;
+	rc = mbox_process_msg(mbox, (void *)&rsp);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	/* Check if sq is already cleaned up */
+	if (!rsp->sq.ena) {
+		mbox_put(mbox);
+		return 0;
+	}
+
+	/* Disable sq */
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+	aq->sq_mask.ena = ~aq->sq_mask.ena;
+	aq->sq.ena = 0;
+	rc = mbox_process(mbox);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	/* Read SQ and free sqb's */
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_READ;
+	rc = mbox_process_msg(mbox, (void *)&rsp);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	if (aq->sq.smq_pend)
+		plt_err("SQ has pending SQE's");
+
+	count = aq->sq.sqb_count;
+	sqes_per_sqb = 1 << sq->sqes_per_sqb_log2;
+	/* Free SQB's that are used */
+	sqb_buf = (void *)rsp->sq.head_sqb;
+	while (count) {
+		void *next_sqb;
+
+		next_sqb = *(void **)((uint64_t *)sqb_buf +
+				      (uint32_t)((sqes_per_sqb - 1) * (0x2 >> sq->max_sqe_sz) * 8));
+		roc_npa_aura_op_free(sq->aura_handle, 1, (uint64_t)sqb_buf);
+		sqb_buf = next_sqb;
+		count--;
+	}
+
+	/* Free next to use sqb */
+	if (rsp->sq.next_sqb)
+		roc_npa_aura_op_free(sq->aura_handle, 1, rsp->sq.next_sqb);
+	mbox_put(mbox);
+	return 0;
+}
+
 static int
 sq_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t rr_quantum, uint16_t smq)
 {
 	struct roc_nix *roc_nix = nix_priv_to_roc_nix(nix);
 	struct mbox *mbox = (&nix->dev)->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
+	struct nix_cn20k_aq_enq_req *aq;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq)
 		return -ENOSPC;
 
@@ -1280,13 +1778,13 @@ static int
 sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 {
 	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	struct nix_cn20k_aq_enq_rsp *rsp;
+	struct nix_cn20k_aq_enq_req *aq;
 	uint16_t sqes_per_sqb;
 	void *sqb_buf;
 	int rc, count;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1308,7 +1806,7 @@ sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 	}
 
 	/* Disable sq */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1326,7 +1824,7 @@ sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 	}
 
 	/* Read SQ and free sqb's */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1408,6 +1906,8 @@ roc_nix_sq_init(struct roc_nix *roc_nix, struct roc_nix_sq *sq)
 	/* Init SQ context */
 	if (roc_model_is_cn9k())
 		rc = sq_cn9k_init(nix, sq, rr_quantum, smq);
+	else if (roc_model_is_cn10k())
+		rc = sq_cn10k_init(nix, sq, rr_quantum, smq);
 	else
 		rc = sq_init(nix, sq, rr_quantum, smq);
 
@@ -1464,6 +1964,8 @@ roc_nix_sq_fini(struct roc_nix_sq *sq)
 	/* Release SQ context */
 	if (roc_model_is_cn9k())
 		rc |= sq_cn9k_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
+	else if (roc_model_is_cn10k())
+		rc |= sq_cn10k_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
 	else
 		rc |= sq_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
 
diff --git a/drivers/common/cnxk/roc_nix_stats.c b/drivers/common/cnxk/roc_nix_stats.c
index 7a9619b39d..6f241c72de 100644
--- a/drivers/common/cnxk/roc_nix_stats.c
+++ b/drivers/common/cnxk/roc_nix_stats.c
@@ -173,7 +173,7 @@ nix_stat_rx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
 		aq->rq_mask.drop_pkts = ~(aq->rq_mask.drop_pkts);
 		aq->rq_mask.re_pkts = ~(aq->rq_mask.re_pkts);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -192,6 +192,30 @@ nix_stat_rx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->rq.drop_pkts = 0;
 		aq->rq.re_pkts = 0;
 
+		aq->rq_mask.octs = ~(aq->rq_mask.octs);
+		aq->rq_mask.pkts = ~(aq->rq_mask.pkts);
+		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
+		aq->rq_mask.drop_pkts = ~(aq->rq_mask.drop_pkts);
+		aq->rq_mask.re_pkts = ~(aq->rq_mask.re_pkts);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.octs = 0;
+		aq->rq.pkts = 0;
+		aq->rq.drop_octs = 0;
+		aq->rq.drop_pkts = 0;
+		aq->rq.re_pkts = 0;
+
 		aq->rq_mask.octs = ~(aq->rq_mask.octs);
 		aq->rq_mask.pkts = ~(aq->rq_mask.pkts);
 		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
@@ -233,7 +257,7 @@ nix_stat_tx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
 		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
 		aq->sq_mask.drop_pkts = ~(aq->sq_mask.drop_pkts);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -250,6 +274,29 @@ nix_stat_tx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->sq.drop_octs = 0;
 		aq->sq.drop_pkts = 0;
 
+		aq->sq_mask.octs = ~(aq->sq_mask.octs);
+		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
+		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
+		aq->sq_mask.drop_pkts = ~(aq->sq_mask.drop_pkts);
+		aq->sq_mask.aged_drop_octs = ~(aq->sq_mask.aged_drop_octs);
+		aq->sq_mask.aged_drop_pkts = ~(aq->sq_mask.aged_drop_pkts);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_SQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->sq.octs = 0;
+		aq->sq.pkts = 0;
+		aq->sq.drop_octs = 0;
+		aq->sq.drop_pkts = 0;
+
 		aq->sq_mask.octs = ~(aq->sq_mask.octs);
 		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
 		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
@@ -375,7 +422,7 @@ roc_nix_xstats_get(struct roc_nix *roc_nix, struct roc_nix_xstat *xstats,
 	xstats[count].id = count;
 	count++;
 
-	if (roc_model_is_cn10k()) {
+	if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		for (i = 0; i < CNXK_NIX_NUM_CN10K_RX_XSTATS; i++) {
 			xstats[count].value =
 				NIX_RX_STATS(nix_cn10k_rx_xstats[i].offset);
@@ -492,7 +539,7 @@ roc_nix_xstats_names_get(struct roc_nix *roc_nix,
 		count++;
 	}
 
-	if (roc_model_is_cn10k()) {
+	if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		for (i = 0; i < CNXK_NIX_NUM_CN10K_RX_XSTATS; i++) {
 			NIX_XSTATS_NAME_PRINT(xstats_names, count,
 					      nix_cn10k_rx_xstats, i);
diff --git a/drivers/common/cnxk/roc_nix_tm.c b/drivers/common/cnxk/roc_nix_tm.c
index ac522f8235..5725ef568a 100644
--- a/drivers/common/cnxk/roc_nix_tm.c
+++ b/drivers/common/cnxk/roc_nix_tm.c
@@ -1058,7 +1058,7 @@ nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 		}
 		aq->sq.smq_rr_quantum = rr_quantum;
 		aq->sq_mask.smq_rr_quantum = ~aq->sq_mask.smq_rr_quantum;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -1071,6 +1071,26 @@ nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 		aq->ctype = NIX_AQ_CTYPE_SQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
 
+		/* smq update only when needed */
+		if (!rr_quantum_only) {
+			aq->sq.smq = smq;
+			aq->sq_mask.smq = ~aq->sq_mask.smq;
+		}
+		aq->sq.smq_rr_weight = rr_quantum;
+		aq->sq_mask.smq_rr_weight = ~aq->sq_mask.smq_rr_weight;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_SQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
 		/* smq update only when needed */
 		if (!rr_quantum_only) {
 			aq->sq.smq = smq;
diff --git a/drivers/common/cnxk/roc_nix_tm_ops.c b/drivers/common/cnxk/roc_nix_tm_ops.c
index 8144675f89..a9cfadd1b0 100644
--- a/drivers/common/cnxk/roc_nix_tm_ops.c
+++ b/drivers/common/cnxk/roc_nix_tm_ops.c
@@ -1294,15 +1294,19 @@ roc_nix_tm_rsrc_max(bool pf, uint16_t schq[ROC_TM_LVL_MAX])
 
 		switch (hw_lvl) {
 		case NIX_TXSCH_LVL_SMQ:
-			max = (roc_model_is_cn9k() ?
-					     NIX_CN9K_TXSCH_LVL_SMQ_MAX :
-					     NIX_TXSCH_LVL_SMQ_MAX);
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_SMQ_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_SMQ_MAX :
+				 NIX_TXSCH_LVL_SMQ_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL4:
-			max = NIX_TXSCH_LVL_TL4_MAX;
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_TL4_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_TL4_MAX :
+							NIX_TXSCH_LVL_TL4_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL3:
-			max = NIX_TXSCH_LVL_TL3_MAX;
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_TL3_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_TL3_MAX :
+							NIX_TXSCH_LVL_TL3_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL2:
 			max = pf ? NIX_TXSCH_LVL_TL2_MAX : 1;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
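
For reference, the SQB teardown in sq_cn10k_fini() above walks the SQB
chain and returns each buffer to its NPA aura; per the offset
computation in the patch, the next-SQB pointer sits in the last SQE
slot of each buffer. A minimal sketch of that walk, with an
illustrative helper name and the next-pointer offset taken as a
parameter (neither is part of the patch; roc_npa_aura_op_free() is the
ROC NPA API used above):

static void
free_sqb_chain(uint64_t aura_handle, void *head, uint32_t count,
	       uint32_t next_off)
{
	void *sqb = head;

	while (count--) {
		/* Read the next pointer before the buffer is freed */
		void *next = *(void **)((uintptr_t)sqb + next_off);

		roc_npa_aura_op_free(aura_handle, 1, (uint64_t)sqb);
		sqb = next;
	}
}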

* [PATCH v2 07/18] common/cnxk: support bandwidth profile for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (5 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 06/18] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 08/18] common/cnxk: support NIX debug " Nithin Dabilpuram
                     ` (11 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to set up the bandwidth profile config for the
cn20k Rx policer.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/roc_nix_bpf.c   | 528 ++++++++++++++++++----------
 drivers/common/cnxk/roc_nix_queue.c | 136 ++++---
 2 files changed, 425 insertions(+), 239 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_bpf.c b/drivers/common/cnxk/roc_nix_bpf.c
index d60396289b..98c9855a5b 100644
--- a/drivers/common/cnxk/roc_nix_bpf.c
+++ b/drivers/common/cnxk/roc_nix_bpf.c
@@ -547,9 +547,9 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 {
 	uint64_t exponent_p = 0, mantissa_p = 0, div_exp_p = 0;
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = dev->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
 	uint32_t policer_timeunit;
 	uint8_t level_idx;
 	int rc;
@@ -568,103 +568,122 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 	if (level_idx == ROC_NIX_BPF_LEVEL_IDX_INVALID)
 		return NIX_ERR_PARAM;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
-	aq->prof.adjust_exponent = NIX_BPF_DEFAULT_ADJUST_EXPONENT;
-	aq->prof.adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA;
+	prof->adjust_exponent = NIX_BPF_DEFAULT_ADJUST_EXPONENT;
+	prof->adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA;
 	if (cfg->lmode == ROC_NIX_BPF_LMODE_BYTE)
-		aq->prof.adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA / 2;
+		prof->adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA / 2;
 
-	aq->prof_mask.adjust_exponent = ~(aq->prof_mask.adjust_exponent);
-	aq->prof_mask.adjust_mantissa = ~(aq->prof_mask.adjust_mantissa);
+	prof_mask->adjust_exponent = ~(prof_mask->adjust_exponent);
+	prof_mask->adjust_mantissa = ~(prof_mask->adjust_mantissa);
 
 	switch (cfg->alg) {
 	case ROC_NIX_BPF_ALGO_2697:
 		meter_rate_to_nix(cfg->algo2697.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2697.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2697.ebs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	case ROC_NIX_BPF_ALGO_2698:
 		meter_rate_to_nix(cfg->algo2698.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_rate_to_nix(cfg->algo2698.pir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.peir_mantissa = mantissa_p;
-		aq->prof.peir_exponent = exponent_p;
+		prof->peir_mantissa = mantissa_p;
+		prof->peir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2698.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2698.pbs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.peir_mantissa = ~(aq->prof_mask.peir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.peir_exponent = ~(aq->prof_mask.peir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->peir_mantissa = ~(prof_mask->peir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->peir_exponent = ~(prof_mask->peir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	case ROC_NIX_BPF_ALGO_4115:
 		meter_rate_to_nix(cfg->algo4115.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_rate_to_nix(cfg->algo4115.eir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.peir_mantissa = mantissa_p;
-		aq->prof.peir_exponent = exponent_p;
+		prof->peir_mantissa = mantissa_p;
+		prof->peir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo4115.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo4115.ebs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.peir_mantissa = ~(aq->prof_mask.peir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->peir_mantissa = ~(prof_mask->peir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
 
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.peir_exponent = ~(aq->prof_mask.peir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->peir_exponent = ~(prof_mask->peir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	default:
@@ -672,23 +691,23 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 		goto exit;
 	}
 
-	aq->prof.lmode = cfg->lmode;
-	aq->prof.icolor = cfg->icolor;
-	aq->prof.meter_algo = cfg->alg;
-	aq->prof.pc_mode = cfg->pc_mode;
-	aq->prof.tnl_ena = cfg->tnl_ena;
-	aq->prof.gc_action = cfg->action[ROC_NIX_BPF_COLOR_GREEN];
-	aq->prof.yc_action = cfg->action[ROC_NIX_BPF_COLOR_YELLOW];
-	aq->prof.rc_action = cfg->action[ROC_NIX_BPF_COLOR_RED];
+	prof->lmode = cfg->lmode;
+	prof->icolor = cfg->icolor;
+	prof->meter_algo = cfg->alg;
+	prof->pc_mode = cfg->pc_mode;
+	prof->tnl_ena = cfg->tnl_ena;
+	prof->gc_action = cfg->action[ROC_NIX_BPF_COLOR_GREEN];
+	prof->yc_action = cfg->action[ROC_NIX_BPF_COLOR_YELLOW];
+	prof->rc_action = cfg->action[ROC_NIX_BPF_COLOR_RED];
 
-	aq->prof_mask.lmode = ~(aq->prof_mask.lmode);
-	aq->prof_mask.icolor = ~(aq->prof_mask.icolor);
-	aq->prof_mask.meter_algo = ~(aq->prof_mask.meter_algo);
-	aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
-	aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
-	aq->prof_mask.gc_action = ~(aq->prof_mask.gc_action);
-	aq->prof_mask.yc_action = ~(aq->prof_mask.yc_action);
-	aq->prof_mask.rc_action = ~(aq->prof_mask.rc_action);
+	prof_mask->lmode = ~(prof_mask->lmode);
+	prof_mask->icolor = ~(prof_mask->icolor);
+	prof_mask->meter_algo = ~(prof_mask->meter_algo);
+	prof_mask->pc_mode = ~(prof_mask->pc_mode);
+	prof_mask->tnl_ena = ~(prof_mask->tnl_ena);
+	prof_mask->gc_action = ~(prof_mask->gc_action);
+	prof_mask->yc_action = ~(prof_mask->yc_action);
+	prof_mask->rc_action = ~(prof_mask->rc_action);
 
 	rc = mbox_process(mbox);
 exit:
@@ -703,7 +722,6 @@ roc_nix_bpf_ena_dis(struct roc_nix *roc_nix, uint16_t id, struct roc_nix_rq *rq,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	int rc;
 
 	if (roc_model_is_cn9k()) {
@@ -716,25 +734,53 @@ roc_nix_bpf_ena_dis(struct roc_nix *roc_nix, uint16_t id, struct roc_nix_rq *rq,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
-	}
-	aq->qidx = rq->qid;
-	aq->ctype = NIX_AQ_CTYPE_RQ;
-	aq->op = NIX_AQ_INSTOP_WRITE;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
 
-	aq->rq.policer_ena = enable;
-	aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
-	if (enable) {
-		aq->rq.band_prof_id = id;
-		aq->rq_mask.band_prof_id = ~(aq->rq_mask.band_prof_id);
-	}
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.policer_ena = enable;
+		aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
+		if (enable) {
+			aq->rq.band_prof_id = id;
+			aq->rq_mask.band_prof_id = ~(aq->rq_mask.band_prof_id);
+		}
+
+		rc = mbox_process(mbox);
+		if (rc)
+			goto exit;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
 
-	rc = mbox_process(mbox);
-	if (rc)
-		goto exit;
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.policer_ena = enable;
+		aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
+		if (enable) {
+			aq->rq.band_prof_id_l = id & 0x3FF;
+			aq->rq.band_prof_id_h = (id >> 10) & 0xF;
+			aq->rq_mask.band_prof_id_l = ~(aq->rq_mask.band_prof_id_l);
+			aq->rq_mask.band_prof_id_h = ~(aq->rq_mask.band_prof_id_h);
+		}
+
+		rc = mbox_process(mbox);
+		if (rc)
+			goto exit;
+	}
 
 	rq->bpf_id = id;
 
@@ -750,8 +796,7 @@ roc_nix_bpf_dump(struct roc_nix *roc_nix, uint16_t id,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	volatile struct nix_band_prof_s *prof;
 	uint8_t level_idx;
 	int rc;
 
@@ -765,19 +810,42 @@ roc_nix_bpf_dump(struct roc_nix *roc_nix, uint16_t id,
 		rc = NIX_ERR_PARAM;
 		goto exit;
 	}
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_rsp *rsp;
+		struct nix_cn10k_aq_enq_req *aq;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_READ;
-	rc = mbox_process_msg(mbox, (void *)&rsp);
 	if (!rc) {
 		plt_dump("============= band prof id =%d ===============", id);
-		nix_lf_bpf_dump(&rsp->prof);
+		nix_lf_bpf_dump(prof);
 	}
 exit:
 	mbox_put(mbox);
@@ -792,7 +860,6 @@ roc_nix_bpf_pre_color_tbl_setup(struct roc_nix *roc_nix, uint16_t id,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = dev->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t pc_mode, tn_ena;
 	uint8_t level_idx;
 	int rc;
@@ -856,21 +923,43 @@ roc_nix_bpf_pre_color_tbl_setup(struct roc_nix *roc_nix, uint16_t id,
 		goto exit;
 	}
 
-	/* Update corresponding bandwidth profile too */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		/* Update corresponding bandwidth profile too */
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->prof.pc_mode = pc_mode;
+		aq->prof.tnl_ena = tn_ena;
+		aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
+		aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
+
+		rc = mbox_process(mbox);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		/* Update corresponding bandwidth profile too */
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->prof.pc_mode = pc_mode;
+		aq->prof.tnl_ena = tn_ena;
+		aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
+		aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
+
+		rc = mbox_process(mbox);
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
-	aq->prof.pc_mode = pc_mode;
-	aq->prof.tnl_ena = tn_ena;
-	aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
-	aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
-
-	rc = mbox_process(mbox);
 
 exit:
 	mbox_put(mbox);
@@ -883,9 +972,9 @@ roc_nix_bpf_connect(struct roc_nix *roc_nix,
 		    uint16_t dst_id)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t level_idx;
 	int rc;
 
@@ -900,23 +989,42 @@ roc_nix_bpf_connect(struct roc_nix *roc_nix,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
 	if (dst_id == ROC_NIX_BPF_ID_INVALID) {
-		aq->prof.hl_en = false;
-		aq->prof_mask.hl_en = ~(aq->prof_mask.hl_en);
+		prof->hl_en = false;
+		prof_mask->hl_en = ~(prof_mask->hl_en);
 	} else {
-		aq->prof.hl_en = true;
-		aq->prof.band_prof_id = dst_id;
-		aq->prof_mask.hl_en = ~(aq->prof_mask.hl_en);
-		aq->prof_mask.band_prof_id = ~(aq->prof_mask.band_prof_id);
+		prof->hl_en = true;
+		prof->band_prof_id = dst_id;
+		prof_mask->hl_en = ~(prof_mask->hl_en);
+		prof_mask->band_prof_id = ~(prof_mask->band_prof_id);
 	}
 
 	rc = mbox_process(mbox);
@@ -937,8 +1045,7 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	volatile struct nix_band_prof_s *prof;
 	uint8_t level_idx;
 	int rc;
 
@@ -953,17 +1060,39 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_rsp *rsp;
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_READ;
-	rc = mbox_process_msg(mbox, (void *)&rsp);
-	if (rc)
-		goto exit;
 
 	green_pkt_pass =
 		roc_nix_bpf_stats_to_idx(mask & ROC_NIX_BPF_GREEN_PKT_F_PASS);
@@ -991,40 +1120,40 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		roc_nix_bpf_stats_to_idx(mask & ROC_NIX_BPF_RED_OCTS_F_DROP);
 
 	if (green_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[green_pkt_pass] = rsp->prof.green_pkt_pass;
+		stats[green_pkt_pass] = prof->green_pkt_pass;
 
 	if (green_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[green_octs_pass] = rsp->prof.green_octs_pass;
+		stats[green_octs_pass] = prof->green_octs_pass;
 
 	if (green_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[green_pkt_drop] = rsp->prof.green_pkt_drop;
+		stats[green_pkt_drop] = prof->green_pkt_drop;
 
 	if (green_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[green_octs_drop] = rsp->prof.green_octs_pass;
+		stats[green_octs_drop] = prof->green_octs_pass;
 
 	if (yellow_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_pkt_pass] = rsp->prof.yellow_pkt_pass;
+		stats[yellow_pkt_pass] = prof->yellow_pkt_pass;
 
 	if (yellow_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_octs_pass] = rsp->prof.yellow_octs_pass;
+		stats[yellow_octs_pass] = prof->yellow_octs_pass;
 
 	if (yellow_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_pkt_drop] = rsp->prof.yellow_pkt_drop;
+		stats[yellow_pkt_drop] = prof->yellow_pkt_drop;
 
 	if (yellow_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_octs_drop] = rsp->prof.yellow_octs_drop;
+		stats[yellow_octs_drop] = prof->yellow_octs_drop;
 
 	if (red_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[red_pkt_pass] = rsp->prof.red_pkt_pass;
+		stats[red_pkt_pass] = prof->red_pkt_pass;
 
 	if (red_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[red_octs_pass] = rsp->prof.red_octs_pass;
+		stats[red_octs_pass] = prof->red_octs_pass;
 
 	if (red_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[red_pkt_drop] = rsp->prof.red_pkt_drop;
+		stats[red_pkt_drop] = prof->red_pkt_drop;
 
 	if (red_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[red_octs_drop] = rsp->prof.red_octs_drop;
+		stats[red_octs_drop] = prof->red_octs_drop;
 
 	rc = 0;
 exit:
@@ -1037,9 +1166,9 @@ roc_nix_bpf_stats_reset(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 			enum roc_nix_bpf_level_flag lvl_flag)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t level_idx;
 	int rc;
 
@@ -1054,68 +1183,81 @@ roc_nix_bpf_stats_reset(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
 	if (mask & ROC_NIX_BPF_GREEN_PKT_F_PASS) {
-		aq->prof.green_pkt_pass = 0;
-		aq->prof_mask.green_pkt_pass = ~(aq->prof_mask.green_pkt_pass);
+		prof->green_pkt_pass = 0;
+		prof_mask->green_pkt_pass = ~(prof_mask->green_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_OCTS_F_PASS) {
-		aq->prof.green_octs_pass = 0;
-		aq->prof_mask.green_octs_pass =
-			~(aq->prof_mask.green_octs_pass);
+		prof->green_octs_pass = 0;
+		prof_mask->green_octs_pass = ~(prof_mask->green_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_PKT_F_DROP) {
-		aq->prof.green_pkt_drop = 0;
-		aq->prof_mask.green_pkt_drop = ~(aq->prof_mask.green_pkt_drop);
+		prof->green_pkt_drop = 0;
+		prof_mask->green_pkt_drop = ~(prof_mask->green_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_OCTS_F_DROP) {
-		aq->prof.green_octs_drop = 0;
-		aq->prof_mask.green_octs_drop =
-			~(aq->prof_mask.green_octs_drop);
+		prof->green_octs_drop = 0;
+		prof_mask->green_octs_drop = ~(prof_mask->green_octs_drop);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_PKT_F_PASS) {
-		aq->prof.yellow_pkt_pass = 0;
-		aq->prof_mask.yellow_pkt_pass =
-			~(aq->prof_mask.yellow_pkt_pass);
+		prof->yellow_pkt_pass = 0;
+		prof_mask->yellow_pkt_pass = ~(prof_mask->yellow_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_OCTS_F_PASS) {
-		aq->prof.yellow_octs_pass = 0;
-		aq->prof_mask.yellow_octs_pass =
-			~(aq->prof_mask.yellow_octs_pass);
+		prof->yellow_octs_pass = 0;
+		prof_mask->yellow_octs_pass = ~(prof_mask->yellow_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_PKT_F_DROP) {
-		aq->prof.yellow_pkt_drop = 0;
-		aq->prof_mask.yellow_pkt_drop =
-			~(aq->prof_mask.yellow_pkt_drop);
+		prof->yellow_pkt_drop = 0;
+		prof_mask->yellow_pkt_drop = ~(prof_mask->yellow_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_OCTS_F_DROP) {
-		aq->prof.yellow_octs_drop = 0;
-		aq->prof_mask.yellow_octs_drop =
-			~(aq->prof_mask.yellow_octs_drop);
+		prof->yellow_octs_drop = 0;
+		prof_mask->yellow_octs_drop = ~(prof_mask->yellow_octs_drop);
 	}
 	if (mask & ROC_NIX_BPF_RED_PKT_F_PASS) {
-		aq->prof.red_pkt_pass = 0;
-		aq->prof_mask.red_pkt_pass = ~(aq->prof_mask.red_pkt_pass);
+		prof->red_pkt_pass = 0;
+		prof_mask->red_pkt_pass = ~(prof_mask->red_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_RED_OCTS_F_PASS) {
-		aq->prof.red_octs_pass = 0;
-		aq->prof_mask.red_octs_pass = ~(aq->prof_mask.red_octs_pass);
+		prof->red_octs_pass = 0;
+		prof_mask->red_octs_pass = ~(prof_mask->red_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_RED_PKT_F_DROP) {
-		aq->prof.red_pkt_drop = 0;
-		aq->prof_mask.red_pkt_drop = ~(aq->prof_mask.red_pkt_drop);
+		prof->red_pkt_drop = 0;
+		prof_mask->red_pkt_drop = ~(prof_mask->red_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_RED_OCTS_F_DROP) {
-		aq->prof.red_octs_drop = 0;
-		aq->prof_mask.red_octs_drop = ~(aq->prof_mask.red_octs_drop);
+		prof->red_octs_drop = 0;
+		prof_mask->red_octs_drop = ~(prof_mask->red_octs_drop);
 	}
 
 	rc = mbox_process(mbox);
diff --git a/drivers/common/cnxk/roc_nix_queue.c b/drivers/common/cnxk/roc_nix_queue.c
index bb1b70424f..06029275af 100644
--- a/drivers/common/cnxk/roc_nix_queue.c
+++ b/drivers/common/cnxk/roc_nix_queue.c
@@ -384,6 +384,94 @@ nix_rq_cn9k_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
 	return rc;
 }
 
+static int
+nix_rq_cn10k_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
+{
+	struct nix_cn10k_aq_enq_req *aq;
+	struct mbox *mbox = mbox_get(dev->mbox);
+	int rc;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		rc = -ENOSPC;
+		goto exit;
+	}
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
+		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
+	}
+
+	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
+		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
+	}
+
+	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
+		aq->rq.xqe_pass = rq->xqe_red_pass;
+		aq->rq.xqe_drop = rq->xqe_red_drop;
+		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
+		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
+	}
+
+	rc = mbox_process(mbox);
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
+static int
+nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
+{
+	struct nix_cn20k_aq_enq_req *aq;
+	struct mbox *mbox = mbox_get(dev->mbox);
+	int rc;
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+	if (!aq) {
+		rc = -ENOSPC;
+		goto exit;
+	}
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
+		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
+	}
+
+	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
+		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
+	}
+
+	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
+		aq->rq.xqe_pass = rq->xqe_red_pass;
+		aq->rq.xqe_drop = rq->xqe_red_drop;
+		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
+		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
+	}
+
+	rc = mbox_process(mbox);
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
 int
 nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 		bool cfg, bool ena)
@@ -680,52 +768,6 @@ nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cf
 	return 0;
 }
 
-static int
-nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
-{
-	struct nix_cn10k_aq_enq_req *aq;
-	struct mbox *mbox = mbox_get(dev->mbox);
-	int rc;
-
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (!aq) {
-		rc = -ENOSPC;
-		goto exit;
-	}
-
-	aq->qidx = rq->qid;
-	aq->ctype = NIX_AQ_CTYPE_RQ;
-	aq->op = NIX_AQ_INSTOP_WRITE;
-
-	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
-		aq->rq.lpb_pool_pass = rq->red_pass;
-		aq->rq.lpb_pool_drop = rq->red_drop;
-		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
-		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
-
-	}
-
-	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
-		aq->rq.spb_pool_pass = rq->spb_red_pass;
-		aq->rq.spb_pool_drop = rq->spb_red_drop;
-		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
-		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
-
-	}
-
-	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
-		aq->rq.xqe_pass = rq->xqe_red_pass;
-		aq->rq.xqe_drop = rq->xqe_red_drop;
-		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
-		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
-	}
-
-	rc = mbox_process(mbox);
-exit:
-	mbox_put(mbox);
-	return rc;
-}
-
 int
 nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
 {
@@ -1021,6 +1063,8 @@ roc_nix_rq_cman_config(struct roc_nix *roc_nix, struct roc_nix_rq *rq)
 
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cman_cfg(dev, rq);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cman_cfg(dev, rq);
 	else
 		rc = nix_rq_cman_cfg(dev, rq);
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
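
A side note on the recurring "mask = ~(mask)" idiom in the hunks
above: mbox request messages are zero-initialized on allocation (the
idiom relies on this), so complementing a still-zero bitfield sets
every bit of that field, which marks the field for the AQ WRITE
operation. For example:

	aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);

is a width-safe way of writing "set all bits of cir_mantissa",
whatever the width of that bitfield happens to be on a given silicon
generation.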

* [PATCH v2 08/18] common/cnxk: support NIX debug for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (6 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 07/18] common/cnxk: support bandwidth profile " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 09/18] common/cnxk: add RSS support " Nithin Dabilpuram
                     ` (10 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to dump the cn20k queue structs and also expose the
same information via telemetry.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/cnxk_telemetry_nix.c | 260 ++++++++++++++++++++++-
 drivers/common/cnxk/roc_nix_debug.c      | 234 +++++++++++++++++++-
 drivers/common/cnxk/roc_nix_priv.h       |   3 +-
 3 files changed, 488 insertions(+), 9 deletions(-)
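
The dispatch change is uniform across this patch: the cn9k and cn10k
dumpers keep their model suffix while the unsuffixed helper now serves
the latest silicon (cn20k). One cn20k detail worth calling out is the
CQ local backpressure id (lbpid), which is spread across three context
fields and reassembled in nix_cn20k_lf_cq_dump() below. A minimal
sketch of that reassembly, with field widths inferred from the shifts
in the patch:

static inline unsigned int
cq_lbpid(unsigned int hi, unsigned int med, unsigned int lo)
{
	return (hi << 6) | (med << 3) | lo;
}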

diff --git a/drivers/common/cnxk/cnxk_telemetry_nix.c b/drivers/common/cnxk/cnxk_telemetry_nix.c
index ccae5d7853..abeefafe1e 100644
--- a/drivers/common/cnxk/cnxk_telemetry_nix.c
+++ b/drivers/common/cnxk/cnxk_telemetry_nix.c
@@ -346,7 +346,7 @@ nix_rq_ctx_cn9k(volatile void *qctx, struct plt_tel_data *d)
 }
 
 static void
-nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
+nix_rq_ctx_cn10k(volatile void *qctx, struct plt_tel_data *d)
 {
 	volatile struct nix_cn10k_rq_ctx_s *ctx;
 
@@ -438,6 +438,100 @@ nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_U64(d, ctx, re_pkts, w10_);
 }
 
+static void
+nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
+{
+	volatile struct nix_cn20k_rq_ctx_s *ctx;
+
+	ctx = (volatile struct nix_cn20k_rq_ctx_s *)qctx;
+
+	/* W0 */
+	CNXK_TEL_DICT_INT(d, ctx, wqe_aura, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_ol3_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_il3_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, csum_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, csum_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, lenerr_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, port_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, port_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, ena_wqwd, w0);
+	CNXK_TEL_DICT_INT(d, ctx, ipsech_ena, w0);
+	CNXK_TEL_DICT_INT(d, ctx, sso_ena, w0);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w0);
+
+	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, chi_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, ipsecd_drop_en, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, pb_stashing, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_caching, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, pb_caching, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_tt, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_grp, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, xqe_hdr_split, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_imm_copy, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, band_prof_id_h, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_imm_size, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, later_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_bp_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, first_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_high_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, policer_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, band_prof_id_l, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, spb_pool_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_pool_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_pool_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_pool_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_drop, w3_);
+
+	/* W4 */
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, rq_int_ena, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, rq_int, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_pool_pass, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_pool_drop, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura_pass, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura_drop, w4_);
+
+	/* W5 */
+	CNXK_TEL_DICT_INT(d, ctx, flow_tagw, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, bad_utag, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, good_utag, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, ltag, w5_);
+
+	/* W6 */
+	CNXK_TEL_DICT_U64(d, ctx, octs, w6_);
+
+	/* W7 */
+	CNXK_TEL_DICT_U64(d, ctx, pkts, w7_);
+
+	/* W8 */
+	CNXK_TEL_DICT_U64(d, ctx, drop_octs, w8_);
+
+	/* W9 */
+	CNXK_TEL_DICT_U64(d, ctx, drop_pkts, w9_);
+
+	/* W10 */
+	CNXK_TEL_DICT_U64(d, ctx, re_pkts, w10_);
+}
+
 static int
 cnxk_tel_nix_rq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 {
@@ -459,12 +553,77 @@ cnxk_tel_nix_rq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 
 	if (roc_model_is_cn9k())
 		nix_rq_ctx_cn9k(qctx, d);
+	else if (roc_model_is_cn10k())
+		nix_rq_ctx_cn10k(qctx, d);
 	else
 		nix_rq_ctx(qctx, d);
 
 	return 0;
 }
 
+static int
+cnxk_tel_nix_cq_ctx_cn20k(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
+{
+	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct dev *dev = &nix->dev;
+	struct npa_lf *npa_lf;
+	volatile struct nix_cn20k_cq_ctx_s *ctx;
+	int rc = -1;
+
+	npa_lf = idev_npa_obj_get();
+	if (npa_lf == NULL)
+		return NPA_ERR_DEVICE_NOT_BOUNDED;
+
+	rc = nix_q_ctx_get(dev, NIX_AQ_CTYPE_CQ, n, (void *)&ctx);
+	if (rc) {
+		plt_err("Failed to get cq context");
+		return rc;
+	}
+
+	/* W0 */
+	CNXK_TEL_DICT_PTR(d, ctx, base, w0_);
+
+	/* W1 */
+	CNXK_TEL_DICT_U64(d, ctx, wrptr, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, avg_con, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cint_idx, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_err, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_high, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, bpid, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_med, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, bp_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_low, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbp_ena, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, update_time, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, avg_level, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, head, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, tail, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, cq_err_int_ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_err_int, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, qsize, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, stashing, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, caching, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, lbp_frac, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, stash_thresh, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, msh_valid, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, msh_dst, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cpt_drop_err_en, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, drop_ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, bp, w3_);
+
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_ext, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, bpid_ext, w4_);
+
+	return 0;
+}
+
 static int
 cnxk_tel_nix_cq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 {
@@ -474,6 +633,9 @@ cnxk_tel_nix_cq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 	volatile struct nix_cq_ctx_s *ctx;
 	int rc = -1;
 
+	if (roc_model_is_cn20k())
+		return cnxk_tel_nix_cq_ctx_cn20k(roc_nix, n, d);
+
 	npa_lf = idev_npa_obj_get();
 	if (npa_lf == NULL)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
@@ -602,7 +764,7 @@ nix_sq_ctx_cn9k(volatile void *qctx, struct plt_tel_data *d)
 }
 
 static void
-nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
+nix_sq_ctx_cn10k(volatile void *qctx, struct plt_tel_data *d)
 {
 	volatile struct nix_cn10k_sq_ctx_s *ctx;
 
@@ -617,6 +779,97 @@ nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_INT(d, ctx, ena, w0_);
 
 	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sqb_count, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, default_chan, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_weight, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, xoff, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, smq, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, sqe_stype, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sq_int_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sq_int, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sqb_aura, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_ub, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq_vld, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_pend, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smenq_next_sqb_vld, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, head_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smenq_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, tail_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_lso_segnum, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, mnq_dis, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, lmt_dis, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_limit, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, max_sqe_size, w3_);
+
+	/* W4 */
+	CNXK_TEL_DICT_PTR(d, ctx, next_sqb, w4_);
+
+	/* W5 */
+	CNXK_TEL_DICT_PTR(d, ctx, tail_sqb, w5_);
+
+	/* W6 */
+	CNXK_TEL_DICT_PTR(d, ctx, smenq_sqb, w6_);
+
+	/* W7 */
+	CNXK_TEL_DICT_PTR(d, ctx, smenq_next_sqb, w7_);
+
+	/* W8 */
+	CNXK_TEL_DICT_PTR(d, ctx, head_sqb, w8_);
+
+	/* W9 */
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vld, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vlan1_ins_ena, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vlan0_ins_ena, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_mps, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_sb, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_sizem1, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_total, w9_);
+
+	/* W10 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, scm_lso_rem, w10_);
+
+	/* W11 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, octs, w11_);
+
+	/* W12 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, pkts, w12_);
+
+	/* W13 */
+	CNXK_TEL_DICT_INT(d, ctx, aged_drop_octs, w13_);
+	CNXK_TEL_DICT_INT(d, ctx, aged_drop_pkts, w13_);
+
+	/* W14 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, drop_octs, w14_);
+
+	/* W15 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, drop_pkts, w15_);
+}
+
+static void
+nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
+{
+	volatile struct nix_cn20k_sq_ctx_s *ctx;
+
+	ctx = (volatile struct nix_cn20k_sq_ctx_s *)qctx;
+
+	/* W0 */
+	CNXK_TEL_DICT_INT(d, ctx, sqe_way_mask, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, cq, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, sdp_mcast, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, substream, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w0_);
+
+	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, sqb_count, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, default_chan, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, smq_rr_weight, w1_);
@@ -631,7 +884,6 @@ nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_INT(d, ctx, sq_int, w2_);
 	CNXK_TEL_DICT_INT(d, ctx, sqb_aura, w2_);
 	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_ub, w2_);
-	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w2_);
 
 	/* W3 */
 	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq_vld, w3_);
@@ -712,6 +964,8 @@ cnxk_tel_nix_sq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 
 	if (roc_model_is_cn9k())
 		nix_sq_ctx_cn9k(qctx, d);
+	else if (roc_model_is_cn10k())
+		nix_sq_ctx_cn10k(qctx, d);
 	else
 		nix_sq_ctx(qctx, d);
 
diff --git a/drivers/common/cnxk/roc_nix_debug.c b/drivers/common/cnxk/roc_nix_debug.c
index 2e91470c09..0cc8d7cc1e 100644
--- a/drivers/common/cnxk/roc_nix_debug.c
+++ b/drivers/common/cnxk/roc_nix_debug.c
@@ -358,7 +358,7 @@ nix_q_ctx_get(struct dev *dev, uint8_t ctype, uint16_t qid, __io void **ctx_p)
 			*ctx_p = &rsp->sq;
 		else
 			*ctx_p = &rsp->cq;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -372,6 +372,30 @@ nix_q_ctx_get(struct dev *dev, uint8_t ctype, uint16_t qid, __io void **ctx_p)
 		aq->ctype = ctype;
 		aq->op = NIX_AQ_INSTOP_READ;
 
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+
+		if (ctype == NIX_AQ_CTYPE_RQ)
+			*ctx_p = &rsp->rq;
+		else if (ctype == NIX_AQ_CTYPE_SQ)
+			*ctx_p = &rsp->sq;
+		else
+			*ctx_p = &rsp->cq;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = ctype;
+		aq->op = NIX_AQ_INSTOP_READ;
+
 		rc = mbox_process_msg(mbox, (void *)&rsp);
 		if (rc)
 			goto exit;
@@ -452,7 +476,69 @@ nix_cn9k_lf_sq_dump(__io struct nix_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *f
 }
 
 static inline void
-nix_lf_sq_dump(__io struct nix_cn10k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
+nix_cn10k_lf_sq_dump(__io struct nix_cn10k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
+{
+	nix_dump(file, "W0: sqe_way_mask \t\t%d\nW0: cq \t\t\t\t%d",
+		 ctx->sqe_way_mask, ctx->cq);
+	nix_dump(file, "W0: sdp_mcast \t\t\t%d\nW0: substream \t\t\t0x%03x",
+		 ctx->sdp_mcast, ctx->substream);
+	nix_dump(file, "W0: qint_idx \t\t\t%d\nW0: ena \t\t\t%d\n", ctx->qint_idx,
+		 ctx->ena);
+
+	nix_dump(file, "W1: sqb_count \t\t\t%d\nW1: default_chan \t\t%d",
+		 ctx->sqb_count, ctx->default_chan);
+	nix_dump(file, "W1: smq_rr_weight \t\t%d\nW1: sso_ena \t\t\t%d",
+		 ctx->smq_rr_weight, ctx->sso_ena);
+	nix_dump(file, "W1: xoff \t\t\t%d\nW1: cq_ena \t\t\t%d\nW1: smq\t\t\t\t%d\n",
+		 ctx->xoff, ctx->cq_ena, ctx->smq);
+
+	nix_dump(file, "W2: sqe_stype \t\t\t%d\nW2: sq_int_ena \t\t\t%d",
+		 ctx->sqe_stype, ctx->sq_int_ena);
+	nix_dump(file, "W2: sq_int  \t\t\t%d\nW2: sqb_aura \t\t\t%d", ctx->sq_int,
+		 ctx->sqb_aura);
+	nix_dump(file, "W2: smq_rr_count[ub:lb] \t\t%x:%x\n", ctx->smq_rr_count_ub,
+		 ctx->smq_rr_count_lb);
+
+	nix_dump(file, "W3: smq_next_sq_vld\t\t%d\nW3: smq_pend\t\t\t%d",
+		 ctx->smq_next_sq_vld, ctx->smq_pend);
+	nix_dump(file, "W3: smenq_next_sqb_vld  \t%d\nW3: head_offset\t\t\t%d",
+		 ctx->smenq_next_sqb_vld, ctx->head_offset);
+	nix_dump(file, "W3: smenq_offset\t\t%d\nW3: tail_offset \t\t%d",
+		 ctx->smenq_offset, ctx->tail_offset);
+	nix_dump(file, "W3: smq_lso_segnum \t\t%d\nW3: smq_next_sq \t\t%d",
+		 ctx->smq_lso_segnum, ctx->smq_next_sq);
+	nix_dump(file, "W3: mnq_dis \t\t\t%d\nW3: lmt_dis \t\t\t%d", ctx->mnq_dis,
+		 ctx->lmt_dis);
+	nix_dump(file, "W3: cq_limit\t\t\t%d\nW3: max_sqe_size\t\t%d\n",
+		 ctx->cq_limit, ctx->max_sqe_size);
+
+	nix_dump(file, "W4: next_sqb \t\t\t0x%" PRIx64 "", ctx->next_sqb);
+	nix_dump(file, "W5: tail_sqb \t\t\t0x%" PRIx64 "", ctx->tail_sqb);
+	nix_dump(file, "W6: smenq_sqb \t\t\t0x%" PRIx64 "", ctx->smenq_sqb);
+	nix_dump(file, "W7: smenq_next_sqb \t\t0x%" PRIx64 "", ctx->smenq_next_sqb);
+	nix_dump(file, "W8: head_sqb \t\t\t0x%" PRIx64 "", ctx->head_sqb);
+
+	nix_dump(file, "W9: vfi_lso_vld \t\t%d\nW9: vfi_lso_vlan1_ins_ena\t%d", ctx->vfi_lso_vld,
+		 ctx->vfi_lso_vlan1_ins_ena);
+	nix_dump(file, "W9: vfi_lso_vlan0_ins_ena\t%d\nW9: vfi_lso_mps\t\t\t%d",
+		 ctx->vfi_lso_vlan0_ins_ena, ctx->vfi_lso_mps);
+	nix_dump(file, "W9: vfi_lso_sb \t\t\t%d\nW9: vfi_lso_sizem1\t\t%d", ctx->vfi_lso_sb,
+		 ctx->vfi_lso_sizem1);
+	nix_dump(file, "W9: vfi_lso_total\t\t%d", ctx->vfi_lso_total);
+
+	nix_dump(file, "W10: scm_lso_rem \t\t0x%" PRIx64 "", (uint64_t)ctx->scm_lso_rem);
+	nix_dump(file, "W11: octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->octs);
+	nix_dump(file, "W12: pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->pkts);
+	nix_dump(file, "W13: aged_drop_pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->aged_drop_pkts);
+	nix_dump(file, "W13: aged_drop_octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->aged_drop_octs);
+	nix_dump(file, "W14: dropped_octs \t\t0x%" PRIx64 "", (uint64_t)ctx->drop_octs);
+	nix_dump(file, "W15: dropped_pkts \t\t0x%" PRIx64 "", (uint64_t)ctx->drop_pkts);
+
+	*sqb_aura_p = ctx->sqb_aura;
+}
+
+static inline void
+nix_lf_sq_dump(__io struct nix_cn20k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
 {
 	nix_dump(file, "W0: sqe_way_mask \t\t%d\nW0: cq \t\t\t\t%d",
 		 ctx->sqe_way_mask, ctx->cq);
@@ -574,7 +660,7 @@ nix_cn9k_lf_rq_dump(__io struct nix_rq_ctx_s *ctx, FILE *file)
 }
 
 void
-nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
+nix_cn10k_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
 {
 	nix_dump(file, "W0: wqe_aura \t\t\t%d\nW0: len_ol3_dis \t\t\t%d",
 		 ctx->wqe_aura, ctx->len_ol3_dis);
@@ -649,6 +735,124 @@ nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
 	nix_dump(file, "W10: re_pkts \t\t\t0x%" PRIx64 "\n", (uint64_t)ctx->re_pkts);
 }
 
+void
+nix_lf_rq_dump(__io struct nix_cn20k_rq_ctx_s *ctx, FILE *file)
+{
+	nix_dump(file, "W0: wqe_aura \t\t\t%d\nW0: len_ol3_dis \t\t\t%d",
+		 ctx->wqe_aura, ctx->len_ol3_dis);
+	nix_dump(file, "W0: len_ol4_dis \t\t\t%d\nW0: len_il3_dis \t\t\t%d",
+		 ctx->len_ol4_dis, ctx->len_il3_dis);
+	nix_dump(file, "W0: len_il4_dis \t\t\t%d\nW0: csum_ol4_dis \t\t\t%d",
+		 ctx->len_il4_dis, ctx->csum_ol4_dis);
+	nix_dump(file, "W0: csum_il4_dis \t\t\t%d\nW0: lenerr_dis \t\t\t%d",
+		 ctx->csum_il4_dis, ctx->lenerr_dis);
+	nix_dump(file, "W0: port_ol4_dis \t\t\t%d\nW0: port_il4_dis\t\t\t%d",
+		 ctx->port_ol4_dis, ctx->port_il4_dis);
+	nix_dump(file, "W0: cq \t\t\t\t%d\nW0: ena_wqwd \t\t\t%d", ctx->cq,
+		 ctx->ena_wqwd);
+	nix_dump(file, "W0: ipsech_ena \t\t\t%d\nW0: sso_ena \t\t\t%d",
+		 ctx->ipsech_ena, ctx->sso_ena);
+	nix_dump(file, "W0: ena \t\t\t%d\n", ctx->ena);
+
+	nix_dump(file, "W1: chi_ena \t\t%d\nW1: ipsecd_drop_en \t\t%d", ctx->chi_ena,
+		 ctx->ipsecd_drop_en);
+	nix_dump(file, "W1: pb_stashing \t\t\t%d", ctx->pb_stashing);
+	nix_dump(file, "W1: lpb_drop_ena \t\t%d\nW1: spb_drop_ena \t\t%d",
+		 ctx->lpb_drop_ena, ctx->spb_drop_ena);
+	nix_dump(file, "W1: xqe_drop_ena \t\t%d\nW1: wqe_caching \t\t%d",
+		 ctx->xqe_drop_ena, ctx->wqe_caching);
+	nix_dump(file, "W1: pb_caching \t\t\t%d\nW1: sso_tt \t\t\t%d",
+		 ctx->pb_caching, ctx->sso_tt);
+	nix_dump(file, "W1: sso_grp \t\t\t%d\nW1: lpb_aura \t\t\t%d", ctx->sso_grp,
+		 ctx->lpb_aura);
+	nix_dump(file, "W1: spb_aura \t\t\t%d\n", ctx->spb_aura);
+
+	nix_dump(file, "W2: xqe_hdr_split \t\t%d\nW2: xqe_imm_copy \t\t%d",
+		 ctx->xqe_hdr_split, ctx->xqe_imm_copy);
+	nix_dump(file, "W2: band_prof_id\t\t%d\n",
+		 ((ctx->band_prof_id_h << 10) | ctx->band_prof_id_l));
+	nix_dump(file, "W2: xqe_imm_size \t\t%d\nW2: later_skip \t\t\t%d",
+		 ctx->xqe_imm_size, ctx->later_skip);
+	nix_dump(file, "W2: sso_bp_ena\t\t%d\n", ctx->sso_bp_ena);
+	nix_dump(file, "W2: first_skip \t\t\t%d\nW2: lpb_sizem1 \t\t\t%d",
+		 ctx->first_skip, ctx->lpb_sizem1);
+	nix_dump(file, "W2: spb_ena \t\t\t%d\nW2: spb_high_sizem1 \t\t\t%d", ctx->spb_ena,
+		 ctx->spb_high_sizem1);
+	nix_dump(file, "W2: wqe_skip \t\t\t%d", ctx->wqe_skip);
+	nix_dump(file, "W2: spb_sizem1 \t\t\t%d\nW2: policer_ena \t\t\t%d",
+		 ctx->spb_sizem1, ctx->policer_ena);
+	nix_dump(file, "W2: sso_fc_ena \t\t\t%d\n", ctx->sso_fc_ena);
+
+	nix_dump(file, "W3: spb_pool_pass \t\t%d\nW3: spb_pool_drop \t\t%d",
+		 ctx->spb_pool_pass, ctx->spb_pool_drop);
+	nix_dump(file, "W3: spb_aura_pass \t\t%d\nW3: spb_aura_drop \t\t%d",
+		 ctx->spb_aura_pass, ctx->spb_aura_drop);
+	nix_dump(file, "W3: wqe_pool_pass \t\t%d\nW3: wqe_pool_drop \t\t%d",
+		 ctx->wqe_pool_pass, ctx->wqe_pool_drop);
+	nix_dump(file, "W3: xqe_pass \t\t\t%d\nW3: xqe_drop \t\t\t%d\n",
+		 ctx->xqe_pass, ctx->xqe_drop);
+
+	nix_dump(file, "W4: qint_idx \t\t\t%d\nW4: rq_int_ena \t\t\t%d",
+		 ctx->qint_idx, ctx->rq_int_ena);
+	nix_dump(file, "W4: rq_int \t\t\t%d\nW4: lpb_pool_pass \t\t%d", ctx->rq_int,
+		 ctx->lpb_pool_pass);
+	nix_dump(file, "W4: lpb_pool_drop \t\t%d\nW4: lpb_aura_pass \t\t%d",
+		 ctx->lpb_pool_drop, ctx->lpb_aura_pass);
+	nix_dump(file, "W4: lpb_aura_drop \t\t%d\n", ctx->lpb_aura_drop);
+
+	nix_dump(file, "W5: flow_tagw \t\t\t%d\nW5: bad_utag \t\t\t%d",
+		 ctx->flow_tagw, ctx->bad_utag);
+	nix_dump(file, "W5: good_utag \t\t\t%d\nW5: ltag \t\t\t%d\n", ctx->good_utag,
+		 ctx->ltag);
+
+	nix_dump(file, "W6: octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->octs);
+	nix_dump(file, "W7: pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->pkts);
+	nix_dump(file, "W8: drop_octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->drop_octs);
+	nix_dump(file, "W9: drop_pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->drop_pkts);
+	nix_dump(file, "W10: re_pkts \t\t\t0x%" PRIx64 "\n", (uint64_t)ctx->re_pkts);
+}
+
+static inline void
+nix_cn20k_lf_cq_dump(__io struct nix_cn20k_cq_ctx_s *ctx, FILE *file)
+{
+	nix_dump(file, "W0: base \t\t\t0x%" PRIx64 "\n", ctx->base);
+
+	nix_dump(file, "W1: wrptr \t\t\t%" PRIx64 "", (uint64_t)ctx->wrptr);
+	nix_dump(file, "W1: avg_con \t\t\t%d\nW1: cint_idx \t\t\t%d", ctx->avg_con,
+		 ctx->cint_idx);
+	nix_dump(file, "W1: cq_err \t\t\t%d\nW1: qint_idx \t\t\t%d", ctx->cq_err,
+		 ctx->qint_idx);
+	nix_dump(file, "W1: bpid  \t\t\t%d\nW1: bp_ena \t\t\t%d\n", ctx->bpid,
+		 ctx->bp_ena);
+	nix_dump(file,
+		 "W1: lbpid_high \t\t\t0x%03x\nW1: lbpid_med \t\t\t0x%03x\n"
+		 "W1: lbpid_low \t\t\t0x%03x\n(W1: lbpid) \t\t\t0x%03x\n",
+		 ctx->lbpid_high, ctx->lbpid_med, ctx->lbpid_low, (unsigned int)
+		 (ctx->lbpid_high << 6 | ctx->lbpid_med << 3 | ctx->lbpid_low));
+	nix_dump(file, "W1: lbp_ena \t\t\t\t%d\n", ctx->lbp_ena);
+
+	nix_dump(file, "W2: update_time \t\t%d\nW2: avg_level \t\t\t%d",
+		 ctx->update_time, ctx->avg_level);
+	nix_dump(file, "W2: head \t\t\t%d\nW2: tail \t\t\t%d\n", ctx->head,
+		 ctx->tail);
+
+	nix_dump(file, "W3: cq_err_int_ena \t\t%d\nW3: cq_err_int \t\t\t%d",
+		 ctx->cq_err_int_ena, ctx->cq_err_int);
+	nix_dump(file, "W3: qsize \t\t\t%d\nW3: stashing \t\t\t%d", ctx->qsize,
+		 ctx->stashing);
+	nix_dump(file, "W3: caching \t\t\t%d\nW3: lbp_frac \t\t\t%d", ctx->caching, ctx->lbp_frac);
+	nix_dump(file, "W3: stash_thresh \t\t\t%d\nW3: msh_valid\t\t\t%d", ctx->stash_thresh,
+		 ctx->msh_valid);
+	nix_dump(file, "W3: msh_dst \t\t\t0x%03x\nW3: cpt_drop_err_en \t\t\t%d\n",
+		 ctx->msh_dst, ctx->cpt_drop_err_en);
+	nix_dump(file, "W3: ena \t\t\t%d\n", ctx->ena);
+	nix_dump(file, "W3: drop_ena \t\t\t%d\nW3: drop \t\t\t%d", ctx->drop_ena,
+		 ctx->drop);
+	nix_dump(file, "W3: bp \t\t\t\t%d\n", ctx->bp);
+	nix_dump(file, "W4: lbpid_ext \t\t\t%d\nW3: bpid_ext \t\t\t%d", ctx->lbpid_ext,
+		 ctx->bpid_ext);
+}
+
 static inline void
 nix_lf_cq_dump(__io struct nix_cq_ctx_s *ctx, FILE *file)
 {
@@ -713,7 +917,10 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 		}
 		nix_dump(file, "============== port=%d cq=%d ===============",
 			 roc_nix->port_id, q);
-		nix_lf_cq_dump(ctx, file);
+		if (roc_model_is_cn20k())
+			nix_cn20k_lf_cq_dump(ctx, file);
+		else
+			nix_lf_cq_dump(ctx, file);
 	}
 
 	for (q = 0; q < rq; q++) {
@@ -726,6 +933,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 roc_nix->port_id, q);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_rq_dump(ctx, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_rq_dump(ctx, file);
 		else
 			nix_lf_rq_dump(ctx, file);
 	}
@@ -751,6 +960,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 inl_rq->qid);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_rq_dump(ctx, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_rq_dump(ctx, file);
 		else
 			nix_lf_rq_dump(ctx, file);
 	}
@@ -765,6 +976,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 roc_nix->port_id, q);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_sq_dump(ctx, &sqb_aura, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_sq_dump(ctx, &sqb_aura, file);
 		else
 			nix_lf_sq_dump(ctx, &sqb_aura, file);
 
@@ -1480,9 +1693,20 @@ roc_nix_sq_desc_dump(struct roc_nix *roc_nix, uint16_t q, uint16_t offset, uint1
 		tail_sqb = (void *)ctx->tail_sqb;
 		head_off = ctx->head_offset;
 		tail_off = ctx->tail_offset;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		volatile struct nix_cn10k_sq_ctx_s *ctx = (struct nix_cn10k_sq_ctx_s *)dat;
 
+		if (ctx->mnq_dis || ctx->lmt_dis)
+			full = 1;
+
+		count = ctx->sqb_count;
+		sqb_buf = (void *)ctx->head_sqb;
+		tail_sqb = (void *)ctx->tail_sqb;
+		head_off = ctx->head_offset;
+		tail_off = ctx->tail_offset;
+	} else {
+		volatile struct nix_cn20k_sq_ctx_s *ctx = (struct nix_cn20k_sq_ctx_s *)dat;
+
 		if (ctx->mnq_dis || ctx->lmt_dis)
 			full = 1;
 
diff --git a/drivers/common/cnxk/roc_nix_priv.h b/drivers/common/cnxk/roc_nix_priv.h
index ade42c1878..3fd6fcbe9f 100644
--- a/drivers/common/cnxk/roc_nix_priv.h
+++ b/drivers/common/cnxk/roc_nix_priv.h
@@ -469,7 +469,8 @@ struct nix_tm_shaper_profile *nix_tm_shaper_profile_alloc(void);
 void nix_tm_shaper_profile_free(struct nix_tm_shaper_profile *profile);
 
 uint64_t nix_get_blkaddr(struct dev *dev);
-void nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file);
+void nix_cn10k_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file);
+void nix_lf_rq_dump(__io struct nix_cn20k_rq_ctx_s *ctx, FILE *file);
 int nix_lf_gen_reg_dump(uintptr_t nix_lf_base, uint64_t *data);
 int nix_lf_stat_reg_dump(uintptr_t nix_lf_base, uint64_t *data, uint8_t lf_tx_stats,
 			 uint8_t lf_rx_stats);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 09/18] common/cnxk: add RSS support for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (7 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 08/18] common/cnxk: support NIX debug " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 10/18] net/cnxk: add cn20k base control path support Nithin Dabilpuram
                     ` (9 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add RSS configuration support for cn20k.
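
As a caller-side illustration (not part of this patch; example_rss_setup()
is a hypothetical helper and group 0 is assumed to be the default RSS
group), the new path would be exercised roughly as follows. Only the
device's configured RETA size is consumed internally:

  static int
  example_rss_setup(struct roc_nix *roc_nix, uint16_t nb_rxq)
  {
          uint16_t reta[ROC_NIX_RSS_RETA_MAX];
          uint16_t i;

          /* Spread RETA entries round-robin across the Rx queues */
          for (i = 0; i < ROC_NIX_RSS_RETA_MAX; i++)
                  reta[i] = i % nb_rxq;

          return roc_nix_rss_reta_set(roc_nix, 0, reta);
  }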

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/roc_nix_rss.c | 74 +++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 4 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_rss.c b/drivers/common/cnxk/roc_nix_rss.c
index 2b88e1360d..fd1472e9b9 100644
--- a/drivers/common/cnxk/roc_nix_rss.c
+++ b/drivers/common/cnxk/roc_nix_rss.c
@@ -70,7 +70,7 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 				goto exit;
 			req = mbox_alloc_msg_nix_aq_enq(mbox);
 			if (!req) {
-				rc =  NIX_ERR_NO_MEM;
+				rc = NIX_ERR_NO_MEM;
 				goto exit;
 			}
 		}
@@ -93,7 +93,7 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 				goto exit;
 			req = mbox_alloc_msg_nix_aq_enq(mbox);
 			if (!req) {
-				rc =  NIX_ERR_NO_MEM;
+				rc = NIX_ERR_NO_MEM;
 				goto exit;
 			}
 		}
@@ -115,8 +115,8 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 }
 
 static int
-nix_rss_reta_set(struct nix *nix, uint8_t group,
-		 uint16_t reta[ROC_NIX_RSS_RETA_MAX], uint8_t lock_rx_ctx)
+nix_cn10k_rss_reta_set(struct nix *nix, uint8_t group, uint16_t reta[ROC_NIX_RSS_RETA_MAX],
+		       uint8_t lock_rx_ctx)
 {
 	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
 	struct nix_cn10k_aq_enq_req *req;
@@ -178,6 +178,70 @@ nix_rss_reta_set(struct nix *nix, uint8_t group,
 	return rc;
 }
 
+static int
+nix_rss_reta_set(struct nix *nix, uint8_t group, uint16_t reta[ROC_NIX_RSS_RETA_MAX],
+		 uint8_t lock_rx_ctx)
+{
+	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
+	struct nix_cn20k_aq_enq_req *req;
+	uint16_t idx;
+	int rc;
+
+	for (idx = 0; idx < nix->reta_sz; idx++) {
+		req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!req) {
+			/* The shared memory buffer can be full.
+			 * Flush it and retry
+			 */
+			rc = mbox_process(mbox);
+			if (rc < 0)
+				goto exit;
+			req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+			if (!req) {
+				rc = NIX_ERR_NO_MEM;
+				goto exit;
+			}
+		}
+		req->rss.rq = reta[idx];
+		/* Fill AQ info */
+		req->qidx = (group * nix->reta_sz) + idx;
+		req->ctype = NIX_AQ_CTYPE_RSS;
+		req->op = NIX_AQ_INSTOP_INIT;
+
+		if (!lock_rx_ctx)
+			continue;
+
+		req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!req) {
+			/* The shared memory buffer can be full.
+			 * Flush it and retry
+			 */
+			rc = mbox_process(mbox);
+			if (rc < 0)
+				goto exit;
+			req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+			if (!req) {
+				rc = NIX_ERR_NO_MEM;
+				goto exit;
+			}
+		}
+		req->rss.rq = reta[idx];
+		/* Fill AQ info */
+		req->qidx = (group * nix->reta_sz) + idx;
+		req->ctype = NIX_AQ_CTYPE_RSS;
+		req->op = NIX_AQ_INSTOP_LOCK;
+	}
+
+	rc = mbox_process(mbox);
+	if (rc < 0)
+		goto exit;
+
+	rc = 0;
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
 int
 roc_nix_rss_reta_set(struct roc_nix *roc_nix, uint8_t group,
 		     uint16_t reta[ROC_NIX_RSS_RETA_MAX])
@@ -191,6 +255,8 @@ roc_nix_rss_reta_set(struct roc_nix *roc_nix, uint8_t group,
 	if (roc_model_is_cn9k())
 		rc = nix_cn9k_rss_reta_set(nix, group, reta,
 					   roc_nix->lock_rx_ctx);
+	else if (roc_model_is_cn10k())
+		rc = nix_cn10k_rss_reta_set(nix, group, reta, roc_nix->lock_rx_ctx);
 	else
 		rc = nix_rss_reta_set(nix, group, reta, roc_nix->lock_rx_ctx);
 	if (rc)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 10/18] net/cnxk: add cn20k base control path support
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (8 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 09/18] common/cnxk: add RSS support " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 11/18] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
                     ` (8 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, Anatoly Burakov
  Cc: dev, Rakesh Kudurumalla, Rahul Bhansali

Add cn20k base control path support for ethdev.
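
The default send header programmed at Tx queue setup encodes the
descriptor size in the sizem1 field as 16-byte units minus one. A
standalone sketch of that arithmetic (example_sizem1() is a
hypothetical helper mirroring the comments in nix_form_default_desc()):

  /* Two 8B descriptor words per 16B unit, e.g.
   * 2(HDR) + 2(EXT) + 1(SG) + 1(IOVA) = 6 words => 6/2 - 1 = 2
   */
  static inline uint8_t
  example_sizem1(uint8_t nb_words)
  {
          return (nb_words / 2) - 1;
  }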

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
 doc/guides/rel_notes/release_24_11.rst |   4 +
 drivers/net/cnxk/cn20k_ethdev.c        | 553 +++++++++++++++++++++++++
 drivers/net/cnxk/cn20k_ethdev.h        |  11 +
 drivers/net/cnxk/cn20k_rx.h            |  33 ++
 drivers/net/cnxk/cn20k_rxtx.h          |  89 ++++
 drivers/net/cnxk/cn20k_tx.h            |  35 ++
 drivers/net/cnxk/cnxk_ethdev_dp.h      |   3 +
 drivers/net/cnxk/meson.build           |  11 +-
 8 files changed, 738 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.c
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.h
 create mode 100644 drivers/net/cnxk/cn20k_rx.h
 create mode 100644 drivers/net/cnxk/cn20k_rxtx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx.h

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 3c666ddd10..e7597bace3 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -59,6 +59,10 @@ New Features
 
   * Added support for HW mempool in CN20K SoC.
 
+* **Updated Marvell cnxk net driver.**
+
+  * Added support for ethdev in CN20K SoC.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
new file mode 100644
index 0000000000..b4d21fe4be
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -0,0 +1,553 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#include "cn20k_ethdev.h"
+#include "cn20k_rx.h"
+#include "cn20k_tx.h"
+
+static int
+cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (ptype_mask) {
+		dev->rx_offload_flags |= NIX_RX_OFFLOAD_PTYPE_F;
+		dev->ptype_disable = 0;
+	} else {
+		dev->rx_offload_flags &= ~NIX_RX_OFFLOAD_PTYPE_F;
+		dev->ptype_disable = 1;
+	}
+
+	return 0;
+}
+
+static void
+nix_form_default_desc(struct cnxk_eth_dev *dev, struct cn20k_eth_txq *txq, uint16_t qid)
+{
+	union nix_send_hdr_w0_u send_hdr_w0;
+
+	/* Initialize the fields based on basic single segment packet */
+	send_hdr_w0.u = 0;
+	if (dev->tx_offload_flags & NIX_TX_NEED_EXT_HDR) {
+		/* 2(HDR) + 2(EXT_HDR) + 1(SG) + 1(IOVA) = 6 => 6/2 - 1 = 2 */
+		send_hdr_w0.sizem1 = 2;
+		if (dev->tx_offload_flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Default: one seg packet would have:
+			 * 2(HDR) + 2(EXT) + 1(SG) + 1(IOVA) + 2(MEM)
+			 * => 8/2 - 1 = 3
+			 */
+			send_hdr_w0.sizem1 = 3;
+
+			/* To calculate the offset for send_mem,
+			 * send_hdr->w0.sizem1 * 2
+			 */
+			txq->ts_mem = dev->tstamp.tx_tstamp_iova;
+		}
+	} else {
+		/* 2(HDR) + 1(SG) + 1(IOVA) = 4 => 4/2 - 1 = 1 */
+		send_hdr_w0.sizem1 = 1;
+	}
+	send_hdr_w0.sq = qid;
+	txq->send_hdr_w0 = send_hdr_w0.u;
+	rte_wmb();
+}
+
+static int
+cn20k_nix_tx_compl_setup(struct cnxk_eth_dev *dev, struct cn20k_eth_txq *txq, struct roc_nix_sq *sq,
+			 uint16_t nb_desc)
+{
+	struct roc_nix_cq *cq;
+
+	cq = &dev->cqs[sq->cqid];
+	txq->tx_compl.desc_base = (uintptr_t)cq->desc_base;
+	txq->tx_compl.cq_door = cq->door;
+	txq->tx_compl.cq_status = cq->status;
+	txq->tx_compl.wdata = cq->wdata;
+	txq->tx_compl.head = cq->head;
+	txq->tx_compl.qmask = cq->qmask;
+	/* Total array size holding buffers is equal to
+	 * number of entries in cq and sq
+	 * max buffer in array = desc in cq + desc in sq
+	 */
+	txq->tx_compl.nb_desc_mask = (2 * rte_align32pow2(nb_desc)) - 1;
+	txq->tx_compl.ena = true;
+
+	txq->tx_compl.ptr = (struct rte_mbuf **)plt_zmalloc(
+		txq->tx_compl.nb_desc_mask * sizeof(struct rte_mbuf *), 0);
+	if (!txq->tx_compl.ptr)
+		return -1;
+
+	return 0;
+}
+
+static void
+cn20k_nix_tx_queue_release(struct rte_eth_dev *eth_dev, uint16_t qid)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	struct cn20k_eth_txq *txq;
+
+	cnxk_nix_tx_queue_release(eth_dev, qid);
+	txq = eth_dev->data->tx_queues[qid];
+
+	if (nix->tx_compl_ena)
+		plt_free(txq->tx_compl.ptr);
+}
+
+static int
+cn20k_nix_tx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_desc,
+			 unsigned int socket, const struct rte_eth_txconf *tx_conf)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	struct roc_cpt_lf *inl_lf;
+	struct cn20k_eth_txq *txq;
+	struct roc_nix_sq *sq;
+	uint16_t crypto_qid;
+	int rc;
+
+	RTE_SET_USED(socket);
+
+	/* Common Tx queue setup */
+	rc = cnxk_nix_tx_queue_setup(eth_dev, qid, nb_desc, sizeof(struct cn20k_eth_txq), tx_conf);
+	if (rc)
+		return rc;
+
+	sq = &dev->sqs[qid];
+	/* Update fast path queue */
+	txq = eth_dev->data->tx_queues[qid];
+	txq->fc_mem = sq->fc;
+	if (nix->tx_compl_ena) {
+		rc = cn20k_nix_tx_compl_setup(dev, txq, sq, nb_desc);
+		if (rc)
+			return rc;
+	}
+
+	/* Set Txq flag for MT_LOCKFREE */
+	txq->flag = !!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MT_LOCKFREE);
+
+	/* Store lmt base in tx queue for easy access */
+	txq->lmt_base = nix->lmt_base;
+	txq->io_addr = sq->io_addr;
+	txq->nb_sqb_bufs_adj = sq->nb_sqb_bufs_adj;
+	txq->sqes_per_sqb_log2 = sq->sqes_per_sqb_log2;
+
+	/* Fetch CPT LF info for outbound if present */
+	if (dev->outb.lf_base) {
+		crypto_qid = qid % dev->outb.nb_crypto_qs;
+		inl_lf = dev->outb.lf_base + crypto_qid;
+
+		txq->cpt_io_addr = inl_lf->io_addr;
+		txq->cpt_fc = inl_lf->fc_addr;
+		txq->cpt_fc_sw = (int32_t *)((uintptr_t)dev->outb.fc_sw_mem +
+					     crypto_qid * RTE_CACHE_LINE_SIZE);
+
+		txq->cpt_desc = inl_lf->nb_desc * 0.7;
+		txq->sa_base = (uint64_t)dev->outb.sa_base;
+		txq->sa_base |= (uint64_t)eth_dev->data->port_id;
+		PLT_STATIC_ASSERT(ROC_NIX_INL_SA_BASE_ALIGN == BIT_ULL(16));
+	}
+
+	/* Restore marking flag from roc */
+	mark_fmt = roc_nix_tm_mark_format_get(nix, &mark_flag);
+	txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+	txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+
+	nix_form_default_desc(dev, txq, qid);
+	txq->lso_tun_fmt = dev->lso_tun_fmt;
+	return 0;
+}
+
+static int
+cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_desc,
+			 unsigned int socket, const struct rte_eth_rxconf *rx_conf,
+			 struct rte_mempool *mp)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cn20k_eth_rxq *rxq;
+	struct roc_nix_rq *rq;
+	struct roc_nix_cq *cq;
+	int rc;
+
+	RTE_SET_USED(socket);
+
+	/* CQ Errata needs min 4K ring */
+	if (dev->cq_min_4k && nb_desc < 4096)
+		nb_desc = 4096;
+
+	/* Common Rx queue setup */
+	rc = cnxk_nix_rx_queue_setup(eth_dev, qid, nb_desc, sizeof(struct cn20k_eth_rxq), rx_conf,
+				     mp);
+	if (rc)
+		return rc;
+
+	/* Do initial mtu setup for RQ0 before device start */
+	if (!qid) {
+		rc = nix_recalc_mtu(eth_dev);
+		if (rc)
+			return rc;
+	}
+
+	rq = &dev->rqs[qid];
+	cq = &dev->cqs[qid];
+
+	/* Update fast path queue */
+	rxq = eth_dev->data->rx_queues[qid];
+	rxq->rq = qid;
+	rxq->desc = (uintptr_t)cq->desc_base;
+	rxq->cq_door = cq->door;
+	rxq->cq_status = cq->status;
+	rxq->wdata = cq->wdata;
+	rxq->head = cq->head;
+	rxq->qmask = cq->qmask;
+	rxq->tstamp = &dev->tstamp;
+
+	/* Data offset from data to start of mbuf is first_skip */
+	rxq->data_off = rq->first_skip;
+	rxq->mbuf_initializer = cnxk_nix_rxq_mbuf_setup(dev);
+
+	/* Setup security related info */
+	if (dev->rx_offload_flags & NIX_RX_OFFLOAD_SECURITY_F) {
+		rxq->lmt_base = dev->nix.lmt_base;
+		rxq->sa_base = roc_nix_inl_inb_sa_base_get(&dev->nix, dev->inb.inl_dev);
+	}
+
+	/* Lookup mem */
+	rxq->lookup_mem = cnxk_nix_fastpath_lookup_mem_get();
+	return 0;
+}
+
+static int
+cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
+{
+	struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[qidx];
+	int rc;
+
+	rc = cnxk_nix_tx_queue_stop(eth_dev, qidx);
+	if (rc)
+		return rc;
+
+	/* Clear fc cache pkts to trigger worker stop */
+	txq->fc_cache_pkts = 0;
+
+	return 0;
+}
+
+static int
+cn20k_nix_configure(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc;
+
+	/* Common nix configure */
+	rc = cnxk_nix_configure(eth_dev);
+	if (rc)
+		return rc;
+
+	/* reset reassembly dynfield/flag offset */
+	dev->reass_dynfield_off = -1;
+	dev->reass_dynflag_bit = -1;
+
+	plt_nix_dbg("Configured port%d platform specific rx_offload_flags=0x%x"
+		    " tx_offload_flags=0x%x",
+		    eth_dev->data->port_id, dev->rx_offload_flags, dev->tx_offload_flags);
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int i, rc;
+
+	rc = cnxk_nix_timesync_enable(eth_dev);
+	if (rc)
+		return rc;
+
+	dev->rx_offload_flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+	dev->tx_offload_flags |= NIX_TX_OFFLOAD_TSTAMP_F;
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
+
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int i, rc;
+
+	rc = cnxk_nix_timesync_disable(eth_dev);
+	if (rc)
+		return rc;
+
+	dev->rx_offload_flags &= ~NIX_RX_OFFLOAD_TSTAMP_F;
+	dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_TSTAMP_F;
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
+
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_read_tx_timestamp(struct rte_eth_dev *eth_dev, struct timespec *timestamp)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cnxk_timesync_info *tstamp = &dev->tstamp;
+	uint64_t ns;
+
+	if (*tstamp->tx_tstamp == 0)
+		return -EINVAL;
+
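+	/* HW timestamp format: seconds in the upper 32 bits, nanoseconds in the lower 32 */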
+	*tstamp->tx_tstamp =
+		((*tstamp->tx_tstamp >> 32) * NSEC_PER_SEC) + (*tstamp->tx_tstamp & 0xFFFFFFFFUL);
+	ns = rte_timecounter_update(&dev->tx_tstamp_tc, *tstamp->tx_tstamp);
+	*timestamp = rte_ns_to_timespec(ns);
+	*tstamp->tx_tstamp = 0;
+	rte_wmb();
+
+	return 0;
+}
+
+static int
+cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	int rc;
+
+	/* Common eth dev start */
+	rc = cnxk_nix_dev_start(eth_dev);
+	if (rc)
+		return rc;
+
+	/* Set flags for Rx Inject feature */
+	if (roc_idev_nix_rx_inject_get(nix->port_id))
+		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
+
+	return 0;
+}
+
+static int
+cn20k_nix_reassembly_capability_get(struct rte_eth_dev *eth_dev,
+				    struct rte_eth_ip_reassembly_params *reassembly_capa)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc = -ENOTSUP;
+
+	if (!roc_feature_nix_has_reass())
+		return -ENOTSUP;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+		reassembly_capa->timeout_ms = 60 * 1000;
+		reassembly_capa->max_frags = 4;
+		reassembly_capa->flags =
+			RTE_ETH_DEV_REASSEMBLY_F_IPV4 | RTE_ETH_DEV_REASSEMBLY_F_IPV6;
+		rc = 0;
+	}
+
+	return rc;
+}
+
+static int
+cn20k_nix_reassembly_conf_get(struct rte_eth_dev *eth_dev,
+			      struct rte_eth_ip_reassembly_params *conf)
+{
+	RTE_SET_USED(eth_dev);
+	RTE_SET_USED(conf);
+	return -ENOTSUP;
+}
+
+static int
+cn20k_nix_reassembly_conf_set(struct rte_eth_dev *eth_dev,
+			      const struct rte_eth_ip_reassembly_params *conf)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc = 0;
+
+	if (!roc_feature_nix_has_reass())
+		return -ENOTSUP;
+
+	if (!conf->flags) {
+		/* Clear offload flags on disable */
+		if (!dev->inb.nb_oop)
+			dev->rx_offload_flags &= ~NIX_RX_REAS_F;
+		dev->inb.reass_en = false;
+		return 0;
+	}
+
+	rc = roc_nix_reassembly_configure(conf->timeout_ms, conf->max_frags);
+	if (!rc && dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+		dev->rx_offload_flags |= NIX_RX_REAS_F;
+		dev->inb.reass_en = true;
+	}
+
+	return rc;
+}
+
+static int
+cn20k_nix_rx_avail_get(struct cn20k_eth_rxq *rxq)
+{
+	uint32_t qmask = rxq->qmask;
+	uint64_t reg, head, tail;
+	int available;
+
+	/* Use LDADDA version to avoid reorder */
+	reg = roc_atomic64_add_sync(rxq->wdata, rxq->cq_status);
+	/* CQ_OP_STATUS operation error */
+	if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+		return 0;
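+	/* CQ_OP_STATUS result: tail in bits [19:0], head in bits [39:20] */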
+	tail = reg & 0xFFFFF;
+	head = (reg >> 20) & 0xFFFFF;
+	if (tail < head)
+		available = tail - head + qmask + 1;
+	else
+		available = tail - head;
+
+	return available;
+}
+
+static int
+cn20k_rx_descriptor_dump(const struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t offset,
+			 uint16_t num, FILE *file)
+{
+	struct cn20k_eth_rxq *rxq = eth_dev->data->rx_queues[qid];
+	const uint64_t data_off = rxq->data_off;
+	const uint32_t qmask = rxq->qmask;
+	const uintptr_t desc = rxq->desc;
+	struct cpt_parse_hdr_s *cpth;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	uint16_t count = 0;
+	int available_pkts;
+	uint64_t cq_w1;
+
+	available_pkts = cn20k_nix_rx_avail_get(rxq);
+
+	if ((offset + num - 1) >= available_pkts) {
+		plt_err("Invalid BD num=%u", num);
+		return -EINVAL;
+	}
+
+	while (count < num) {
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head) + count + offset);
+		cq_w1 = *((const uint64_t *)cq + 1);
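+		/* W1 bit 11: packet came via inline CPT, so dump its CPT parse header */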
+		if (cq_w1 & BIT(11)) {
+			rte_iova_t buff = *((rte_iova_t *)((uint64_t *)cq + 9));
+			struct rte_mbuf *mbuf = (struct rte_mbuf *)(buff - data_off);
+			cpth = (struct cpt_parse_hdr_s *)((uintptr_t)mbuf + (uint16_t)data_off);
+			roc_cpt_parse_hdr_dump(file, cpth);
+		} else {
+			roc_nix_cqe_dump(file, cq);
+		}
+
+		count++;
+		head &= qmask;
+	}
+	return 0;
+}
+
+/* Update platform specific eth dev ops */
+static void
+nix_eth_dev_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+
+	/* Update platform specific ops */
+	cnxk_eth_dev_ops.dev_configure = cn20k_nix_configure;
+	cnxk_eth_dev_ops.tx_queue_setup = cn20k_nix_tx_queue_setup;
+	cnxk_eth_dev_ops.rx_queue_setup = cn20k_nix_rx_queue_setup;
+	cnxk_eth_dev_ops.tx_queue_release = cn20k_nix_tx_queue_release;
+	cnxk_eth_dev_ops.tx_queue_stop = cn20k_nix_tx_queue_stop;
+	cnxk_eth_dev_ops.dev_start = cn20k_nix_dev_start;
+	cnxk_eth_dev_ops.dev_ptypes_set = cn20k_nix_ptypes_set;
+	cnxk_eth_dev_ops.timesync_enable = cn20k_nix_timesync_enable;
+	cnxk_eth_dev_ops.timesync_disable = cn20k_nix_timesync_disable;
+	cnxk_eth_dev_ops.timesync_read_tx_timestamp = cn20k_nix_timesync_read_tx_timestamp;
+	cnxk_eth_dev_ops.ip_reassembly_capability_get = cn20k_nix_reassembly_capability_get;
+	cnxk_eth_dev_ops.ip_reassembly_conf_get = cn20k_nix_reassembly_conf_get;
+	cnxk_eth_dev_ops.ip_reassembly_conf_set = cn20k_nix_reassembly_conf_set;
+	cnxk_eth_dev_ops.eth_rx_descriptor_dump = cn20k_rx_descriptor_dump;
+}
+
+/* Update platform specific tm ops */
+static void
+nix_tm_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+}
+
+static int
+cn20k_nix_remove(struct rte_pci_device *pci_dev)
+{
+	return cnxk_nix_remove(pci_dev);
+}
+
+static int
+cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_eth_dev *eth_dev;
+	int rc;
+
+	rc = roc_plt_init();
+	if (rc) {
+		plt_err("Failed to initialize platform model, rc=%d", rc);
+		return rc;
+	}
+
+	nix_eth_dev_ops_override();
+	nix_tm_ops_override();
+
+	/* Common probe */
+	rc = cnxk_nix_probe(pci_drv, pci_dev);
+	if (rc)
+		return rc;
+
+	/* Find eth dev allocated */
+	eth_dev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!eth_dev) {
+		/* Ignore if ethdev is in mid of detach state in secondary */
+		if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+			return 0;
+		return -ENOENT;
+	}
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	return 0;
+}
+
+static const struct rte_pci_id cn20k_pci_nix_map[] = {
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_PF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_VF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_AF_VF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_SDP_VF),
+	{
+		.vendor_id = 0,
+	},
+};
+
+static struct rte_pci_driver cn20k_pci_nix = {
+	.id_table = cn20k_pci_nix_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA | RTE_PCI_DRV_INTR_LSC,
+	.probe = cn20k_nix_probe,
+	.remove = cn20k_nix_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_cn20k, cn20k_pci_nix);
+RTE_PMD_REGISTER_PCI_TABLE(net_cn20k, cn20k_pci_nix_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_cn20k, "vfio-pci");
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
new file mode 100644
index 0000000000..1af490befc
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_ETHDEV_H__
+#define __CN20K_ETHDEV_H__
+
+#include <cn20k_rxtx.h>
+#include <cnxk_ethdev.h>
+#include <cnxk_security.h>
+
+#endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
new file mode 100644
index 0000000000..58a2920a54
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_RX_H__
+#define __CN20K_RX_H__
+
+#include "cn20k_rxtx.h"
+#include <rte_ethdev.h>
+#include <rte_security_driver.h>
+#include <rte_vect.h>
+
+#define NSEC_PER_SEC 1000000000L
+
+#define NIX_RX_OFFLOAD_NONE	     (0)
+#define NIX_RX_OFFLOAD_RSS_F	     BIT(0)
+#define NIX_RX_OFFLOAD_PTYPE_F	     BIT(1)
+#define NIX_RX_OFFLOAD_CHECKSUM_F    BIT(2)
+#define NIX_RX_OFFLOAD_MARK_UPDATE_F BIT(3)
+#define NIX_RX_OFFLOAD_TSTAMP_F	     BIT(4)
+#define NIX_RX_OFFLOAD_VLAN_STRIP_F  BIT(5)
+#define NIX_RX_OFFLOAD_SECURITY_F    BIT(6)
+#define NIX_RX_OFFLOAD_MAX	     (NIX_RX_OFFLOAD_SECURITY_F << 1)
+
+/* Flags to control the cqe_to_mbuf conversion function.
+ * These are defined from the end of the bit range to denote that
+ * they are not used as offload flags when selecting the Rx function.
+ */
+#define NIX_RX_REAS_F	   BIT(12)
+#define NIX_RX_VWQE_F	   BIT(13)
+#define NIX_RX_MULTI_SEG_F BIT(14)
+
+#define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
+#endif /* __CN20K_RX_H__ */
diff --git a/drivers/net/cnxk/cn20k_rxtx.h b/drivers/net/cnxk/cn20k_rxtx.h
new file mode 100644
index 0000000000..5cc445d4b1
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rxtx.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#ifndef __CN20K_RXTX_H__
+#define __CN20K_RXTX_H__
+
+#include <rte_security.h>
+
+/* ROC Constants */
+#include "roc_constants.h"
+
+/* Platform definition */
+#include "roc_platform.h"
+
+/* IO */
+#if defined(__aarch64__)
+#include "roc_io.h"
+#else
+#include "roc_io_generic.h"
+#endif
+
+/* HW structure definition */
+#include "hw/cpt.h"
+#include "hw/nix.h"
+#include "hw/npa.h"
+#include "hw/npc.h"
+#include "hw/ssow.h"
+
+#include "roc_ie_ot.h"
+
+/* NPA */
+#include "roc_npa_dp.h"
+
+/* SSO */
+#include "roc_sso_dp.h"
+
+/* CPT */
+#include "roc_cpt.h"
+
+/* NIX Inline dev */
+#include "roc_nix_inl_dp.h"
+
+#include "cnxk_ethdev_dp.h"
+
+struct cn20k_eth_txq {
+	uint64_t send_hdr_w0;
+	int64_t fc_cache_pkts;
+	uint64_t *fc_mem;
+	uintptr_t lmt_base;
+	rte_iova_t io_addr;
+	uint16_t sqes_per_sqb_log2;
+	int16_t nb_sqb_bufs_adj;
+	uint8_t flag;
+	rte_iova_t cpt_io_addr;
+	uint64_t sa_base;
+	uint64_t *cpt_fc;
+	uint16_t cpt_desc;
+	int32_t *cpt_fc_sw;
+	uint64_t lso_tun_fmt;
+	uint64_t ts_mem;
+	uint64_t mark_flag : 8;
+	uint64_t mark_fmt : 48;
+	struct cnxk_eth_txq_comp tx_compl;
+} __plt_cache_aligned;
+
+struct cn20k_eth_rxq {
+	uint64_t mbuf_initializer;
+	uintptr_t desc;
+	void *lookup_mem;
+	uintptr_t cq_door;
+	uint64_t wdata;
+	int64_t *cq_status;
+	uint32_t head;
+	uint32_t qmask;
+	uint32_t available;
+	uint16_t data_off;
+	uint64_t sa_base;
+	uint64_t lmt_base;
+	uint64_t meta_aura;
+	uintptr_t meta_pool;
+	uint16_t rq;
+	struct cnxk_timesync_info *tstamp;
+} __plt_cache_aligned;
+
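+/* Address of byte 'offset' within LMT line 'lmt_num' starting at 'lmt_addr' */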
+#define LMT_OFF(lmt_addr, lmt_num, offset)                                                         \
+	(void *)((uintptr_t)(lmt_addr) + ((uint64_t)(lmt_num) << ROC_LMT_LINE_SIZE_LOG2) + (offset))
+
+#endif /* __CN20K_RXTX_H__ */
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
new file mode 100644
index 0000000000..a00c9d5776
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_TX_H__
+#define __CN20K_TX_H__
+
+#include "cn20k_rxtx.h"
+#include <rte_eventdev.h>
+#include <rte_vect.h>
+
+#define NIX_TX_OFFLOAD_NONE	      (0)
+#define NIX_TX_OFFLOAD_L3_L4_CSUM_F   BIT(0)
+#define NIX_TX_OFFLOAD_OL3_OL4_CSUM_F BIT(1)
+#define NIX_TX_OFFLOAD_VLAN_QINQ_F    BIT(2)
+#define NIX_TX_OFFLOAD_MBUF_NOFF_F    BIT(3)
+#define NIX_TX_OFFLOAD_TSO_F	      BIT(4)
+#define NIX_TX_OFFLOAD_TSTAMP_F	      BIT(5)
+#define NIX_TX_OFFLOAD_SECURITY_F     BIT(6)
+#define NIX_TX_OFFLOAD_MAX	      (NIX_TX_OFFLOAD_SECURITY_F << 1)
+
+/* Flags to control the xmit_prepare function.
+ * These are defined from the end of the bit range to denote that
+ * they are not used as offload flags when selecting the Tx function.
+ */
+#define NIX_TX_VWQE_F	   BIT(14)
+#define NIX_TX_MULTI_SEG_F BIT(15)
+
+#define NIX_TX_NEED_SEND_HDR_W1                                                                    \
+	(NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |                             \
+	 NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)
+
+#define NIX_TX_NEED_EXT_HDR                                                                        \
+	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
+
+#endif /* __CN20K_TX_H__ */
diff --git a/drivers/net/cnxk/cnxk_ethdev_dp.h b/drivers/net/cnxk/cnxk_ethdev_dp.h
index 119bb1836a..100d22e759 100644
--- a/drivers/net/cnxk/cnxk_ethdev_dp.h
+++ b/drivers/net/cnxk/cnxk_ethdev_dp.h
@@ -59,6 +59,9 @@
 
 #define CNXK_TX_MARK_FMT_MASK (0xFFFFFFFFFFFFull)
 
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define CQE_SZ(x)            ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
 struct cnxk_eth_txq_comp {
 	uintptr_t desc_base;
 	uintptr_t cq_door;
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index 7bce80098a..cf2ce09f77 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -14,7 +14,7 @@ else
         soc_type = platform
 endif
 
-if soc_type != 'cn9k' and soc_type != 'cn10k'
+if soc_type != 'cn9k' and soc_type != 'cn10k' and soc_type != 'cn20k'
         soc_type = 'all'
 endif
 
@@ -231,6 +231,15 @@ sources += files(
 endif
 endif
 
+
+if soc_type == 'cn20k' or soc_type == 'all'
+# CN20K
+sources += files(
+        'cn20k_ethdev.c',
+)
+endif
+
+
 deps += ['bus_pci', 'cryptodev', 'eventdev', 'security']
 deps += ['common_cnxk', 'mempool_cnxk']
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 11/18] net/cnxk: support Rx function select for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (9 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 10/18] net/cnxk: add cn20k base control path support Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 12/18] net/cnxk: support Tx " Nithin Dabilpuram
                     ` (7 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, Anatoly Burakov
  Cc: dev

Add support to select Rx function based on offload flags
for cn20k.
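
Each R(name, flags) entry in the fast path mode tables expands to a set
of dedicated receive routines (scalar, multi-seg, vector and their
reassembly variants). For example, the scalar declaration produced for
the cksum_ptype_rss combination is:

  uint16_t __rte_noinline __rte_hot
  cn20k_nix_recv_pkts_cksum_ptype_rss(void *rx_queue,
                                      struct rte_mbuf **rx_pkts,
                                      uint16_t pkts);

cn20k_eth_set_rx_function() then selects the routine matching the
computed rx_offload_flags.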

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c               |  59 ++++-
 drivers/net/cnxk/cn20k_ethdev.h               |   3 +
 drivers/net/cnxk/cn20k_rx.h                   | 226 ++++++++++++++++++
 drivers/net/cnxk/cn20k_rx_select.c            | 162 +++++++++++++
 drivers/net/cnxk/meson.build                  |  44 ++++
 drivers/net/cnxk/rx/cn20k/rx_0_15.c           |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c       |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c  |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127.c        |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c   |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c    |  20 ++
 .../net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c   |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111.c         |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c    |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c     |  20 ++
 .../net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c    |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_all_offload.c    |  57 +++++
 38 files changed, 1190 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn20k_rx_select.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_all_offload.c

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index b4d21fe4be..d1cb3a52bf 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -5,6 +5,41 @@
 #include "cn20k_rx.h"
 #include "cn20k_tx.h"
 
+static uint16_t
+nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct rte_eth_dev_data *data = eth_dev->data;
+	struct rte_eth_conf *conf = &data->dev_conf;
+	struct rte_eth_rxmode *rxmode = &conf->rxmode;
+	uint16_t flags = 0;
+
+	if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS &&
+	    (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH))
+		flags |= NIX_RX_OFFLOAD_RSS_F;
+
+	if (dev->rx_offloads & (RTE_ETH_RX_OFFLOAD_TCP_CKSUM | RTE_ETH_RX_OFFLOAD_UDP_CKSUM))
+		flags |= NIX_RX_OFFLOAD_CHECKSUM_F;
+
+	if (dev->rx_offloads &
+	    (RTE_ETH_RX_OFFLOAD_IPV4_CKSUM | RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM))
+		flags |= NIX_RX_OFFLOAD_CHECKSUM_F;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
+		flags |= NIX_RX_MULTI_SEG_F;
+
+	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP))
+		flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+
+	if (!dev->ptype_disable)
+		flags |= NIX_RX_OFFLOAD_PTYPE_F;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY)
+		flags |= NIX_RX_OFFLOAD_SECURITY_F;
+
+	return flags;
+}
+
 static int
 cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 {
@@ -18,6 +53,7 @@ cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 		dev->ptype_disable = 1;
 	}
 
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -187,6 +223,9 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 		rc = nix_recalc_mtu(eth_dev);
 		if (rc)
 			return rc;
+
+		/* Update offload flags */
+		dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
 	}
 
 	rq = &dev->rqs[qid];
@@ -245,6 +284,8 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Update offload flags */
+	dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
 	/* reset reassembly dynfield/flag offset */
 	dev->reass_dynfield_off = -1;
 	dev->reass_dynflag_bit = -1;
@@ -271,6 +312,10 @@ cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
 		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
 
+	/* Setting up the rx[tx]_offload_flags due to change
+	 * in rx[tx]_offloads.
+	 */
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -290,6 +335,10 @@ cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
 	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
 		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
 
+	/* Setting up the rx[tx]_offload_flags due to change
+	 * in rx[tx]_offloads.
+	 */
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -325,10 +374,15 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Setting up the rx[tx]_offload_flags due to change
+	 * in rx[tx]_offloads.
+	 */
+	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
 
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -525,8 +579,11 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		return -ENOENT;
 	}
 
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		/* Setup callbacks for secondary process */
+		cn20k_eth_set_rx_function(eth_dev);
 		return 0;
+	}
 
 	return 0;
 }
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
index 1af490befc..2049ee7fa4 100644
--- a/drivers/net/cnxk/cn20k_ethdev.h
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -8,4 +8,7 @@
 #include <cnxk_ethdev.h>
 #include <cnxk_security.h>
 
+/* Rx and Tx routines */
+void cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+
 #endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 58a2920a54..2cb77c0b46 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -30,4 +30,230 @@
 #define NIX_RX_MULTI_SEG_F BIT(14)
 
 #define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
+
+#define RSS_F	  NIX_RX_OFFLOAD_RSS_F
+#define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
+#define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
+#define MARK_F	  NIX_RX_OFFLOAD_MARK_UPDATE_F
+#define TS_F	  NIX_RX_OFFLOAD_TSTAMP_F
+#define RX_VLAN_F NIX_RX_OFFLOAD_VLAN_STRIP_F
+#define R_SEC_F	  NIX_RX_OFFLOAD_SECURITY_F
+
+/* [R_SEC_F] [RX_VLAN_F] [TS] [MARK] [CKSUM] [PTYPE] [RSS] */
+#define NIX_RX_FASTPATH_MODES_0_15                                                                 \
+	R(no_offload, NIX_RX_OFFLOAD_NONE)                                                         \
+	R(rss, RSS_F)                                                                              \
+	R(ptype, PTYPE_F)                                                                          \
+	R(ptype_rss, PTYPE_F | RSS_F)                                                              \
+	R(cksum, CKSUM_F)                                                                          \
+	R(cksum_rss, CKSUM_F | RSS_F)                                                              \
+	R(cksum_ptype, CKSUM_F | PTYPE_F)                                                          \
+	R(cksum_ptype_rss, CKSUM_F | PTYPE_F | RSS_F)                                              \
+	R(mark, MARK_F)                                                                            \
+	R(mark_rss, MARK_F | RSS_F)                                                                \
+	R(mark_ptype, MARK_F | PTYPE_F)                                                            \
+	R(mark_ptype_rss, MARK_F | PTYPE_F | RSS_F)                                                \
+	R(mark_cksum, MARK_F | CKSUM_F)                                                            \
+	R(mark_cksum_rss, MARK_F | CKSUM_F | RSS_F)                                                \
+	R(mark_cksum_ptype, MARK_F | CKSUM_F | PTYPE_F)                                            \
+	R(mark_cksum_ptype_rss, MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_16_31                                                                \
+	R(ts, TS_F)                                                                                \
+	R(ts_rss, TS_F | RSS_F)                                                                    \
+	R(ts_ptype, TS_F | PTYPE_F)                                                                \
+	R(ts_ptype_rss, TS_F | PTYPE_F | RSS_F)                                                    \
+	R(ts_cksum, TS_F | CKSUM_F)                                                                \
+	R(ts_cksum_rss, TS_F | CKSUM_F | RSS_F)                                                    \
+	R(ts_cksum_ptype, TS_F | CKSUM_F | PTYPE_F)                                                \
+	R(ts_cksum_ptype_rss, TS_F | CKSUM_F | PTYPE_F | RSS_F)                                    \
+	R(ts_mark, TS_F | MARK_F)                                                                  \
+	R(ts_mark_rss, TS_F | MARK_F | RSS_F)                                                      \
+	R(ts_mark_ptype, TS_F | MARK_F | PTYPE_F)                                                  \
+	R(ts_mark_ptype_rss, TS_F | MARK_F | PTYPE_F | RSS_F)                                      \
+	R(ts_mark_cksum, TS_F | MARK_F | CKSUM_F)                                                  \
+	R(ts_mark_cksum_rss, TS_F | MARK_F | CKSUM_F | RSS_F)                                      \
+	R(ts_mark_cksum_ptype, TS_F | MARK_F | CKSUM_F | PTYPE_F)                                  \
+	R(ts_mark_cksum_ptype_rss, TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_32_47                                                                \
+	R(vlan, RX_VLAN_F)                                                                         \
+	R(vlan_rss, RX_VLAN_F | RSS_F)                                                             \
+	R(vlan_ptype, RX_VLAN_F | PTYPE_F)                                                         \
+	R(vlan_ptype_rss, RX_VLAN_F | PTYPE_F | RSS_F)                                             \
+	R(vlan_cksum, RX_VLAN_F | CKSUM_F)                                                         \
+	R(vlan_cksum_rss, RX_VLAN_F | CKSUM_F | RSS_F)                                             \
+	R(vlan_cksum_ptype, RX_VLAN_F | CKSUM_F | PTYPE_F)                                         \
+	R(vlan_cksum_ptype_rss, RX_VLAN_F | CKSUM_F | PTYPE_F | RSS_F)                             \
+	R(vlan_mark, RX_VLAN_F | MARK_F)                                                           \
+	R(vlan_mark_rss, RX_VLAN_F | MARK_F | RSS_F)                                               \
+	R(vlan_mark_ptype, RX_VLAN_F | MARK_F | PTYPE_F)                                           \
+	R(vlan_mark_ptype_rss, RX_VLAN_F | MARK_F | PTYPE_F | RSS_F)                               \
+	R(vlan_mark_cksum, RX_VLAN_F | MARK_F | CKSUM_F)                                           \
+	R(vlan_mark_cksum_rss, RX_VLAN_F | MARK_F | CKSUM_F | RSS_F)                               \
+	R(vlan_mark_cksum_ptype, RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F)                           \
+	R(vlan_mark_cksum_ptype_rss, RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_48_63                                                                \
+	R(vlan_ts, RX_VLAN_F | TS_F)                                                               \
+	R(vlan_ts_rss, RX_VLAN_F | TS_F | RSS_F)                                                   \
+	R(vlan_ts_ptype, RX_VLAN_F | TS_F | PTYPE_F)                                               \
+	R(vlan_ts_ptype_rss, RX_VLAN_F | TS_F | PTYPE_F | RSS_F)                                   \
+	R(vlan_ts_cksum, RX_VLAN_F | TS_F | CKSUM_F)                                               \
+	R(vlan_ts_cksum_rss, RX_VLAN_F | TS_F | CKSUM_F | RSS_F)                                   \
+	R(vlan_ts_cksum_ptype, RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F)                               \
+	R(vlan_ts_cksum_ptype_rss, RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)                   \
+	R(vlan_ts_mark, RX_VLAN_F | TS_F | MARK_F)                                                 \
+	R(vlan_ts_mark_rss, RX_VLAN_F | TS_F | MARK_F | RSS_F)                                     \
+	R(vlan_ts_mark_ptype, RX_VLAN_F | TS_F | MARK_F | PTYPE_F)                                 \
+	R(vlan_ts_mark_ptype_rss, RX_VLAN_F | TS_F | MARK_F | PTYPE_F | RSS_F)                     \
+	R(vlan_ts_mark_cksum, RX_VLAN_F | TS_F | MARK_F | CKSUM_F)                                 \
+	R(vlan_ts_mark_cksum_rss, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | RSS_F)                     \
+	R(vlan_ts_mark_cksum_ptype, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)                 \
+	R(vlan_ts_mark_cksum_ptype_rss, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_64_79                                                                \
+	R(sec, R_SEC_F)                                                                            \
+	R(sec_rss, R_SEC_F | RSS_F)                                                                \
+	R(sec_ptype, R_SEC_F | PTYPE_F)                                                            \
+	R(sec_ptype_rss, R_SEC_F | PTYPE_F | RSS_F)                                                \
+	R(sec_cksum, R_SEC_F | CKSUM_F)                                                            \
+	R(sec_cksum_rss, R_SEC_F | CKSUM_F | RSS_F)                                                \
+	R(sec_cksum_ptype, R_SEC_F | CKSUM_F | PTYPE_F)                                            \
+	R(sec_cksum_ptype_rss, R_SEC_F | CKSUM_F | PTYPE_F | RSS_F)                                \
+	R(sec_mark, R_SEC_F | MARK_F)                                                              \
+	R(sec_mark_rss, R_SEC_F | MARK_F | RSS_F)                                                  \
+	R(sec_mark_ptype, R_SEC_F | MARK_F | PTYPE_F)                                              \
+	R(sec_mark_ptype_rss, R_SEC_F | MARK_F | PTYPE_F | RSS_F)                                  \
+	R(sec_mark_cksum, R_SEC_F | MARK_F | CKSUM_F)                                              \
+	R(sec_mark_cksum_rss, R_SEC_F | MARK_F | CKSUM_F | RSS_F)                                  \
+	R(sec_mark_cksum_ptype, R_SEC_F | MARK_F | CKSUM_F | PTYPE_F)                              \
+	R(sec_mark_cksum_ptype_rss, R_SEC_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_80_95                                                                \
+	R(sec_ts, R_SEC_F | TS_F)                                                                  \
+	R(sec_ts_rss, R_SEC_F | TS_F | RSS_F)                                                      \
+	R(sec_ts_ptype, R_SEC_F | TS_F | PTYPE_F)                                                  \
+	R(sec_ts_ptype_rss, R_SEC_F | TS_F | PTYPE_F | RSS_F)                                      \
+	R(sec_ts_cksum, R_SEC_F | TS_F | CKSUM_F)                                                  \
+	R(sec_ts_cksum_rss, R_SEC_F | TS_F | CKSUM_F | RSS_F)                                      \
+	R(sec_ts_cksum_ptype, R_SEC_F | TS_F | CKSUM_F | PTYPE_F)                                  \
+	R(sec_ts_cksum_ptype_rss, R_SEC_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)                      \
+	R(sec_ts_mark, R_SEC_F | TS_F | MARK_F)                                                    \
+	R(sec_ts_mark_rss, R_SEC_F | TS_F | MARK_F | RSS_F)                                        \
+	R(sec_ts_mark_ptype, R_SEC_F | TS_F | MARK_F | PTYPE_F)                                    \
+	R(sec_ts_mark_ptype_rss, R_SEC_F | TS_F | MARK_F | PTYPE_F | RSS_F)                        \
+	R(sec_ts_mark_cksum, R_SEC_F | TS_F | MARK_F | CKSUM_F)                                    \
+	R(sec_ts_mark_cksum_rss, R_SEC_F | TS_F | MARK_F | CKSUM_F | RSS_F)                        \
+	R(sec_ts_mark_cksum_ptype, R_SEC_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)                    \
+	R(sec_ts_mark_cksum_ptype_rss, R_SEC_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_96_111                                                               \
+	R(sec_vlan, R_SEC_F | RX_VLAN_F)                                                           \
+	R(sec_vlan_rss, R_SEC_F | RX_VLAN_F | RSS_F)                                               \
+	R(sec_vlan_ptype, R_SEC_F | RX_VLAN_F | PTYPE_F)                                           \
+	R(sec_vlan_ptype_rss, R_SEC_F | RX_VLAN_F | PTYPE_F | RSS_F)                               \
+	R(sec_vlan_cksum, R_SEC_F | RX_VLAN_F | CKSUM_F)                                           \
+	R(sec_vlan_cksum_rss, R_SEC_F | RX_VLAN_F | CKSUM_F | RSS_F)                               \
+	R(sec_vlan_cksum_ptype, R_SEC_F | RX_VLAN_F | CKSUM_F | PTYPE_F)                           \
+	R(sec_vlan_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | CKSUM_F | PTYPE_F | RSS_F)               \
+	R(sec_vlan_mark, R_SEC_F | RX_VLAN_F | MARK_F)                                             \
+	R(sec_vlan_mark_rss, R_SEC_F | RX_VLAN_F | MARK_F | RSS_F)                                 \
+	R(sec_vlan_mark_ptype, R_SEC_F | RX_VLAN_F | MARK_F | PTYPE_F)                             \
+	R(sec_vlan_mark_ptype_rss, R_SEC_F | RX_VLAN_F | MARK_F | PTYPE_F | RSS_F)                 \
+	R(sec_vlan_mark_cksum, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F)                             \
+	R(sec_vlan_mark_cksum_rss, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | RSS_F)                 \
+	R(sec_vlan_mark_cksum_ptype, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F)             \
+	R(sec_vlan_mark_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_112_127                                                              \
+	R(sec_vlan_ts, R_SEC_F | RX_VLAN_F | TS_F)                                                 \
+	R(sec_vlan_ts_rss, R_SEC_F | RX_VLAN_F | TS_F | RSS_F)                                     \
+	R(sec_vlan_ts_ptype, R_SEC_F | RX_VLAN_F | TS_F | PTYPE_F)                                 \
+	R(sec_vlan_ts_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | PTYPE_F | RSS_F)                     \
+	R(sec_vlan_ts_cksum, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F)                                 \
+	R(sec_vlan_ts_cksum_rss, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | RSS_F)                     \
+	R(sec_vlan_ts_cksum_ptype, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F)                 \
+	R(sec_vlan_ts_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)     \
+	R(sec_vlan_ts_mark, R_SEC_F | RX_VLAN_F | TS_F | MARK_F)                                   \
+	R(sec_vlan_ts_mark_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | RSS_F)                       \
+	R(sec_vlan_ts_mark_ptype, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | PTYPE_F)                   \
+	R(sec_vlan_ts_mark_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | PTYPE_F | RSS_F)       \
+	R(sec_vlan_ts_mark_cksum, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F)                   \
+	R(sec_vlan_ts_mark_cksum_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | RSS_F)       \
+	R(sec_vlan_ts_mark_cksum_ptype, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)   \
+	R(sec_vlan_ts_mark_cksum_ptype_rss,                                                        \
+	  R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES                                                                      \
+	NIX_RX_FASTPATH_MODES_0_15                                                                 \
+	NIX_RX_FASTPATH_MODES_16_31                                                                \
+	NIX_RX_FASTPATH_MODES_32_47                                                                \
+	NIX_RX_FASTPATH_MODES_48_63                                                                \
+	NIX_RX_FASTPATH_MODES_64_79                                                                \
+	NIX_RX_FASTPATH_MODES_80_95                                                                \
+	NIX_RX_FASTPATH_MODES_96_111                                                               \
+	NIX_RX_FASTPATH_MODES_112_127
+
+#define R(name, flags)                                                                             \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_##name(                              \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_mseg_##name(                         \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_##name(                          \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_mseg_##name(                     \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_##name(                         \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_mseg_##name(                    \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_vec_##name(                     \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_vec_mseg_##name(                \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);
+
+NIX_RX_FASTPATH_MODES
+#undef R
+
+#define NIX_RX_RECV(fn, flags)                                                                     \
+	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(rx_queue);                                                            \
+		RTE_SET_USED(rx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_RX_RECV_MSEG(fn, flags) NIX_RX_RECV(fn, flags | NIX_RX_MULTI_SEG_F)
+
+#define NIX_RX_RECV_VEC(fn, flags)                                                                 \
+	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(rx_queue);                                                            \
+		RTE_SET_USED(rx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_RX_RECV_VEC_MSEG(fn, flags) NIX_RX_RECV_VEC(fn, flags | NIX_RX_MULTI_SEG_F)
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_all_offload(void *rx_queue,
+								  struct rte_mbuf **rx_pkts,
+								  uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_all_offload(void *rx_queue,
+								      struct rte_mbuf **rx_pkts,
+								      uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_all_offload_tst(void *rx_queue,
+								      struct rte_mbuf **rx_pkts,
+								      uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_all_offload_tst(void *rx_queue,
+									  struct rte_mbuf **rx_pkts,
+									  uint16_t pkts);
+
 #endif /* __CN20K_RX_H__ */
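
For readers unfamiliar with the X-macro pattern used throughout this
header, here is a minimal, self-contained sketch of how a single mode
list fans out into one declaration per offload combination. The names
EXAMPLE_MODES and example_recv_pkts_* are hypothetical, for
illustration only, and are not part of this patch:

#include <stdint.h>

struct rte_mbuf;

/* One entry per offload combination; flags double as the array index. */
#define EXAMPLE_MODES                                                    \
	R(no_offload, 0x0)                                               \
	R(rss, 0x1)                                                      \
	R(cksum_rss, 0x3)

/* Declare one burst function per mode by redefining R(). */
#define R(name, flags)                                                   \
	uint16_t example_recv_pkts_##name(void *rxq,                     \
					  struct rte_mbuf **pkts,        \
					  uint16_t nb);
EXAMPLE_MODES
#undef R

The same list can then be re-expanded with a different R() to build
the function-pointer tables used during burst selection.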
diff --git a/drivers/net/cnxk/cn20k_rx_select.c b/drivers/net/cnxk/cn20k_rx_select.c
new file mode 100644
index 0000000000..82e06a62ef
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rx_select.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_ethdev.h"
+#include "cn20k_rx.h"
+
+static __rte_used void
+pick_rx_func(struct rte_eth_dev *eth_dev, const eth_rx_burst_t rx_burst[NIX_RX_OFFLOAD_MAX])
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* [VLAN] [TSP] [MARK] [CKSUM] [PTYPE] [RSS] */
+	eth_dev->rx_pkt_burst = rx_burst[dev->rx_offload_flags & (NIX_RX_OFFLOAD_MAX - 1)];
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+}
+
+static uint16_t __rte_noinline __rte_hot __rte_unused
+cn20k_nix_flush_rx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	RTE_SET_USED(rx_queue);
+	RTE_SET_USED(rx_pkts);
+	RTE_SET_USED(pkts);
+	return 0;
+}
+
+#if defined(RTE_ARCH_ARM64)
+static void
+cn20k_eth_set_rx_tmplt_func(struct rte_eth_dev *eth_dev)
+{
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	const eth_rx_burst_t nix_eth_rx_burst[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_mseg[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_mseg_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_mseg_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_mseg_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_vec_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_mseg[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_vec_mseg_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_vec_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_mseg_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_vec_mseg_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	/* Copy multi-seg version with security for the teardown sequence */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		dev->rx_pkt_burst_no_offload = cn20k_nix_flush_rx;
+
+	if (dev->scalar_ena) {
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) {
+			if (dev->rx_offload_flags & NIX_RX_REAS_F)
+				return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg_reas);
+			else
+				return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg);
+		}
+		if (dev->rx_offload_flags & NIX_RX_REAS_F)
+			return pick_rx_func(eth_dev, nix_eth_rx_burst_reas);
+		else
+			return pick_rx_func(eth_dev, nix_eth_rx_burst);
+	}
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) {
+		if (dev->rx_offload_flags & NIX_RX_REAS_F)
+			return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_mseg_reas);
+		else
+			return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_mseg);
+	}
+
+	if (dev->rx_offload_flags & NIX_RX_REAS_F)
+		return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_reas);
+	else
+		return pick_rx_func(eth_dev, nix_eth_rx_vec_burst);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+
+static void
+cn20k_eth_set_rx_blk_func(struct rte_eth_dev *eth_dev)
+{
+#if defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* Copy multi-seg version with security for the teardown sequence */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		dev->rx_pkt_burst_no_offload = cn20k_nix_flush_rx;
+
+	if (dev->scalar_ena) {
+		eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_all_offload;
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+			eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_all_offload_tst;
+	} else {
+		eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_vec_all_offload;
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+			eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_vec_all_offload_tst;
+	}
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+#endif
+
+void
+cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
+{
+#if defined(RTE_ARCH_ARM64)
+	cn20k_eth_set_rx_blk_func(eth_dev);
+	cn20k_eth_set_rx_tmplt_func(eth_dev);
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
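
A condensed sketch of the selection scheme above (assuming, as the
masking in pick_rx_func() implies, that NIX_RX_OFFLOAD_MAX is a power
of two; select_rx_burst() below is an illustrative name, not a PMD
symbol):

#include <stdint.h>

struct rte_mbuf;

typedef uint16_t (*rx_burst_t)(void *rxq, struct rte_mbuf **pkts,
			       uint16_t nb);

static inline rx_burst_t
select_rx_burst(const rx_burst_t table[], uint64_t rx_offload_flags,
		uint64_t table_size)
{
	/* The low bits of the offload flag word index straight into
	 * the generated table: [VLAN] [TSP] [MARK] [CKSUM] [PTYPE] [RSS].
	 */
	return table[rx_offload_flags & (table_size - 1)];
}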
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index cf2ce09f77..f41238be9c 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -236,7 +236,51 @@ if soc_type == 'cn20k' or soc_type == 'all'
 # CN20K
 sources += files(
         'cn20k_ethdev.c',
+        'cn20k_rx_select.c',
 )
+
+if host_machine.cpu_family().startswith('aarch') and not disable_template
+sources += files(
+        'rx/cn20k/rx_0_15.c',
+        'rx/cn20k/rx_16_31.c',
+        'rx/cn20k/rx_32_47.c',
+        'rx/cn20k/rx_48_63.c',
+        'rx/cn20k/rx_64_79.c',
+        'rx/cn20k/rx_80_95.c',
+        'rx/cn20k/rx_96_111.c',
+        'rx/cn20k/rx_112_127.c',
+        'rx/cn20k/rx_0_15_mseg.c',
+        'rx/cn20k/rx_16_31_mseg.c',
+        'rx/cn20k/rx_32_47_mseg.c',
+        'rx/cn20k/rx_48_63_mseg.c',
+        'rx/cn20k/rx_64_79_mseg.c',
+        'rx/cn20k/rx_80_95_mseg.c',
+        'rx/cn20k/rx_96_111_mseg.c',
+        'rx/cn20k/rx_112_127_mseg.c',
+        'rx/cn20k/rx_0_15_vec.c',
+        'rx/cn20k/rx_16_31_vec.c',
+        'rx/cn20k/rx_32_47_vec.c',
+        'rx/cn20k/rx_48_63_vec.c',
+        'rx/cn20k/rx_64_79_vec.c',
+        'rx/cn20k/rx_80_95_vec.c',
+        'rx/cn20k/rx_96_111_vec.c',
+        'rx/cn20k/rx_112_127_vec.c',
+        'rx/cn20k/rx_0_15_vec_mseg.c',
+        'rx/cn20k/rx_16_31_vec_mseg.c',
+        'rx/cn20k/rx_32_47_vec_mseg.c',
+        'rx/cn20k/rx_48_63_vec_mseg.c',
+        'rx/cn20k/rx_64_79_vec_mseg.c',
+        'rx/cn20k/rx_80_95_vec_mseg.c',
+        'rx/cn20k/rx_96_111_vec_mseg.c',
+        'rx/cn20k/rx_112_127_vec_mseg.c',
+        'rx/cn20k/rx_all_offload.c',
+)
+
+else
+sources += files(
+        'rx/cn20k/rx_all_offload.c',
+)
+endif
 endif
 
 
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15.c b/drivers/net/cnxk/rx/cn20k/rx_0_15.c
new file mode 100644
index 0000000000..d248eb8c7e
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
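
With the NIX_RX_RECV() stub from cn20k_rx.h, each R() entry in a file
like this expands to a placeholder of roughly the following shape (a
paraphrase of the macro output; the real bodies arrive with the later
Rx burst support patches in this series):

uint16_t __rte_noinline __rte_hot
cn20k_nix_recv_pkts_no_offload(void *rx_queue, struct rte_mbuf **rx_pkts,
			       uint16_t pkts)
{
	/* Stub until the scalar Rx burst support patch fills it in. */
	RTE_SET_USED(rx_queue);
	RTE_SET_USED(rx_pkts);
	RTE_SET_USED(pkts);
	return 0;
}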
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
new file mode 100644
index 0000000000..b159632921
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
new file mode 100644
index 0000000000..76846bfea8
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
new file mode 100644
index 0000000000..73533631ad
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127.c b/drivers/net/cnxk/rx/cn20k/rx_112_127.c
new file mode 100644
index 0000000000..b7c53def26
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
new file mode 100644
index 0000000000..ed3a95479c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
new file mode 100644
index 0000000000..4bbba8bdbe
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
new file mode 100644
index 0000000000..3a2b67436f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31.c b/drivers/net/cnxk/rx/cn20k/rx_16_31.c
new file mode 100644
index 0000000000..cd60faaefd
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
new file mode 100644
index 0000000000..2f2d527def
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
new file mode 100644
index 0000000000..595ec8689e
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
new file mode 100644
index 0000000000..7cf1c65f4a
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47.c b/drivers/net/cnxk/rx/cn20k/rx_32_47.c
new file mode 100644
index 0000000000..e3778448ca
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
new file mode 100644
index 0000000000..2203247aa4
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
new file mode 100644
index 0000000000..7aae8225e7
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
new file mode 100644
index 0000000000..1a221ae095
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63.c b/drivers/net/cnxk/rx/cn20k/rx_48_63.c
new file mode 100644
index 0000000000..c5fedd06cd
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
new file mode 100644
index 0000000000..6c2d8ac331
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
new file mode 100644
index 0000000000..20a937e453
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
new file mode 100644
index 0000000000..929d807c8d
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79.c b/drivers/net/cnxk/rx/cn20k/rx_64_79.c
new file mode 100644
index 0000000000..30beebc326
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
new file mode 100644
index 0000000000..30ece8f8ee
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
new file mode 100644
index 0000000000..1f533c01f6
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
new file mode 100644
index 0000000000..ed3c012798
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95.c b/drivers/net/cnxk/rx/cn20k/rx_80_95.c
new file mode 100644
index 0000000000..a13ecb244f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
new file mode 100644
index 0000000000..c6438120d8
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
new file mode 100644
index 0000000000..94c685ba7c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
new file mode 100644
index 0000000000..370376da7d
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111.c b/drivers/net/cnxk/rx/cn20k/rx_96_111.c
new file mode 100644
index 0000000000..15b5375e3c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
new file mode 100644
index 0000000000..561b48c789
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
new file mode 100644
index 0000000000..17031f7b6f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
new file mode 100644
index 0000000000..9dd1f3f39a
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_all_offload.c b/drivers/net/cnxk/rx/cn20k/rx_all_offload.c
new file mode 100644
index 0000000000..1d032b3b17
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_all_offload.c
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if defined(CNXK_DIS_TMPLT_FUNC)
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_all_offload(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts(rx_queue, rx_pkts, pkts,
+				   NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F |
+					   NIX_RX_OFFLOAD_CHECKSUM_F |
+					   NIX_RX_OFFLOAD_MARK_UPDATE_F | NIX_RX_OFFLOAD_TSTAMP_F |
+					   NIX_RX_OFFLOAD_VLAN_STRIP_F | NIX_RX_OFFLOAD_SECURITY_F |
+					   NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_vec_all_offload(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts_vector(
+		rx_queue, rx_pkts, pkts,
+		NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F | NIX_RX_OFFLOAD_CHECKSUM_F |
+			NIX_RX_OFFLOAD_MARK_UPDATE_F | NIX_RX_OFFLOAD_TSTAMP_F |
+			NIX_RX_OFFLOAD_VLAN_STRIP_F | NIX_RX_OFFLOAD_SECURITY_F |
+			NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F,
+		NULL, NULL, 0, 0);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_all_offload_tst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts(
+		rx_queue, rx_pkts, pkts,
+		NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F | NIX_RX_OFFLOAD_CHECKSUM_F |
+			NIX_RX_OFFLOAD_MARK_UPDATE_F | NIX_RX_OFFLOAD_VLAN_STRIP_F |
+			NIX_RX_OFFLOAD_SECURITY_F | NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_vec_all_offload_tst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts_vector(
+		rx_queue, rx_pkts, pkts,
+		NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F | NIX_RX_OFFLOAD_CHECKSUM_F |
+			NIX_RX_OFFLOAD_MARK_UPDATE_F | NIX_RX_OFFLOAD_VLAN_STRIP_F |
+			NIX_RX_OFFLOAD_SECURITY_F | NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F,
+		NULL, NULL, 0, 0);
+}
+
+#endif
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 12/18] net/cnxk: support Tx function select for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (10 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 11/18] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 13/18] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
                     ` (6 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

Add support to select the Tx burst function based on offload flags
for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
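
For context: the Tx selection follows the same scheme as the Rx side;
ethdev Tx offload bits are folded into a compact flag word that later
indexes the generated burst-function table. A minimal sketch of that
folding (the EX_* names and ex_tx_offload_flags() are illustrative,
not PMD symbols):

#include <stdint.h>
#include <rte_ethdev.h>

#define EX_TX_L3_L4_CSUM_F (1U << 0)
#define EX_TX_VLAN_QINQ_F  (1U << 1)
#define EX_TX_MSEG_F       (1U << 2)

static inline uint16_t
ex_tx_offload_flags(uint64_t conf) /* RTE_ETH_TX_OFFLOAD_* bits */
{
	uint16_t flags = 0;

	/* Fold related ethdev offload bits into one PMD flag each. */
	if (conf & (RTE_ETH_TX_OFFLOAD_IPV4_CKSUM | RTE_ETH_TX_OFFLOAD_TCP_CKSUM |
		    RTE_ETH_TX_OFFLOAD_UDP_CKSUM | RTE_ETH_TX_OFFLOAD_SCTP_CKSUM))
		flags |= EX_TX_L3_L4_CSUM_F;
	if (conf & (RTE_ETH_TX_OFFLOAD_VLAN_INSERT | RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
		flags |= EX_TX_VLAN_QINQ_F;
	if (conf & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
		flags |= EX_TX_MSEG_F;

	return flags;
}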
 drivers/net/cnxk/cn20k_ethdev.c               |  80 ++++++
 drivers/net/cnxk/cn20k_ethdev.h               |   1 +
 drivers/net/cnxk/cn20k_tx.h                   | 237 ++++++++++++++++++
 drivers/net/cnxk/cn20k_tx_select.c            | 122 +++++++++
 drivers/net/cnxk/meson.build                  |  37 +++
 drivers/net/cnxk/tx/cn20k/tx_0_15.c           |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c       |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c  |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127.c        |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c   |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c    |  18 ++
 .../net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c   |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111.c         |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c    |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c     |  18 ++
 .../net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c    |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_all_offload.c    |  39 +++
 38 files changed, 1092 insertions(+)
 create mode 100644 drivers/net/cnxk/cn20k_tx_select.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_all_offload.c

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index d1cb3a52bf..4b2f04ba31 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -40,6 +40,78 @@ nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
 	return flags;
 }
 
+static uint16_t
+nix_tx_offload_flags(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	uint64_t conf = dev->tx_offloads;
+	struct roc_nix *nix = &dev->nix;
+	uint16_t flags = 0;
+
+	/* Fast path depends on these enum values and mbuf field offsets */
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_TCP_CKSUM != (1ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_SCTP_CKSUM != (2ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_UDP_CKSUM != (3ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_IP_CKSUM != (1ULL << 54));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_IPV4 != (1ULL << 55));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IP_CKSUM != (1ULL << 58));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IPV4 != (1ULL << 59));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IPV6 != (1ULL << 60));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_UDP_CKSUM != (1ULL << 41));
+	RTE_BUILD_BUG_ON(RTE_MBUF_L2_LEN_BITS != 7);
+	RTE_BUILD_BUG_ON(RTE_MBUF_L3_LEN_BITS != 9);
+	RTE_BUILD_BUG_ON(RTE_MBUF_OUTL2_LEN_BITS != 7);
+	RTE_BUILD_BUG_ON(RTE_MBUF_OUTL3_LEN_BITS != 9);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_off) !=
+			 offsetof(struct rte_mbuf, buf_addr) + 16);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
+			 offsetof(struct rte_mbuf, buf_addr) + 24);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pkt_len) !=
+			 offsetof(struct rte_mbuf, ol_flags) + 12);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, tx_offload) !=
+			 offsetof(struct rte_mbuf, pool) + 2 * sizeof(void *));
+
+	if (conf & RTE_ETH_TX_OFFLOAD_VLAN_INSERT || conf & RTE_ETH_TX_OFFLOAD_QINQ_INSERT)
+		flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM || conf & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM)
+		flags |= NIX_TX_OFFLOAD_OL3_OL4_CSUM_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM || conf & RTE_ETH_TX_OFFLOAD_TCP_CKSUM ||
+	    conf & RTE_ETH_TX_OFFLOAD_UDP_CKSUM || conf & RTE_ETH_TX_OFFLOAD_SCTP_CKSUM)
+		flags |= NIX_TX_OFFLOAD_L3_L4_CSUM_F;
+
+	if (!(conf & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE))
+		flags |= NIX_TX_OFFLOAD_MBUF_NOFF_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+		flags |= NIX_TX_MULTI_SEG_F;
+
+	/* Enable Inner checksum for TSO */
+	if (conf & RTE_ETH_TX_OFFLOAD_TCP_TSO)
+		flags |= (NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_L3_L4_CSUM_F);
+
+	/* Enable Inner and Outer checksum for Tunnel TSO */
+	if (conf & (RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO | RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO |
+		    RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO))
+		flags |= (NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+			  NIX_TX_OFFLOAD_L3_L4_CSUM_F);
+
+	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP))
+		flags |= NIX_TX_OFFLOAD_TSTAMP_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_SECURITY)
+		flags |= NIX_TX_OFFLOAD_SECURITY_F;
+
+	if (dev->tx_mark)
+		flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+
+	if (nix->tx_compl_ena)
+		flags |= NIX_TX_OFFLOAD_MBUF_NOFF_F;
+
+	return flags;
+}
+
 static int
 cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 {
@@ -226,6 +298,7 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 
 		/* Update offload flags */
 		dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
+		dev->tx_offload_flags = nix_tx_offload_flags(eth_dev);
 	}
 
 	rq = &dev->rqs[qid];
@@ -286,6 +359,8 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 
 	/* Update offload flags */
 	dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
+	dev->tx_offload_flags = nix_tx_offload_flags(eth_dev);
+
 	/* reset reassembly dynfield/flag offset */
 	dev->reass_dynfield_off = -1;
 	dev->reass_dynflag_bit = -1;
@@ -316,6 +391,7 @@ cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
 	return 0;
 }
 
@@ -339,6 +415,7 @@ cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
 	return 0;
 }
 
@@ -378,10 +455,12 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
+	dev->tx_offload_flags |= nix_tx_offload_flags(eth_dev);
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
 
+	cn20k_eth_set_tx_function(eth_dev);
 	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
@@ -581,6 +660,7 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		/* Setup callbacks for secondary process */
+		cn20k_eth_set_tx_function(eth_dev);
 		cn20k_eth_set_rx_function(eth_dev);
 		return 0;
 	}
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
index 2049ee7fa4..cb46044d60 100644
--- a/drivers/net/cnxk/cn20k_ethdev.h
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -10,5 +10,6 @@
 
 /* Rx and Tx routines */
 void cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+void cn20k_eth_set_tx_function(struct rte_eth_dev *eth_dev);
 
 #endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index a00c9d5776..9fd925ac34 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -32,4 +32,241 @@
 #define NIX_TX_NEED_EXT_HDR                                                                        \
 	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
 
+#define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
+#define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
+#define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
+#define NOFF_F	     NIX_TX_OFFLOAD_MBUF_NOFF_F
+#define TSO_F	     NIX_TX_OFFLOAD_TSO_F
+#define TSP_F	     NIX_TX_OFFLOAD_TSTAMP_F
+#define T_SEC_F	     NIX_TX_OFFLOAD_SECURITY_F
+
+/* [T_SEC_F] [TSP] [TSO] [NOFF] [VLAN] [OL3OL4CSUM] [L3L4CSUM] */
+#define NIX_TX_FASTPATH_MODES_0_15                                                                 \
+	T(no_offload, 6, NIX_TX_OFFLOAD_NONE)                                                      \
+	T(l3l4csum, 6, L3L4CSUM_F)                                                                 \
+	T(ol3ol4csum, 6, OL3OL4CSUM_F)                                                             \
+	T(ol3ol4csum_l3l4csum, 6, OL3OL4CSUM_F | L3L4CSUM_F)                                       \
+	T(vlan, 6, VLAN_F)                                                                         \
+	T(vlan_l3l4csum, 6, VLAN_F | L3L4CSUM_F)                                                   \
+	T(vlan_ol3ol4csum, 6, VLAN_F | OL3OL4CSUM_F)                                               \
+	T(vlan_ol3ol4csum_l3l4csum, 6, VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(noff, 6, NOFF_F)                                                                         \
+	T(noff_l3l4csum, 6, NOFF_F | L3L4CSUM_F)                                                   \
+	T(noff_ol3ol4csum, 6, NOFF_F | OL3OL4CSUM_F)                                               \
+	T(noff_ol3ol4csum_l3l4csum, 6, NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(noff_vlan, 6, NOFF_F | VLAN_F)                                                           \
+	T(noff_vlan_l3l4csum, 6, NOFF_F | VLAN_F | L3L4CSUM_F)                                     \
+	T(noff_vlan_ol3ol4csum, 6, NOFF_F | VLAN_F | OL3OL4CSUM_F)                                 \
+	T(noff_vlan_ol3ol4csum_l3l4csum, 6, NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_16_31                                                                \
+	T(tso, 6, TSO_F)                                                                           \
+	T(tso_l3l4csum, 6, TSO_F | L3L4CSUM_F)                                                     \
+	T(tso_ol3ol4csum, 6, TSO_F | OL3OL4CSUM_F)                                                 \
+	T(tso_ol3ol4csum_l3l4csum, 6, TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)                           \
+	T(tso_vlan, 6, TSO_F | VLAN_F)                                                             \
+	T(tso_vlan_l3l4csum, 6, TSO_F | VLAN_F | L3L4CSUM_F)                                       \
+	T(tso_vlan_ol3ol4csum, 6, TSO_F | VLAN_F | OL3OL4CSUM_F)                                   \
+	T(tso_vlan_ol3ol4csum_l3l4csum, 6, TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(tso_noff, 6, TSO_F | NOFF_F)                                                             \
+	T(tso_noff_l3l4csum, 6, TSO_F | NOFF_F | L3L4CSUM_F)                                       \
+	T(tso_noff_ol3ol4csum, 6, TSO_F | NOFF_F | OL3OL4CSUM_F)                                   \
+	T(tso_noff_ol3ol4csum_l3l4csum, 6, TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(tso_noff_vlan, 6, TSO_F | NOFF_F | VLAN_F)                                               \
+	T(tso_noff_vlan_l3l4csum, 6, TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)                         \
+	T(tso_noff_vlan_ol3ol4csum, 6, TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                     \
+	T(tso_noff_vlan_ol3ol4csum_l3l4csum, 6, TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_32_47                                                                \
+	T(ts, 8, TSP_F)                                                                            \
+	T(ts_l3l4csum, 8, TSP_F | L3L4CSUM_F)                                                      \
+	T(ts_ol3ol4csum, 8, TSP_F | OL3OL4CSUM_F)                                                  \
+	T(ts_ol3ol4csum_l3l4csum, 8, TSP_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(ts_vlan, 8, TSP_F | VLAN_F)                                                              \
+	T(ts_vlan_l3l4csum, 8, TSP_F | VLAN_F | L3L4CSUM_F)                                        \
+	T(ts_vlan_ol3ol4csum, 8, TSP_F | VLAN_F | OL3OL4CSUM_F)                                    \
+	T(ts_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(ts_noff, 8, TSP_F | NOFF_F)                                                              \
+	T(ts_noff_l3l4csum, 8, TSP_F | NOFF_F | L3L4CSUM_F)                                        \
+	T(ts_noff_ol3ol4csum, 8, TSP_F | NOFF_F | OL3OL4CSUM_F)                                    \
+	T(ts_noff_ol3ol4csum_l3l4csum, 8, TSP_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(ts_noff_vlan, 8, TSP_F | NOFF_F | VLAN_F)                                                \
+	T(ts_noff_vlan_l3l4csum, 8, TSP_F | NOFF_F | VLAN_F | L3L4CSUM_F)                          \
+	T(ts_noff_vlan_ol3ol4csum, 8, TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                      \
+	T(ts_noff_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_48_63                                                                \
+	T(ts_tso, 8, TSP_F | TSO_F)                                                                \
+	T(ts_tso_l3l4csum, 8, TSP_F | TSO_F | L3L4CSUM_F)                                          \
+	T(ts_tso_ol3ol4csum, 8, TSP_F | TSO_F | OL3OL4CSUM_F)                                      \
+	T(ts_tso_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)                \
+	T(ts_tso_vlan, 8, TSP_F | TSO_F | VLAN_F)                                                  \
+	T(ts_tso_vlan_l3l4csum, 8, TSP_F | TSO_F | VLAN_F | L3L4CSUM_F)                            \
+	T(ts_tso_vlan_ol3ol4csum, 8, TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F)                        \
+	T(ts_tso_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(ts_tso_noff, 8, TSP_F | TSO_F | NOFF_F)                                                  \
+	T(ts_tso_noff_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | L3L4CSUM_F)                            \
+	T(ts_tso_noff_ol3ol4csum, 8, TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F)                        \
+	T(ts_tso_noff_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(ts_tso_noff_vlan, 8, TSP_F | TSO_F | NOFF_F | VLAN_F)                                    \
+	T(ts_tso_noff_vlan_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)              \
+	T(ts_tso_noff_vlan_ol3ol4csum, 8, TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)          \
+	T(ts_tso_noff_vlan_ol3ol4csum_l3l4csum, 8,                                                 \
+	  TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_64_79                                                                \
+	T(sec, 6, T_SEC_F)                                                                         \
+	T(sec_l3l4csum, 6, T_SEC_F | L3L4CSUM_F)                                                   \
+	T(sec_ol3ol4csum, 6, T_SEC_F | OL3OL4CSUM_F)                                               \
+	T(sec_ol3ol4csum_l3l4csum, 6, T_SEC_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(sec_vlan, 6, T_SEC_F | VLAN_F)                                                           \
+	T(sec_vlan_l3l4csum, 6, T_SEC_F | VLAN_F | L3L4CSUM_F)                                     \
+	T(sec_vlan_ol3ol4csum, 6, T_SEC_F | VLAN_F | OL3OL4CSUM_F)                                 \
+	T(sec_vlan_ol3ol4csum_l3l4csum, 6, T_SEC_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)           \
+	T(sec_noff, 6, T_SEC_F | NOFF_F)                                                           \
+	T(sec_noff_l3l4csum, 6, T_SEC_F | NOFF_F | L3L4CSUM_F)                                     \
+	T(sec_noff_ol3ol4csum, 6, T_SEC_F | NOFF_F | OL3OL4CSUM_F)                                 \
+	T(sec_noff_ol3ol4csum_l3l4csum, 6, T_SEC_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)           \
+	T(sec_noff_vlan, 6, T_SEC_F | NOFF_F | VLAN_F)                                             \
+	T(sec_noff_vlan_l3l4csum, 6, T_SEC_F | NOFF_F | VLAN_F | L3L4CSUM_F)                       \
+	T(sec_noff_vlan_ol3ol4csum, 6, T_SEC_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                   \
+	T(sec_noff_vlan_ol3ol4csum_l3l4csum, 6,                                                    \
+	  T_SEC_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_80_95                                                                \
+	T(sec_tso, 6, T_SEC_F | TSO_F)                                                             \
+	T(sec_tso_l3l4csum, 6, T_SEC_F | TSO_F | L3L4CSUM_F)                                       \
+	T(sec_tso_ol3ol4csum, 6, T_SEC_F | TSO_F | OL3OL4CSUM_F)                                   \
+	T(sec_tso_ol3ol4csum_l3l4csum, 6, T_SEC_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(sec_tso_vlan, 6, T_SEC_F | TSO_F | VLAN_F)                                               \
+	T(sec_tso_vlan_l3l4csum, 6, T_SEC_F | TSO_F | VLAN_F | L3L4CSUM_F)                         \
+	T(sec_tso_vlan_ol3ol4csum, 6, T_SEC_F | TSO_F | VLAN_F | OL3OL4CSUM_F)                     \
+	T(sec_tso_vlan_ol3ol4csum_l3l4csum, 6,                                                     \
+	  T_SEC_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_tso_noff, 6, T_SEC_F | TSO_F | NOFF_F)                                               \
+	T(sec_tso_noff_l3l4csum, 6, T_SEC_F | TSO_F | NOFF_F | L3L4CSUM_F)                         \
+	T(sec_tso_noff_ol3ol4csum, 6, T_SEC_F | TSO_F | NOFF_F | OL3OL4CSUM_F)                     \
+	T(sec_tso_noff_ol3ol4csum_l3l4csum, 6,                                                     \
+	  T_SEC_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_tso_noff_vlan, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F)                                 \
+	T(sec_tso_noff_vlan_l3l4csum, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)           \
+	T(sec_tso_noff_vlan_ol3ol4csum, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)       \
+	T(sec_tso_noff_vlan_ol3ol4csum_l3l4csum, 6,                                                \
+	  T_SEC_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_96_111                                                               \
+	T(sec_ts, 8, T_SEC_F | TSP_F)                                                              \
+	T(sec_ts_l3l4csum, 8, T_SEC_F | TSP_F | L3L4CSUM_F)                                        \
+	T(sec_ts_ol3ol4csum, 8, T_SEC_F | TSP_F | OL3OL4CSUM_F)                                    \
+	T(sec_ts_ol3ol4csum_l3l4csum, 8, T_SEC_F | TSP_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(sec_ts_vlan, 8, T_SEC_F | TSP_F | VLAN_F)                                                \
+	T(sec_ts_vlan_l3l4csum, 8, T_SEC_F | TSP_F | VLAN_F | L3L4CSUM_F)                          \
+	T(sec_ts_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | VLAN_F | OL3OL4CSUM_F)                      \
+	T(sec_ts_vlan_ol3ol4csum_l3l4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_ts_noff, 8, T_SEC_F | TSP_F | NOFF_F)                                                \
+	T(sec_ts_noff_l3l4csum, 8, T_SEC_F | TSP_F | NOFF_F | L3L4CSUM_F)                          \
+	T(sec_ts_noff_ol3ol4csum, 8, T_SEC_F | TSP_F | NOFF_F | OL3OL4CSUM_F)                      \
+	T(sec_ts_noff_ol3ol4csum_l3l4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_ts_noff_vlan, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F)                                  \
+	T(sec_ts_noff_vlan_l3l4csum, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F | L3L4CSUM_F)            \
+	T(sec_ts_noff_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)        \
+	T(sec_ts_noff_vlan_ol3ol4csum_l3l4csum, 8,                                                 \
+	  T_SEC_F | TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_112_127                                                              \
+	T(sec_ts_tso, 8, T_SEC_F | TSP_F | TSO_F)                                                  \
+	T(sec_ts_tso_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | OL3OL4CSUM_F)                        \
+	T(sec_ts_tso_ol3ol4csum_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(sec_ts_tso_vlan, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F)                                    \
+	T(sec_ts_tso_vlan_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F | L3L4CSUM_F)              \
+	T(sec_ts_tso_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F)          \
+	T(sec_ts_tso_vlan_ol3ol4csum_l3l4csum, 8,                                                  \
+	  T_SEC_F | TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_noff, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F)                                    \
+	T(sec_ts_tso_noff_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | L3L4CSUM_F)              \
+	T(sec_ts_tso_noff_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F)          \
+	T(sec_ts_tso_noff_ol3ol4csum_l3l4csum, 8,                                                  \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_noff_vlan, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F)                      \
+	T(sec_ts_tso_noff_vlan_l3l4csum, 8,                                                        \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)                                  \
+	T(sec_ts_tso_noff_vlan_ol3ol4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                                \
+	T(sec_ts_tso_noff_vlan_ol3ol4csum_l3l4csum, 8,                                             \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES                                                                      \
+	NIX_TX_FASTPATH_MODES_0_15                                                                 \
+	NIX_TX_FASTPATH_MODES_16_31                                                                \
+	NIX_TX_FASTPATH_MODES_32_47                                                                \
+	NIX_TX_FASTPATH_MODES_48_63                                                                \
+	NIX_TX_FASTPATH_MODES_64_79                                                                \
+	NIX_TX_FASTPATH_MODES_80_95                                                                \
+	NIX_TX_FASTPATH_MODES_96_111                                                               \
+	NIX_TX_FASTPATH_MODES_112_127
+
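+/* Declare the scalar, mseg, vector and vector-mseg Tx burst function
+ * prototypes, one per offload flag combination listed above.
+ */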
+#define T(name, sz, flags)                                                                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_##name(                              \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_mseg_##name(                         \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_##name(                          \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_mseg_##name(                     \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);
+
+NIX_TX_FASTPATH_MODES
+#undef T
+
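+/* Stub bodies that consume no packets and return 0; the actual Tx burst
+ * implementations are provided by later patches in this series.
+ */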
+#define NIX_TX_XMIT(fn, sz, flags)                                                                 \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_MSEG(fn, sz, flags)                                                            \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_VEC(fn, sz, flags)                                                             \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_VEC_MSEG(fn, sz, flags)                                                        \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue,
+								  struct rte_mbuf **tx_pkts,
+								  uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue,
+								      struct rte_mbuf **tx_pkts,
+								      uint16_t pkts);
+
 #endif /* __CN20K_TX_H__ */
diff --git a/drivers/net/cnxk/cn20k_tx_select.c b/drivers/net/cnxk/cn20k_tx_select.c
new file mode 100644
index 0000000000..fb62b54a5f
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_tx_select.c
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_ethdev.h"
+#include "cn20k_tx.h"
+
+static __rte_used inline void
+pick_tx_func(struct rte_eth_dev *eth_dev, const eth_tx_burst_t tx_burst[NIX_TX_OFFLOAD_MAX])
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* [SEC] [TSP] [TSO] [NOFF] [VLAN] [OL3_OL4_CSUM] [IL3_IL4_CSUM] */
+	eth_dev->tx_pkt_burst = tx_burst[dev->tx_offload_flags & (NIX_TX_OFFLOAD_MAX - 1)];
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].tx_pkt_burst = eth_dev->tx_pkt_burst;
+}
+
+#if defined(RTE_ARCH_ARM64)
+static int
+cn20k_nix_tx_queue_count(void *tx_queue)
+{
+	struct cn20k_eth_txq *txq = (struct cn20k_eth_txq *)tx_queue;
+
+	return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
+}
+
+static int
+cn20k_nix_tx_queue_sec_count(void *tx_queue)
+{
+	struct cn20k_eth_txq *txq = (struct cn20k_eth_txq *)tx_queue;
+
+	return cnxk_nix_tx_queue_sec_count(txq->fc_mem, txq->sqes_per_sqb_log2, txq->cpt_fc);
+}
+
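+/* Select a Tx burst function from the per-offload-combination template
+ * tables; compiled out when template functions are disabled.
+ */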
+static void
+cn20k_eth_set_tx_tmplt_func(struct rte_eth_dev *eth_dev)
+{
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	const eth_tx_burst_t nix_eth_tx_burst[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_burst_mseg[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_mseg_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_vec_burst[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_vec_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_vec_burst_mseg[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_vec_mseg_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	if (dev->scalar_ena || dev->tx_mark) {
+		pick_tx_func(eth_dev, nix_eth_tx_burst);
+		if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+			pick_tx_func(eth_dev, nix_eth_tx_burst_mseg);
+	} else {
+		pick_tx_func(eth_dev, nix_eth_tx_vec_burst);
+		if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+			pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
+	}
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+
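+/* Fallback used when template functions are disabled: install the single
+ * all-offload Tx burst function (scalar or vector).
+ */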
+static void
+cn20k_eth_set_tx_blk_func(struct rte_eth_dev *eth_dev)
+{
+#if defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (dev->scalar_ena || dev->tx_mark)
+		eth_dev->tx_pkt_burst = cn20k_nix_xmit_pkts_all_offload;
+	else
+		eth_dev->tx_pkt_burst = cn20k_nix_xmit_pkts_vec_all_offload;
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].tx_pkt_burst = eth_dev->tx_pkt_burst;
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+#endif
+
+void
+cn20k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
+{
+#if defined(RTE_ARCH_ARM64)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	cn20k_eth_set_tx_blk_func(eth_dev);
+	cn20k_eth_set_tx_tmplt_func(eth_dev);
+
+	if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
+		eth_dev->tx_queue_count = cn20k_nix_tx_queue_sec_count;
+	else
+		eth_dev->tx_queue_count = cn20k_nix_tx_queue_count;
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index f41238be9c..fcf48f600a 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -237,6 +237,7 @@ if soc_type == 'cn20k' or soc_type == 'all'
 sources += files(
         'cn20k_ethdev.c',
         'cn20k_rx_select.c',
+        'cn20k_tx_select.c',
 )
 
 if host_machine.cpu_family().startswith('aarch') and not disable_template
@@ -276,9 +277,45 @@ sources += files(
         'rx/cn20k/rx_all_offload.c',
 )
 
+sources += files(
+        'tx/cn20k/tx_0_15.c',
+        'tx/cn20k/tx_16_31.c',
+        'tx/cn20k/tx_32_47.c',
+        'tx/cn20k/tx_48_63.c',
+        'tx/cn20k/tx_64_79.c',
+        'tx/cn20k/tx_80_95.c',
+        'tx/cn20k/tx_96_111.c',
+        'tx/cn20k/tx_112_127.c',
+        'tx/cn20k/tx_0_15_mseg.c',
+        'tx/cn20k/tx_16_31_mseg.c',
+        'tx/cn20k/tx_32_47_mseg.c',
+        'tx/cn20k/tx_48_63_mseg.c',
+        'tx/cn20k/tx_64_79_mseg.c',
+        'tx/cn20k/tx_80_95_mseg.c',
+        'tx/cn20k/tx_96_111_mseg.c',
+        'tx/cn20k/tx_112_127_mseg.c',
+        'tx/cn20k/tx_0_15_vec.c',
+        'tx/cn20k/tx_16_31_vec.c',
+        'tx/cn20k/tx_32_47_vec.c',
+        'tx/cn20k/tx_48_63_vec.c',
+        'tx/cn20k/tx_64_79_vec.c',
+        'tx/cn20k/tx_80_95_vec.c',
+        'tx/cn20k/tx_96_111_vec.c',
+        'tx/cn20k/tx_112_127_vec.c',
+        'tx/cn20k/tx_0_15_vec_mseg.c',
+        'tx/cn20k/tx_16_31_vec_mseg.c',
+        'tx/cn20k/tx_32_47_vec_mseg.c',
+        'tx/cn20k/tx_48_63_vec_mseg.c',
+        'tx/cn20k/tx_64_79_vec_mseg.c',
+        'tx/cn20k/tx_80_95_vec_mseg.c',
+        'tx/cn20k/tx_96_111_vec_mseg.c',
+        'tx/cn20k/tx_112_127_vec_mseg.c',
+        'tx/cn20k/tx_all_offload.c',
+)
 else
 sources += files(
         'rx/cn20k/rx_all_offload.c',
+        'tx/cn20k/tx_all_offload.c',
 )
 endif
 endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15.c b/drivers/net/cnxk/tx/cn20k/tx_0_15.c
new file mode 100644
index 0000000000..2de434ccb4
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
new file mode 100644
index 0000000000..c928902b02
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
new file mode 100644
index 0000000000..0e82451c7e
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
new file mode 100644
index 0000000000..b0cd33f781
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127.c b/drivers/net/cnxk/tx/cn20k/tx_112_127.c
new file mode 100644
index 0000000000..c116c48763
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
new file mode 100644
index 0000000000..5d67426f2b
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
new file mode 100644
index 0000000000..5a3e5c660d
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
new file mode 100644
index 0000000000..c6918de6df
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31.c b/drivers/net/cnxk/tx/cn20k/tx_16_31.c
new file mode 100644
index 0000000000..953f63b192
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
new file mode 100644
index 0000000000..cdfd6bf69c
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
new file mode 100644
index 0000000000..6e6ad7c968
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
new file mode 100644
index 0000000000..a3a0fcace3
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47.c b/drivers/net/cnxk/tx/cn20k/tx_32_47.c
new file mode 100644
index 0000000000..50295fcd16
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
new file mode 100644
index 0000000000..8b4da505ad
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
new file mode 100644
index 0000000000..3a3298ffa6
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
new file mode 100644
index 0000000000..93168990a8
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63.c b/drivers/net/cnxk/tx/cn20k/tx_48_63.c
new file mode 100644
index 0000000000..5765b1fe57
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
new file mode 100644
index 0000000000..5f591eee68
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
new file mode 100644
index 0000000000..06eec15976
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
new file mode 100644
index 0000000000..220f117c47
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79.c b/drivers/net/cnxk/tx/cn20k/tx_64_79.c
new file mode 100644
index 0000000000..c05ef2a238
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
new file mode 100644
index 0000000000..79d40a09ed
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
new file mode 100644
index 0000000000..a4fac7e73e
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
new file mode 100644
index 0000000000..90d6b4f2f9
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95.c b/drivers/net/cnxk/tx/cn20k/tx_80_95.c
new file mode 100644
index 0000000000..8a09ff842b
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
new file mode 100644
index 0000000000..59f959b29f
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
new file mode 100644
index 0000000000..ca78d42344
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
new file mode 100644
index 0000000000..a3a9856783
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111.c b/drivers/net/cnxk/tx/cn20k/tx_96_111.c
new file mode 100644
index 0000000000..fab39f8fcc
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
new file mode 100644
index 0000000000..11b6814223
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
new file mode 100644
index 0000000000..e1e3b1bca3
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
new file mode 100644
index 0000000000..b6af4e34c0
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_all_offload.c b/drivers/net/cnxk/tx/cn20k/tx_all_offload.c
new file mode 100644
index 0000000000..c7258b5df7
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_all_offload.c
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if defined(CNXK_DIS_TMPLT_FUNC)
+
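+/* Single Tx burst routines built with every offload flag enabled, used
+ * when the per-combination template functions are compiled out.
+ */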
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_xmit_pkts_all_offload(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts)
+{
+	uint64_t cmd[8 + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];
+
+	return cn20k_nix_xmit_pkts_mseg(
+		tx_queue, NULL, tx_pkts, pkts, cmd,
+		NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+			NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_MBUF_NOFF_F |
+			NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_SECURITY_F |
+			NIX_TX_MULTI_SEG_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts)
+{
+	uint64_t cmd[8 + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];
+
+	return cn20k_nix_xmit_pkts_vector(
+		tx_queue, NULL, tx_pkts, pkts, cmd,
+		NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+			NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_MBUF_NOFF_F |
+			NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_SECURITY_F |
+			NIX_TX_MULTI_SEG_F);
+}
+
+#endif
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 13/18] net/cnxk: support Rx burst scalar for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (11 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 12/18] net/cnxk: support Tx " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 14/18] net/cnxk: support Rx burst vector " Nithin Dabilpuram
                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add scalar version of Rx burst support for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c    | 126 +++++++++
 drivers/net/cnxk/cn20k_rx.h        | 394 ++++++++++++++++++++++++++++-
 drivers/net/cnxk/cn20k_rx_select.c |   6 +-
 drivers/net/cnxk/cn20k_rxtx.h      | 156 ++++++++++++
 4 files changed, 674 insertions(+), 8 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index 4b2f04ba31..cad7b1316a 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -330,6 +330,33 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 	return 0;
 }
 
+static void
+cn20k_nix_rx_queue_meta_aura_update(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cnxk_eth_rxq_sp *rxq_sp;
+	struct cn20k_eth_rxq *rxq;
+	struct roc_nix_rq *rq;
+	int i;
+
+	/* Update Aura handle for fastpath rx queues */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		rq = &dev->rqs[i];
+		rxq = eth_dev->data->rx_queues[i];
+		rxq->meta_aura = rq->meta_aura_handle;
+		rxq->meta_pool = dev->nix.meta_mempool;
+		/* Assume meta packets come from the normal aura if the
+		 * meta aura is not set up.
+		 */
+		if (!rxq->meta_aura) {
+			rxq_sp = cnxk_eth_rxq_to_sp(rxq);
+			rxq->meta_aura = rxq_sp->qconf.mp->pool_id;
+			rxq->meta_pool = (uintptr_t)rxq_sp->qconf.mp;
+		}
+	}
+	/* Store mempool in lookup mem */
+	cnxk_nix_lookup_mem_metapool_set(dev);
+}
+
 static int
 cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 {
@@ -371,6 +398,74 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 	return 0;
 }
 
+/* Function to enable ptp config for VFs */
+static void
+nix_ptp_enable_vf(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (nix_recalc_mtu(eth_dev))
+		plt_err("Failed to set MTU size for ptp");
+
+	dev->rx_offload_flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+
+	/* Setting up the function pointers as per new offload flags */
+	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
+}
+
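+/* Temporary Rx burst handler installed on VFs: it applies the PTP
+ * configuration on first invocation (which reinstalls the real burst
+ * functions) and returns no packets.
+ */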
+static uint16_t
+nix_ptp_vf_burst(void *queue, struct rte_mbuf **mbufs, uint16_t pkts)
+{
+	struct cn20k_eth_rxq *rxq = queue;
+	struct cnxk_eth_rxq_sp *rxq_sp;
+	struct rte_eth_dev *eth_dev;
+
+	RTE_SET_USED(mbufs);
+	RTE_SET_USED(pkts);
+
+	rxq_sp = cnxk_eth_rxq_to_sp(rxq);
+	eth_dev = rxq_sp->dev->eth_dev;
+	nix_ptp_enable_vf(eth_dev);
+
+	return 0;
+}
+
+static int
+cn20k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
+{
+	struct cnxk_eth_dev *dev = (struct cnxk_eth_dev *)nix;
+	struct rte_eth_dev *eth_dev;
+	struct cn20k_eth_rxq *rxq;
+	int i;
+
+	if (!dev)
+		return -EINVAL;
+
+	eth_dev = dev->eth_dev;
+	if (!eth_dev)
+		return -EINVAL;
+
+	dev->ptp_en = ptp_en;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		rxq = eth_dev->data->rx_queues[i];
+		rxq->mbuf_initializer = cnxk_nix_rxq_mbuf_setup(dev);
+	}
+
+	if (roc_nix_is_vf_or_sdp(nix) && !(roc_nix_is_sdp(nix)) && !(roc_nix_is_lbk(nix))) {
+		/* In case of a VF, the MTU cannot be set directly in this
+		 * function as it runs as part of an MBOX request (PF->VF),
+		 * and setting the MTU itself requires another MBOX message
+		 * to be sent (VF->PF).
+		 */
+		eth_dev->rx_pkt_burst = nix_ptp_vf_burst;
+		rte_mb();
+	}
+
+	return 0;
+}
+
 static int
 cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 {
@@ -451,11 +546,21 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Update the VF about the data offset being shifted by 8 bytes
+	 * if PTP is already enabled in the PF owning this VF.
+	 */
+	if (dev->ptp_en && (!roc_nix_is_pf(nix) && (!roc_nix_is_sdp(nix))))
+		nix_ptp_enable_vf(eth_dev);
+
 	/* Setting up the rx[tx]_offload_flags due to change
 	 * in rx[tx]_offloads.
 	 */
 	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
 	dev->tx_offload_flags |= nix_tx_offload_flags(eth_dev);
+
+	if (dev->rx_offload_flags & NIX_RX_OFFLOAD_SECURITY_F)
+		cn20k_nix_rx_queue_meta_aura_update(eth_dev);
+
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
@@ -621,6 +726,20 @@ nix_tm_ops_override(void)
 	if (init_once)
 		return;
 	init_once = 1;
+
+	/* Update platform specific ops */
+}
+
+static void
+npc_flow_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+
+	/* Update platform specific ops */
 }
 
 static int
@@ -633,6 +752,7 @@ static int
 cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_eth_dev *eth_dev;
+	struct cnxk_eth_dev *dev;
 	int rc;
 
 	rc = roc_plt_init();
@@ -643,6 +763,7 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	nix_eth_dev_ops_override();
 	nix_tm_ops_override();
+	npc_flow_ops_override();
 
 	/* Common probe */
 	rc = cnxk_nix_probe(pci_drv, pci_dev);
@@ -665,6 +786,11 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		return 0;
 	}
 
+	dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* Register mbox up-message callback for PTP information */
+	roc_nix_ptp_info_cb_register(&dev->nix, cn20k_nix_ptp_info_update_cb);
+
 	return 0;
 }
 
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 2cb77c0b46..22abf7bbd8 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -29,8 +29,397 @@
 #define NIX_RX_VWQE_F	   BIT(13)
 #define NIX_RX_MULTI_SEG_F BIT(14)
 
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define NIX_DESCS_PER_LOOP   4
+#define CQE_CAST(x)	     ((struct nix_cqe_hdr_s *)(x))
+#define CQE_SZ(x)	     ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
+#define CQE_PTR_OFF(b, i, o, f)                                                                    \
+	(((f) & NIX_RX_VWQE_F) ? (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) + (o)) :           \
+				 (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) + (o)))
+#define CQE_PTR_DIFF(b, i, o, f)                                                                   \
+	(((f) & NIX_RX_VWQE_F) ? (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) - (o)) :           \
+				 (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) - (o)))
+
+#define NIX_RX_SEC_UCC_CONST                                                                       \
+	((RTE_MBUF_F_RX_IP_CKSUM_BAD >> 1) |                                                       \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 8 |                 \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_BAD) >> 1) << 16 |                 \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 32 |                \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 48)
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+static inline void
+nix_mbuf_validate_next(struct rte_mbuf *m)
+{
+	if (m->nb_segs == 1 && m->next) {
+		rte_panic("mbuf->next[%p] valid when mbuf->nb_segs is %d", m->next, m->nb_segs);
+	}
+}
+#else
+static inline void
+nix_mbuf_validate_next(struct rte_mbuf *m)
+{
+	RTE_SET_USED(m);
+}
+#endif
+
 #define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
 
+static inline rte_eth_ip_reassembly_dynfield_t *
+cnxk_ip_reassembly_dynfield(struct rte_mbuf *mbuf, int ip_reassembly_dynfield_offset)
+{
+	return RTE_MBUF_DYNFIELD(mbuf, ip_reassembly_dynfield_offset,
+				 rte_eth_ip_reassembly_dynfield_t *);
+}
+
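+/* Overlays the 8-byte mbuf rearm_data region so that data_off, refcnt,
+ * nb_segs and port can be written with a single 64-bit store.
+ */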
+union mbuf_initializer {
+	struct {
+		uint16_t data_off;
+		uint16_t refcnt;
+		uint16_t nb_segs;
+		uint16_t port;
+	} fields;
+	uint64_t value;
+};
+
+static __rte_always_inline uint64_t
+nix_clear_data_off(uint64_t oldval)
+{
+	union mbuf_initializer mbuf_init = {.value = oldval};
+
+	mbuf_init.fields.data_off = 0;
+	return mbuf_init.value;
+}
+
+static __rte_always_inline struct rte_mbuf *
+nix_get_mbuf_from_cqe(void *cq, const uint64_t data_off)
+{
+	rte_iova_t buff;
+
+	/* Skip CQE, NIX_RX_PARSE_S and SG HDR (9 DWORDs) and peek buff addr */
+	buff = *((rte_iova_t *)((uint64_t *)cq + 9));
+	return (struct rte_mbuf *)(buff - data_off);
+}
+
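+/* Resolve the mbuf packet type from the NIX parse word via two table
+ * lookups: tunnel..L2 types first, then inner L4..tunnel types.
+ */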
+static __rte_always_inline uint32_t
+nix_ptype_get(const void *const lookup_mem, const uint64_t in)
+{
+	const uint16_t *const ptype = lookup_mem;
+	const uint16_t lh_lg_lf = (in & 0xFFF0000000000000) >> 52;
+	const uint16_t tu_l2 = ptype[(in & 0x000FFFF000000000) >> 36];
+	const uint16_t il4_tu = ptype[PTYPE_NON_TUNNEL_ARRAY_SZ + lh_lg_lf];
+
+	return (il4_tu << PTYPE_NON_TUNNEL_WIDTH) | tu_l2;
+}
+
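+/* Fetch the precomputed Rx ol_flags (checksum status etc.) indexed by
+ * the error level/code bits of the parse word.
+ */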
+static __rte_always_inline uint32_t
+nix_rx_olflags_get(const void *const lookup_mem, const uint64_t in)
+{
+	const uint32_t *const ol_flags =
+		(const uint32_t *)((const uint8_t *)lookup_mem + PTYPE_ARRAY_SZ);
+
+	return ol_flags[(in & 0xfff00000) >> 20];
+}
+
+static inline uint64_t
+nix_update_match_id(const uint16_t match_id, uint64_t ol_flags, struct rte_mbuf *mbuf)
+{
+	/* There is no separate bit to check whether match_id
+	 * is valid, and no flag to identify whether it is an
+	 * RTE_FLOW_ACTION_TYPE_FLAG or an RTE_FLOW_ACTION_TYPE_MARK
+	 * action. The former case is addressed by treating 0 as an
+	 * invalid value and incrementing/decrementing match_id as a
+	 * pair when MARK is activated. The latter case is addressed
+	 * by defining CNXK_FLOW_MARK_DEFAULT as the value for
+	 * RTE_FLOW_ACTION_TYPE_MARK.
+	 * This translates to not using CNXK_FLOW_ACTION_FLAG_DEFAULT - 1
+	 * and CNXK_FLOW_ACTION_FLAG_DEFAULT for match_id,
+	 * i.e. valid mark_id values range from
+	 * 0 to CNXK_FLOW_ACTION_FLAG_DEFAULT - 2.
+	 */
+	if (likely(match_id)) {
+		ol_flags |= RTE_MBUF_F_RX_FDIR;
+		if (match_id != CNXK_FLOW_ACTION_FLAG_DEFAULT) {
+			ol_flags |= RTE_MBUF_F_RX_FDIR_ID;
+			mbuf->hash.fdir.hi = match_id - 1;
+		}
+	}
+
+	return ol_flags;
+}
+
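+/* Walk the scatter/gather descriptors that follow NIX_RX_PARSE_S and
+ * chain the segment mbufs, fixing up per-segment lengths and rearm data.
+ */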
+static __rte_always_inline void
+nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, uint64_t rearm,
+		    uintptr_t cpth, uintptr_t sa_base, const uint16_t flags)
+{
+	const rte_iova_t *iova_list;
+	uint16_t later_skip = 0;
+	struct rte_mbuf *head;
+	const rte_iova_t *eol;
+	uint8_t nb_segs;
+	uint16_t sg_len;
+	int64_t len;
+	uint64_t sg;
+	uintptr_t p;
+
+	(void)cpth;
+	(void)sa_base;
+
+	sg = *(const uint64_t *)(rx + 1);
+	nb_segs = (sg >> 48) & 0x3;
+
+	if (nb_segs == 1)
+		return;
+
+	len = rx->pkt_lenm1 + 1;
+
+	mbuf->pkt_len = len - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? CNXK_NIX_TIMESYNC_RX_OFFSET : 0);
+	mbuf->nb_segs = nb_segs;
+	head = mbuf;
+	mbuf->data_len =
+		(sg & 0xFFFF) - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? CNXK_NIX_TIMESYNC_RX_OFFSET : 0);
+	eol = ((const rte_iova_t *)(rx + 1) + ((rx->desc_sizem1 + 1) << 1));
+
+	len -= mbuf->data_len;
+	sg = sg >> 16;
+	/* Skip SG_S and first IOVA */
+	iova_list = ((const rte_iova_t *)(rx + 1)) + 2;
+	nb_segs--;
+
+	later_skip = (uintptr_t)mbuf->buf_addr - (uintptr_t)mbuf;
+
+	while (nb_segs) {
+		mbuf->next = (struct rte_mbuf *)(*iova_list - later_skip);
+		mbuf = mbuf->next;
+
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		sg_len = sg & 0xFFFF;
+
+		mbuf->data_len = sg_len;
+		sg = sg >> 16;
+		p = (uintptr_t)&mbuf->rearm_data;
+		*(uint64_t *)p = rearm & ~0xFFFF;
+		nb_segs--;
+		iova_list++;
+
+		if (!nb_segs && (iova_list + 1 < eol)) {
+			sg = *(const uint64_t *)(iova_list);
+			nb_segs = (sg >> 48) & 0x3;
+			head->nb_segs += nb_segs;
+			iova_list = (const rte_iova_t *)(iova_list + 1);
+		}
+	}
+}
+
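+/* Convert a completed CQE into an mbuf: packet type, RSS hash, VLAN tags,
+ * mark/flag match id and lengths, then extract any additional segments.
+ */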
+static __rte_always_inline void
+cn20k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag, struct rte_mbuf *mbuf,
+		      const void *lookup_mem, const uint64_t val, const uintptr_t cpth,
+		      const uintptr_t sa_base, const uint16_t flag)
+{
+	const union nix_rx_parse_u *rx = (const union nix_rx_parse_u *)((const uint64_t *)cq + 1);
+	const uint64_t w1 = *(const uint64_t *)rx;
+	uint16_t len = rx->pkt_lenm1 + 1;
+	uint64_t ol_flags = 0;
+	uintptr_t p;
+
+	if (flag & NIX_RX_OFFLOAD_PTYPE_F)
+		mbuf->packet_type = nix_ptype_get(lookup_mem, w1);
+	else
+		mbuf->packet_type = 0;
+
+	if (flag & NIX_RX_OFFLOAD_RSS_F) {
+		mbuf->hash.rss = tag;
+		ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+	}
+
+	/* Skip rx ol flags extraction for Security packets */
+	ol_flags |= (uint64_t)nix_rx_olflags_get(lookup_mem, w1);
+
+	if (flag & NIX_RX_OFFLOAD_VLAN_STRIP_F) {
+		if (rx->vtag0_gone) {
+			ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
+			mbuf->vlan_tci = rx->vtag0_tci;
+		}
+		if (rx->vtag1_gone) {
+			ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
+			mbuf->vlan_tci_outer = rx->vtag1_tci;
+		}
+	}
+
+	if (flag & NIX_RX_OFFLOAD_MARK_UPDATE_F)
+		ol_flags = nix_update_match_id(rx->match_id, ol_flags, mbuf);
+
+	mbuf->ol_flags = ol_flags;
+	mbuf->pkt_len = len;
+	mbuf->data_len = len;
+	p = (uintptr_t)&mbuf->rearm_data;
+	*(uint64_t *)p = val;
+
+	if (flag & NIX_RX_MULTI_SEG_F)
+		/*
+		 * For multi-segment packets, mbuf length correction according
+		 * to the Rx timestamp length will be handled later during
+		 * timestamp data processing.
+		 * Hence, the timestamp flag argument is not required.
+		 */
+		nix_cqe_xtract_mseg(rx, mbuf, val, cpth, sa_base, flag & ~NIX_RX_OFFLOAD_TSTAMP_F);
+}
+
+static inline uint16_t
+nix_rx_nb_pkts(struct cn20k_eth_rxq *rxq, const uint64_t wdata, const uint16_t pkts,
+	       const uint32_t qmask)
+{
+	uint32_t available = rxq->available;
+
+	/* Update the available count if cached value is not enough */
+	if (unlikely(available < pkts)) {
+		uint64_t reg, head, tail;
+
+		/* Use LDADDA version to avoid reorder */
+		reg = roc_atomic64_add_sync(wdata, rxq->cq_status);
+		/* CQ_OP_STATUS operation error */
+		if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+			return 0;
+
+		tail = reg & 0xFFFFF;
+		head = (reg >> 20) & 0xFFFFF;
+		if (tail < head)
+			available = tail - head + qmask + 1;
+		else
+			available = tail - head;
+
+		rxq->available = available;
+	}
+
+	return RTE_MIN(pkts, available);
+}
+
+static __rte_always_inline void
+cn20k_nix_mbuf_to_tstamp(struct rte_mbuf *mbuf, struct cnxk_timesync_info *tstamp,
+			 const uint8_t ts_enable, uint64_t *tstamp_ptr)
+{
+	if (ts_enable) {
+		mbuf->pkt_len -= CNXK_NIX_TIMESYNC_RX_OFFSET;
+		mbuf->data_len -= CNXK_NIX_TIMESYNC_RX_OFFSET;
+
+		/* Read the Rx timestamp inserted by CGX, which is at the
+		 * start of the packet data.
+		 */
+		*tstamp_ptr = ((*tstamp_ptr >> 32) * NSEC_PER_SEC) + (*tstamp_ptr & 0xFFFFFFFFUL);
+		*cnxk_nix_timestamp_dynfield(mbuf, tstamp) = rte_be_to_cpu_64(*tstamp_ptr);
+		/* RTE_MBUF_F_RX_IEEE1588_TMST flag needs to be set only in case
+		 * PTP packets are received.
+		 */
+		if (mbuf->packet_type == RTE_PTYPE_L2_ETHER_TIMESYNC) {
+			tstamp->rx_tstamp = *cnxk_nix_timestamp_dynfield(mbuf, tstamp);
+			tstamp->rx_ready = 1;
+			mbuf->ol_flags |= RTE_MBUF_F_RX_IEEE1588_PTP | RTE_MBUF_F_RX_IEEE1588_TMST |
+					  tstamp->rx_tstamp_dynflag;
+		}
+	}
+}
+
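+/* Scalar Rx burst: drain up to 'pkts' CQEs from the CQ ring, convert each
+ * to an mbuf and ring the CQ doorbell for the processed entries.
+ */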
+static __rte_always_inline uint16_t
+cn20k_nix_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts, const uint16_t flags)
+{
+	struct cn20k_eth_rxq *rxq = rx_queue;
+	const uint64_t mbuf_init = rxq->mbuf_initializer;
+	const void *lookup_mem = rxq->lookup_mem;
+	const uint64_t data_off = rxq->data_off;
+	const uintptr_t desc = rxq->desc;
+	const uint64_t wdata = rxq->wdata;
+	const uint32_t qmask = rxq->qmask;
+	uint16_t packets = 0, nb_pkts;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	struct rte_mbuf *mbuf;
+	uint64_t sa_base = 0;
+	uintptr_t cpth = 0;
+
+	nb_pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+
+	while (packets < nb_pkts) {
+		/* Prefetch N desc ahead */
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+
+		mbuf = nix_get_mbuf_from_cqe(cq, data_off);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		cn20k_nix_cqe_to_mbuf(cq, cq->tag, mbuf, lookup_mem, mbuf_init, cpth, sa_base,
+				      flags);
+		cn20k_nix_mbuf_to_tstamp(mbuf, rxq->tstamp, (flags & NIX_RX_OFFLOAD_TSTAMP_F),
+					 (uint64_t *)((uint8_t *)mbuf + data_off));
+		rx_pkts[packets++] = mbuf;
+		roc_prefetch_store_keep(mbuf);
+		head++;
+		head &= qmask;
+	}
+
+	rxq->head = head;
+	rxq->available -= nb_pkts;
+
+	/* Free all the CQs that we've processed */
+	plt_write64((wdata | nb_pkts), rxq->cq_door);
+
+	return nb_pkts;
+}
+
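+/* Variant of cn20k_nix_recv_pkts used when flushing a queue; kept
+ * separate so it can be invoked with a fixed set of flags.
+ */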
+static __rte_always_inline uint16_t
+cn20k_nix_flush_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts,
+			  const uint16_t flags)
+{
+	struct cn20k_eth_rxq *rxq = rx_queue;
+	const uint64_t mbuf_init = rxq->mbuf_initializer;
+	const void *lookup_mem = rxq->lookup_mem;
+	const uint64_t data_off = rxq->data_off;
+	const uint64_t wdata = rxq->wdata;
+	const uint32_t qmask = rxq->qmask;
+	const uintptr_t desc = rxq->desc;
+	uint16_t packets = 0, nb_pkts;
+	uint16_t lmt_id __rte_unused;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	struct rte_mbuf *mbuf;
+	uint64_t sa_base = 0;
+	uintptr_t cpth = 0;
+
+	nb_pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+
+	while (packets < nb_pkts) {
+		/* Prefetch N desc ahead */
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+
+		mbuf = nix_get_mbuf_from_cqe(cq, data_off);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		cn20k_nix_cqe_to_mbuf(cq, cq->tag, mbuf, lookup_mem, mbuf_init, cpth, sa_base,
+				      flags);
+		cn20k_nix_mbuf_to_tstamp(mbuf, rxq->tstamp, (flags & NIX_RX_OFFLOAD_TSTAMP_F),
+					 (uint64_t *)((uint8_t *)mbuf + data_off));
+		rx_pkts[packets++] = mbuf;
+		roc_prefetch_store_keep(mbuf);
+		head++;
+		head &= qmask;
+	}
+
+	rxq->head = head;
+	rxq->available -= nb_pkts;
+
+	/* Free all the CQs that we've processed */
+	plt_write64((wdata | nb_pkts), rxq->cq_door);
+
+	return nb_pkts;
+}
+
 #define RSS_F	  NIX_RX_OFFLOAD_RSS_F
 #define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
 #define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
@@ -220,10 +609,7 @@ NIX_RX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(rx_queue);                                                            \
-		RTE_SET_USED(rx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		return cn20k_nix_recv_pkts(rx_queue, rx_pkts, pkts, (flags));                      \
 	}
 
 #define NIX_RX_RECV_MSEG(fn, flags) NIX_RX_RECV(fn, flags | NIX_RX_MULTI_SEG_F)
diff --git a/drivers/net/cnxk/cn20k_rx_select.c b/drivers/net/cnxk/cn20k_rx_select.c
index 82e06a62ef..25c79434cd 100644
--- a/drivers/net/cnxk/cn20k_rx_select.c
+++ b/drivers/net/cnxk/cn20k_rx_select.c
@@ -22,10 +22,8 @@ pick_rx_func(struct rte_eth_dev *eth_dev, const eth_rx_burst_t rx_burst[NIX_RX_O
 static uint16_t __rte_noinline __rte_hot __rte_unused
 cn20k_nix_flush_rx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
 {
-	RTE_SET_USED(rx_queue);
-	RTE_SET_USED(rx_pkts);
-	RTE_SET_USED(pkts);
-	return 0;
+	const uint16_t flags = NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F;
+	return cn20k_nix_flush_recv_pkts(rx_queue, rx_pkts, pkts, flags);
 }
 
 #if defined(RTE_ARCH_ARM64)
diff --git a/drivers/net/cnxk/cn20k_rxtx.h b/drivers/net/cnxk/cn20k_rxtx.h
index 5cc445d4b1..03eaf34d64 100644
--- a/drivers/net/cnxk/cn20k_rxtx.h
+++ b/drivers/net/cnxk/cn20k_rxtx.h
@@ -83,7 +83,163 @@ struct cn20k_eth_rxq {
 	struct cnxk_timesync_info *tstamp;
 } __plt_cache_aligned;
 
+/* Private data in sw rsvd area of struct roc_ot_ipsec_inb_sa */
+struct cn20k_inb_priv_data {
+	void *userdata;
+	int reass_dynfield_off;
+	int reass_dynflag_bit;
+	struct cnxk_eth_sec_sess *eth_sec;
+};
+
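+/* Per-session security context packed into a single 64-bit word so the
+ * fastpath can load it in one access.
+ */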
+struct cn20k_sec_sess_priv {
+	union {
+		struct {
+			uint32_t sa_idx;
+			uint8_t inb_sa : 1;
+			uint8_t outer_ip_ver : 1;
+			uint8_t mode : 1;
+			uint8_t roundup_byte : 5;
+			uint8_t roundup_len;
+			uint16_t partial_len : 10;
+			uint16_t chksum : 2;
+			uint16_t dec_ttl : 1;
+			uint16_t nixtx_off : 1;
+			uint16_t rsvd : 2;
+		};
+
+		uint64_t u64;
+	};
+} __rte_packed;
+
 #define LMT_OFF(lmt_addr, lmt_num, offset)                                                         \
 	(void *)((uintptr_t)(lmt_addr) + ((uint64_t)(lmt_num) << ROC_LMT_LINE_SIZE_LOG2) + (offset))
 
+static inline uint16_t
+nix_tx_compl_nb_pkts(struct cn20k_eth_txq *txq, const uint64_t wdata, const uint32_t qmask)
+{
+	uint16_t available = txq->tx_compl.available;
+
+	/* Update the available count if cached value is not enough */
+	if (unlikely(!available)) {
+		uint64_t reg, head, tail;
+
+		/* Use LDADDA version to avoid reorder */
+		reg = roc_atomic64_add_sync(wdata, txq->tx_compl.cq_status);
+		/* CQ_OP_STATUS operation error */
+		if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+			return 0;
+
+		tail = reg & 0xFFFFF;
+		head = (reg >> 20) & 0xFFFFF;
+		if (tail < head)
+			available = tail - head + qmask + 1;
+		else
+			available = tail - head;
+
+		txq->tx_compl.available = available;
+	}
+	return available;
+}
+
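
To make the CQ_OP_STATUS decode above concrete: tail sits in bits
[19:0] and head in bits [39:20] (per the shifts in the code, not quoted
from the HRM), and the subtraction wraps modulo qmask + 1. Worked
example:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            const uint32_t qmask = 1023;            /* 1024-entry CQ */
            uint64_t reg = (10ULL << 20) | 4ULL;    /* head = 10, tail = 4 */
            uint64_t tail = reg & 0xFFFFF;
            uint64_t head = (reg >> 20) & 0xFFFFF;
            uint32_t avail = (tail < head) ? tail - head + qmask + 1
                                           : tail - head;

            printf("available completions: %u\n", avail); /* 1018 */
            return 0;
    }
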
+static inline void
+handle_tx_completion_pkts(struct cn20k_eth_txq *txq, uint8_t mt_safe)
+{
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define CQE_SZ(x)	     ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
+	uint16_t tx_pkts = 0, nb_pkts;
+	const uintptr_t desc = txq->tx_compl.desc_base;
+	const uint64_t wdata = txq->tx_compl.wdata;
+	const uint32_t qmask = txq->tx_compl.qmask;
+	uint32_t head = txq->tx_compl.head;
+	struct nix_cqe_hdr_s *tx_compl_cq;
+	struct nix_send_comp_s *tx_compl_s0;
+	struct rte_mbuf *m_next, *m;
+
+	if (mt_safe)
+		rte_spinlock_lock(&txq->tx_compl.ext_buf_lock);
+
+	nb_pkts = nix_tx_compl_nb_pkts(txq, wdata, qmask);
+	while (tx_pkts < nb_pkts) {
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		tx_compl_cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+		tx_compl_s0 = (struct nix_send_comp_s *)((uint64_t *)tx_compl_cq + 1);
+		m = txq->tx_compl.ptr[tx_compl_s0->sqe_id];
+		while (m->next != NULL) {
+			m_next = m->next;
+			rte_pktmbuf_free_seg(m);
+			m = m_next;
+		}
+		rte_pktmbuf_free_seg(m);
+		txq->tx_compl.ptr[tx_compl_s0->sqe_id] = NULL;
+
+		head++;
+		head &= qmask;
+		tx_pkts++;
+	}
+	txq->tx_compl.head = head;
+	txq->tx_compl.available -= nb_pkts;
+
+	plt_write64((wdata | nb_pkts), txq->tx_compl.cq_door);
+
+	if (mt_safe)
+		rte_spinlock_unlock(&txq->tx_compl.ext_buf_lock);
+}
+
+static __rte_always_inline uint64_t
+cn20k_cpt_tx_steor_data(void)
+{
+	/* We have two CPT instructions per LMT line (TODO) */
+	const uint64_t dw_m1 = ROC_CN10K_TWO_CPT_INST_DW_M1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1 << 16;
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
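
The unrolled OR chain above packs a 3-bit "dwords minus one" field for
each of the 16 LMT lines, starting at bit 16 in steps of 3.
Functionally it boils down to this loop (a sketch for illustration, not
a suggested change, since the unrolled form folds into one constant at
compile time; the dw_m1 value below is illustrative):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t dw_m1 = 3, data = 0;
            int line;

            for (line = 0; line < 16; line++)
                    data |= dw_m1 << (16 + line * 3);
            printf("steor data: %016llx\n", (unsigned long long)data);
            return 0;
    }
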
+static __rte_always_inline void
+cn20k_nix_sec_steorl(uintptr_t io_addr, uint32_t lmt_id, uint8_t lnum, uint8_t loff, uint8_t shft)
+{
+	uint64_t data;
+	uintptr_t pa;
+
+	/* Check if there is any CPT instruction to submit */
+	if (!lnum && !loff)
+		return;
+
+	data = cn20k_cpt_tx_steor_data();
+	/* Update LMT line usage for the partial end line */
+	if (loff) {
+		data &= ~(0x7ULL << shft);
+		/* Update it to half full, i.e. 64B */
+		data |= (0x3UL << shft);
+	}
+
+	pa = io_addr | ((data >> 16) & 0x7) << 4;
+	data &= ~(0x7ULL << 16);
+	/* Update lines - 1 that contain valid data */
+	/* Program the number of lines that contain valid data, minus 1 */
+	data |= (uint64_t)lmt_id;
+
+	/* STEOR */
+	roc_lmt_submit_steorl(data, pa);
+}
+
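
A quick sanity check on the lnum/loff bookkeeping feeding this helper:
two 64B CPT instructions share one 128B LMT line, so loff toggles
within a line while lnum counts completed lines, and the "lines - 1"
field programmed above follows directly (illustrative numbers):

    #include <stdio.h>

    int main(void)
    {
            unsigned insts = 5;
            unsigned lnum = insts / 2;  /* 2 fully used lines */
            unsigned loff = insts % 2;  /* 1 half-used line */

            printf("lines - 1 field = %u, last line %s\n",
                   lnum + loff - 1, loff ? "half full (64B)" : "full");
            return 0;
    }
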
 #endif /* __CN20K_RXTX_H__ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 14/18] net/cnxk: support Rx burst vector for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (12 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 13/18] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 15/18] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Rx vector support for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_rx.h | 463 +++++++++++++++++++++++++++++++++++-
 1 file changed, 459 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 22abf7bbd8..d1bf0c615e 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -420,6 +420,463 @@ cn20k_nix_flush_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pk
 	return nb_pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
+static __rte_always_inline uint64_t
+nix_vlan_update(const uint64_t w2, uint64_t ol_flags, uint8x16_t *f)
+{
+	if (w2 & BIT_ULL(21) /* vtag0_gone */) {
+		ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
+		*f = vsetq_lane_u16((uint16_t)(w2 >> 32), *f, 5);
+	}
+
+	return ol_flags;
+}
+
+static __rte_always_inline uint64_t
+nix_qinq_update(const uint64_t w2, uint64_t ol_flags, struct rte_mbuf *mbuf)
+{
+	if (w2 & BIT_ULL(23) /* vtag1_gone */) {
+		ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
+		mbuf->vlan_tci_outer = (uint16_t)(w2 >> 48);
+	}
+
+	return ol_flags;
+}
+
+#define NIX_PUSH_META_TO_FREE(_mbuf, _laddr, _loff_p)                                              \
+	do {                                                                                       \
+		*(uint64_t *)((_laddr) + (*(_loff_p) << 3)) = (uint64_t)_mbuf;                     \
+		*(_loff_p) = *(_loff_p) + 1;                                                       \
+		/* Mark meta mbuf as put */                                                        \
+		RTE_MEMPOOL_CHECK_COOKIES(_mbuf->pool, (void **)&_mbuf, 1, 0);                     \
+	} while (0)
+
+static __rte_always_inline uint16_t
+cn20k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, const uint16_t flags,
+			   void *lookup_mem, struct cnxk_timesync_info *tstamp, uintptr_t lmt_base,
+			   uint64_t meta_aura)
+{
+	struct cn20k_eth_rxq *rxq = args;
+	const uint64_t mbuf_initializer =
+		(flags & NIX_RX_VWQE_F) ? *(uint64_t *)args : rxq->mbuf_initializer;
+	const uint64x2_t data_off = flags & NIX_RX_VWQE_F ? vdupq_n_u64(RTE_PKTMBUF_HEADROOM) :
+							    vdupq_n_u64(rxq->data_off);
+	const uint32_t qmask = flags & NIX_RX_VWQE_F ? 0 : rxq->qmask;
+	const uint64_t wdata = flags & NIX_RX_VWQE_F ? 0 : rxq->wdata;
+	const uintptr_t desc = flags & NIX_RX_VWQE_F ? 0 : rxq->desc;
+	uint64x2_t cq0_w8, cq1_w8, cq2_w8, cq3_w8, mbuf01, mbuf23;
+	uintptr_t cpth0 = 0, cpth1 = 0, cpth2 = 0, cpth3 = 0;
+	uint64_t ol_flags0, ol_flags1, ol_flags2, ol_flags3;
+	uint64x2_t rearm0 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm1 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm2 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm3 = vdupq_n_u64(mbuf_initializer);
+	struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3;
+	uint8x16_t f0, f1, f2, f3;
+	uintptr_t sa_base = 0;
+	uint16_t packets = 0;
+	uint16_t pkts_left;
+	uint32_t head;
+	uintptr_t cq0;
+
+	(void)lmt_base;
+	(void)meta_aura;
+
+	if (!(flags & NIX_RX_VWQE_F)) {
+		lookup_mem = rxq->lookup_mem;
+		head = rxq->head;
+
+		pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+		pkts_left = pkts & (NIX_DESCS_PER_LOOP - 1);
+		/* Packets have to be floor-aligned to NIX_DESCS_PER_LOOP */
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+		if (flags & NIX_RX_OFFLOAD_TSTAMP_F)
+			tstamp = rxq->tstamp;
+
+		cq0 = desc + CQE_SZ(head);
+		rte_prefetch0(CQE_PTR_OFF(cq0, 0, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 1, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 2, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 3, 64, flags));
+	} else {
+		RTE_SET_USED(head);
+	}
+
+	while (packets < pkts) {
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Exit loop if head is about to wrap and become
+			 * unaligned.
+			 */
+			if (((head + NIX_DESCS_PER_LOOP - 1) & qmask) < NIX_DESCS_PER_LOOP) {
+				pkts_left += (pkts - packets);
+				break;
+			}
+
+			cq0 = desc + CQE_SZ(head);
+		} else {
+			cq0 = (uintptr_t)&mbufs[packets];
+		}
+
+		if (flags & NIX_RX_VWQE_F) {
+			if (pkts - packets > 4) {
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 4, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 5, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 6, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 7, 0, flags));
+
+				if (likely(pkts - packets > 8)) {
+					rte_prefetch1(CQE_PTR_OFF(cq0, 8, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 9, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 10, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 11, 0, flags));
+					if (pkts - packets > 12) {
+						rte_prefetch1(CQE_PTR_OFF(cq0, 12, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 13, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 14, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 15, 0, flags));
+					}
+				}
+
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 4, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 5, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 6, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 7, RTE_PKTMBUF_HEADROOM, flags));
+
+				if (likely(pkts - packets > 8)) {
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 8, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 9, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 10, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 11, RTE_PKTMBUF_HEADROOM, flags));
+				}
+			}
+		} else {
+			if (pkts - packets > 8) {
+				if (flags) {
+					rte_prefetch0(CQE_PTR_OFF(cq0, 8, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 9, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 10, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 11, 0, flags));
+				}
+				rte_prefetch0(CQE_PTR_OFF(cq0, 8, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 9, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 10, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 11, 64, flags));
+			}
+		}
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Get NIX_RX_SG_S for size and buffer pointer */
+			cq0_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 0, 64, flags));
+			cq1_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 1, 64, flags));
+			cq2_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 2, 64, flags));
+			cq3_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 3, 64, flags));
+
+			/* Extract mbuf from NIX_RX_SG_S */
+			mbuf01 = vzip2q_u64(cq0_w8, cq1_w8);
+			mbuf23 = vzip2q_u64(cq2_w8, cq3_w8);
+			mbuf01 = vqsubq_u64(mbuf01, data_off);
+			mbuf23 = vqsubq_u64(mbuf23, data_off);
+		} else {
+			mbuf01 = vsubq_u64(vld1q_u64((uint64_t *)cq0),
+					   vdupq_n_u64(sizeof(struct rte_mbuf)));
+			mbuf23 = vsubq_u64(vld1q_u64((uint64_t *)(cq0 + 16)),
+					   vdupq_n_u64(sizeof(struct rte_mbuf)));
+		}
+
+		/* Move mbufs to scalar registers for future use */
+		mbuf0 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 0);
+		mbuf1 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 1);
+		mbuf2 = (struct rte_mbuf *)vgetq_lane_u64(mbuf23, 0);
+		mbuf3 = (struct rte_mbuf *)vgetq_lane_u64(mbuf23, 1);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf0->pool, (void **)&mbuf0, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf1->pool, (void **)&mbuf1, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf2->pool, (void **)&mbuf2, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf3->pool, (void **)&mbuf3, 1, 1);
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Mask to get packet len from NIX_RX_SG_S */
+			const uint8x16_t shuf_msk = {
+				0xFF, 0xFF, /* pkt_type set as unknown */
+				0xFF, 0xFF, /* pkt_type set as unknown */
+				0,    1,    /* octet 1~0, low 16 bits pkt_len */
+				0xFF, 0xFF, /* skip high 16 bits pkt_len, zero out */
+				0,    1,    /* octet 1~0, 16 bits data_len */
+				0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
+
+			/* Form the rx_descriptor_fields1 with pkt_len and data_len */
+			f0 = vqtbl1q_u8(cq0_w8, shuf_msk);
+			f1 = vqtbl1q_u8(cq1_w8, shuf_msk);
+			f2 = vqtbl1q_u8(cq2_w8, shuf_msk);
+			f3 = vqtbl1q_u8(cq3_w8, shuf_msk);
+		}
+
+		/* Load CQE word0 and word 1 */
+		const uint64_t cq0_w0 = *CQE_PTR_OFF(cq0, 0, 0, flags);
+		const uint64_t cq0_w1 = *CQE_PTR_OFF(cq0, 0, 8, flags);
+		const uint64_t cq0_w2 = *CQE_PTR_OFF(cq0, 0, 16, flags);
+		const uint64_t cq1_w0 = *CQE_PTR_OFF(cq0, 1, 0, flags);
+		const uint64_t cq1_w1 = *CQE_PTR_OFF(cq0, 1, 8, flags);
+		const uint64_t cq1_w2 = *CQE_PTR_OFF(cq0, 1, 16, flags);
+		const uint64_t cq2_w0 = *CQE_PTR_OFF(cq0, 2, 0, flags);
+		const uint64_t cq2_w1 = *CQE_PTR_OFF(cq0, 2, 8, flags);
+		const uint64_t cq2_w2 = *CQE_PTR_OFF(cq0, 2, 16, flags);
+		const uint64_t cq3_w0 = *CQE_PTR_OFF(cq0, 3, 0, flags);
+		const uint64_t cq3_w1 = *CQE_PTR_OFF(cq0, 3, 8, flags);
+		const uint64_t cq3_w2 = *CQE_PTR_OFF(cq0, 3, 16, flags);
+
+		if (flags & NIX_RX_VWQE_F) {
+			uint16_t psize0, psize1, psize2, psize3;
+
+			psize0 = (cq0_w2 & 0xFFFF) + 1;
+			psize1 = (cq1_w2 & 0xFFFF) + 1;
+			psize2 = (cq2_w2 & 0xFFFF) + 1;
+			psize3 = (cq3_w2 & 0xFFFF) + 1;
+
+			f0 = vdupq_n_u64(0);
+			f1 = vdupq_n_u64(0);
+			f2 = vdupq_n_u64(0);
+			f3 = vdupq_n_u64(0);
+
+			f0 = vsetq_lane_u16(psize0, f0, 2);
+			f0 = vsetq_lane_u16(psize0, f0, 4);
+
+			f1 = vsetq_lane_u16(psize1, f1, 2);
+			f1 = vsetq_lane_u16(psize1, f1, 4);
+
+			f2 = vsetq_lane_u16(psize2, f2, 2);
+			f2 = vsetq_lane_u16(psize2, f2, 4);
+
+			f3 = vsetq_lane_u16(psize3, f3, 2);
+			f3 = vsetq_lane_u16(psize3, f3, 4);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_RSS_F) {
+			/* Fill rss in the rx_descriptor_fields1 */
+			f0 = vsetq_lane_u32(cq0_w0, f0, 3);
+			f1 = vsetq_lane_u32(cq1_w0, f1, 3);
+			f2 = vsetq_lane_u32(cq2_w0, f2, 3);
+			f3 = vsetq_lane_u32(cq3_w0, f3, 3);
+			ol_flags0 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags1 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags2 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags3 = RTE_MBUF_F_RX_RSS_HASH;
+		} else {
+			ol_flags0 = 0;
+			ol_flags1 = 0;
+			ol_flags2 = 0;
+			ol_flags3 = 0;
+		}
+
+		if (flags & NIX_RX_OFFLOAD_PTYPE_F) {
+			/* Fill packet_type in the rx_descriptor_fields1 */
+			f0 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq0_w1), f0, 0);
+			f1 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq1_w1), f1, 0);
+			f2 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq2_w1), f2, 0);
+			f3 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq3_w1), f3, 0);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_CHECKSUM_F) {
+			ol_flags0 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq0_w1);
+			ol_flags1 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq1_w1);
+			ol_flags2 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq2_w1);
+			ol_flags3 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq3_w1);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_VLAN_STRIP_F) {
+			ol_flags0 = nix_vlan_update(cq0_w2, ol_flags0, &f0);
+			ol_flags1 = nix_vlan_update(cq1_w2, ol_flags1, &f1);
+			ol_flags2 = nix_vlan_update(cq2_w2, ol_flags2, &f2);
+			ol_flags3 = nix_vlan_update(cq3_w2, ol_flags3, &f3);
+
+			ol_flags0 = nix_qinq_update(cq0_w2, ol_flags0, mbuf0);
+			ol_flags1 = nix_qinq_update(cq1_w2, ol_flags1, mbuf1);
+			ol_flags2 = nix_qinq_update(cq2_w2, ol_flags2, mbuf2);
+			ol_flags3 = nix_qinq_update(cq3_w2, ol_flags3, mbuf3);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_MARK_UPDATE_F) {
+			ol_flags0 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 0, 38, flags),
+							ol_flags0, mbuf0);
+			ol_flags1 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 1, 38, flags),
+							ol_flags1, mbuf1);
+			ol_flags2 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 2, 38, flags),
+							ol_flags2, mbuf2);
+			ol_flags3 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 3, 38, flags),
+							ol_flags3, mbuf3);
+		}
+
+		if ((flags & NIX_RX_OFFLOAD_TSTAMP_F) && ((flags & NIX_RX_VWQE_F) && tstamp)) {
+			const uint16x8_t len_off = {0,				 /* ptype   0:15 */
+						    0,				 /* ptype  16:32 */
+						    CNXK_NIX_TIMESYNC_RX_OFFSET, /* pktlen  0:15*/
+						    0,				 /* pktlen 16:32 */
+						    CNXK_NIX_TIMESYNC_RX_OFFSET, /* datalen 0:15 */
+						    0,
+						    0,
+						    0};
+			const uint32x4_t ptype = {
+				RTE_PTYPE_L2_ETHER_TIMESYNC, RTE_PTYPE_L2_ETHER_TIMESYNC,
+				RTE_PTYPE_L2_ETHER_TIMESYNC, RTE_PTYPE_L2_ETHER_TIMESYNC};
+			const uint64_t ts_olf = RTE_MBUF_F_RX_IEEE1588_PTP |
+						RTE_MBUF_F_RX_IEEE1588_TMST |
+						tstamp->rx_tstamp_dynflag;
+			const uint32x4_t and_mask = {0x1, 0x2, 0x4, 0x8};
+			uint64x2_t ts01, ts23, mask;
+			uint64_t ts[4];
+			uint8_t res;
+
+			/* Subtract timesync length from total pkt length. */
+			f0 = vsubq_u16(f0, len_off);
+			f1 = vsubq_u16(f1, len_off);
+			f2 = vsubq_u16(f2, len_off);
+			f3 = vsubq_u16(f3, len_off);
+
+			/* Get the address of actual timestamp. */
+			ts01 = vaddq_u64(mbuf01, data_off);
+			ts23 = vaddq_u64(mbuf23, data_off);
+			/* Load timestamp from address. */
+			ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, 0), ts01, 0);
+			ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, 1), ts01, 1);
+			ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, 0), ts23, 0);
+			ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, 1), ts23, 1);
+			/* Convert from be to cpu byteorder. */
+			ts01 = vrev64q_u8(ts01);
+			ts23 = vrev64q_u8(ts23);
+			/* Store timestamp into scalar for later use. */
+			ts[0] = vgetq_lane_u64(ts01, 0);
+			ts[1] = vgetq_lane_u64(ts01, 1);
+			ts[2] = vgetq_lane_u64(ts23, 0);
+			ts[3] = vgetq_lane_u64(ts23, 1);
+
+			/* Store timestamp into dynfield. */
+			*cnxk_nix_timestamp_dynfield(mbuf0, tstamp) = ts[0];
+			*cnxk_nix_timestamp_dynfield(mbuf1, tstamp) = ts[1];
+			*cnxk_nix_timestamp_dynfield(mbuf2, tstamp) = ts[2];
+			*cnxk_nix_timestamp_dynfield(mbuf3, tstamp) = ts[3];
+
+			/* Generate ptype mask to filter L2 ether timesync */
+			mask = vdupq_n_u32(vgetq_lane_u32(f0, 0));
+			mask = vsetq_lane_u32(vgetq_lane_u32(f1, 0), mask, 1);
+			mask = vsetq_lane_u32(vgetq_lane_u32(f2, 0), mask, 2);
+			mask = vsetq_lane_u32(vgetq_lane_u32(f3, 0), mask, 3);
+
+			/* Match against L2 ether timesync. */
+			mask = vceqq_u32(mask, ptype);
+			/* Convert the vector mask to a scalar mask */
+			res = vaddvq_u32(vandq_u32(mask, and_mask));
+			res &= 0xF;
+
+			if (res) {
+				/* Fill in the ol_flags for any packets that
+				 * matched.
+				 */
+				ol_flags0 |= ((res & 0x1) ? ts_olf : 0);
+				ol_flags1 |= ((res & 0x2) ? ts_olf : 0);
+				ol_flags2 |= ((res & 0x4) ? ts_olf : 0);
+				ol_flags3 |= ((res & 0x8) ? ts_olf : 0);
+
+				/* Update Rxq timestamp with the latest
+				 * timestamp.
+				 */
+				tstamp->rx_ready = 1;
+				tstamp->rx_tstamp = ts[31 - rte_clz32(res)];
+			}
+		}
+
+		/* Form rearm_data with ol_flags */
+		rearm0 = vsetq_lane_u64(ol_flags0, rearm0, 1);
+		rearm1 = vsetq_lane_u64(ol_flags1, rearm1, 1);
+		rearm2 = vsetq_lane_u64(ol_flags2, rearm2, 1);
+		rearm3 = vsetq_lane_u64(ol_flags3, rearm3, 1);
+
+		/* Update rx_descriptor_fields1 */
+		vst1q_u64((uint64_t *)mbuf0->rx_descriptor_fields1, f0);
+		vst1q_u64((uint64_t *)mbuf1->rx_descriptor_fields1, f1);
+		vst1q_u64((uint64_t *)mbuf2->rx_descriptor_fields1, f2);
+		vst1q_u64((uint64_t *)mbuf3->rx_descriptor_fields1, f3);
+
+		/* Update rearm_data */
+		vst1q_u64((uint64_t *)mbuf0->rearm_data, rearm0);
+		vst1q_u64((uint64_t *)mbuf1->rearm_data, rearm1);
+		vst1q_u64((uint64_t *)mbuf2->rearm_data, rearm2);
+		vst1q_u64((uint64_t *)mbuf3->rearm_data, rearm3);
+
+		if (flags & NIX_RX_MULTI_SEG_F) {
+			/* Multi segment is enabled; build mseg list for
+			 * individual mbufs in scalar mode.
+			 */
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 0, 8, flags)),
+					    mbuf0, mbuf_initializer, cpth0, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 1, 8, flags)),
+					    mbuf1, mbuf_initializer, cpth1, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 2, 8, flags)),
+					    mbuf2, mbuf_initializer, cpth2, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 3, 8, flags)),
+					    mbuf3, mbuf_initializer, cpth3, sa_base, flags);
+		}
+
+		/* Store the mbufs to rx_pkts */
+		vst1q_u64((uint64_t *)&mbufs[packets], mbuf01);
+		vst1q_u64((uint64_t *)&mbufs[packets + 2], mbuf23);
+
+		nix_mbuf_validate_next(mbuf0);
+		nix_mbuf_validate_next(mbuf1);
+		nix_mbuf_validate_next(mbuf2);
+		nix_mbuf_validate_next(mbuf3);
+
+		packets += NIX_DESCS_PER_LOOP;
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Advance head pointer and packets */
+			head += NIX_DESCS_PER_LOOP;
+			head &= qmask;
+		}
+	}
+
+	if (flags & NIX_RX_VWQE_F)
+		return packets;
+
+	rxq->head = head;
+	rxq->available -= packets;
+
+	rte_io_wmb();
+	/* Free all the CQEs that we've processed */
+	plt_write64((rxq->wdata | packets), rxq->cq_door);
+
+	if (unlikely(pkts_left))
+		packets += cn20k_nix_recv_pkts(args, &mbufs[packets], pkts_left, flags);
+
+	return packets;
+}
+
+#else
+
+static inline uint16_t
+cn20k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, const uint16_t flags,
+			   void *lookup_mem, struct cnxk_timesync_info *tstamp, uintptr_t lmt_base,
+			   uint64_t meta_aura)
+{
+	RTE_SET_USED(args);
+	RTE_SET_USED(mbufs);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(flags);
+	RTE_SET_USED(lookup_mem);
+	RTE_SET_USED(tstamp);
+	RTE_SET_USED(lmt_base);
+	RTE_SET_USED(meta_aura);
+
+	return 0;
+}
+
+#endif
+
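
Two scheduling details of the vector loop above are worth spelling
out: the burst is floor-aligned to NIX_DESCS_PER_LOOP with the
remainder handed to the scalar cn20k_nix_recv_pkts(), and the loop
bails out early when the ring head would wrap mid-iteration. A
standalone sketch of both (ring size and counts are made up):

    #include <stdint.h>
    #include <stdio.h>

    #define DESCS_PER_LOOP 4

    int main(void)
    {
            uint16_t pkts = 23;
            uint16_t left = pkts & (DESCS_PER_LOOP - 1); /* 3 -> scalar */
            uint16_t bulk = pkts - left;                 /* 20 -> vector */
            uint32_t qmask = 1023, head = 1022;          /* 1024-slot ring */

            printf("vector: %u, scalar tail: %u\n", bulk, left);
            if (((head + DESCS_PER_LOOP - 1) & qmask) < DESCS_PER_LOOP)
                    printf("head %u about to wrap; defer rest to scalar\n",
                           head);
            return 0;
    }
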
 #define RSS_F	  NIX_RX_OFFLOAD_RSS_F
 #define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
 #define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
@@ -618,10 +1075,8 @@ NIX_RX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(rx_queue);                                                            \
-		RTE_SET_USED(rx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		return cn20k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts, (flags), NULL, NULL, 0, \
+						  0);                                              \
 	}
 
 #define NIX_RX_RECV_VEC_MSEG(fn, flags) NIX_RX_RECV_VEC(fn, flags | NIX_RX_MULTI_SEG_F)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 15/18] net/cnxk: support Tx burst scalar for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (13 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 14/18] net/cnxk: support Rx burst vector " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 16/18] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
                     ` (3 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add scalar Tx support for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c | 127 ++++
 drivers/net/cnxk/cn20k_tx.h     | 987 +++++++++++++++++++++++++++++++-
 2 files changed, 1110 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index cad7b1316a..011c5f8362 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -361,6 +361,10 @@ static int
 cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 {
 	struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[qidx];
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	uint16_t flags = dev->tx_offload_flags;
+	struct roc_nix *nix = &dev->nix;
+	uint32_t head = 0, tail = 0;
 	int rc;
 
 	rc = cnxk_nix_tx_queue_stop(eth_dev, qidx);
@@ -370,6 +374,20 @@ cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 	/* Clear fc cache pkts to trigger worker stop */
 	txq->fc_cache_pkts = 0;
 
+	if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && txq->tx_compl.ena) {
+		struct roc_nix_sq *sq = &dev->sqs[qidx];
+		do {
+			handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+			/* Check if SQ is empty */
+			roc_nix_sq_head_tail_get(nix, sq->qid, &head, &tail);
+			if (head != tail)
+				continue;
+
+			/* Check if completion CQ is empty */
+			roc_nix_cq_head_tail_get(nix, sq->cqid, &head, &tail);
+		} while (head != tail);
+	}
+
 	return 0;
 }
 
@@ -690,6 +708,112 @@ cn20k_rx_descriptor_dump(const struct rte_eth_dev *eth_dev, uint16_t qid, uint16
 	return 0;
 }
 
+static int
+cn20k_nix_tm_mark_vlan_dei(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow,
+			   int mark_red, struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_vlan_dei(eth_dev, mark_green, mark_yellow, mark_red, error);
+
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
+static int
+cn20k_nix_tm_mark_ip_ecn(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow, int mark_red,
+			 struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_ip_ecn(eth_dev, mark_green, mark_yellow, mark_red, error);
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
+static int
+cn20k_nix_tm_mark_ip_dscp(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow,
+			  int mark_red, struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_ip_dscp(eth_dev, mark_green, mark_yellow, mark_red, error);
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
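
The three mark callbacks above are identical apart from the
cnxk_nix_tm_mark_*() they invoke; the shared tail (refresh
mark_fmt/mark_flag, mirror them into each txq, re-select the Tx
function) could plausibly be routed through one helper taking a
function pointer. A plain-C sketch of that shape (stand-in types and
names, not DPDK ones, purely illustrative):

    #include <stdio.h>

    typedef int (*mark_apply_fn)(int green, int yellow, int red);

    static int apply_vlan_dei(int g, int y, int r)
    {
            (void)g; (void)y; (void)r;
            return 0; /* stand-in for cnxk_nix_tm_mark_vlan_dei() */
    }

    static int mark_update_common(mark_apply_fn fn, int g, int y, int r)
    {
            int rc = fn(g, y, r);

            if (rc)
                    return rc;
            /* Common tail: refresh the mark format, update per-queue
             * state and re-select the Tx function.
             */
            return 0;
    }

    int main(void)
    {
            printf("rc=%d\n", mark_update_common(apply_vlan_dei, 1, 1, 0));
            return 0;
    }
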
 /* Update platform specific eth dev ops */
 static void
 nix_eth_dev_ops_override(void)
@@ -728,6 +852,9 @@ nix_tm_ops_override(void)
 	init_once = 1;
 
 	/* Update platform specific ops */
+	cnxk_tm_ops.mark_vlan_dei = cn20k_nix_tm_mark_vlan_dei;
+	cnxk_tm_ops.mark_ip_ecn = cn20k_nix_tm_mark_ip_ecn;
+	cnxk_tm_ops.mark_ip_dscp = cn20k_nix_tm_mark_ip_dscp;
 }
 
 static void
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 9fd925ac34..610d64f21b 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -32,6 +32,984 @@
 #define NIX_TX_NEED_EXT_HDR                                                                        \
 	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
 
+#define NIX_XMIT_FC_OR_RETURN(txq, pkts)                                                           \
+	do {                                                                                       \
+		int64_t avail;                                                                     \
+		/* Cached value is low, update the fc_cache_pkts */
+		if (unlikely((txq)->fc_cache_pkts < (pkts))) {                                     \
+			avail = txq->nb_sqb_bufs_adj - *txq->fc_mem;                               \
+			/* Multiply with sqe_per_sqb to express in pkts */                         \
+			(txq)->fc_cache_pkts = (avail << (txq)->sqes_per_sqb_log2) - avail;        \
+			/* Check it again for the room */                                          \
+			if (unlikely((txq)->fc_cache_pkts < (pkts)))                               \
+				return 0;                                                          \
+		}                                                                                  \
+	} while (0)
+
+#define NIX_XMIT_FC_OR_RETURN_MTS(txq, pkts)                                                       \
+	do {                                                                                       \
+		int64_t *fc_cache = &(txq)->fc_cache_pkts;                                         \
+		uint8_t retry_count = 8;                                                           \
+		int64_t val, newval;                                                               \
+	retry:                                                                                     \
+		/* Reduce the cached count */                                                      \
+		val = (int64_t)__atomic_fetch_sub(fc_cache, pkts, __ATOMIC_RELAXED);               \
+		val -= pkts;                                                                       \
+		/* Cached value is low, update the fc_cache_pkts */
+		if (unlikely(val < 0)) {                                                           \
+			/* Multiply with sqe_per_sqb to express in pkts */                         \
+			newval = txq->nb_sqb_bufs_adj -                                            \
+				 __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED);                   \
+			newval = (newval << (txq)->sqes_per_sqb_log2) - newval;                    \
+			newval -= pkts;                                                            \
+			if (!__atomic_compare_exchange_n(fc_cache, &val, newval, false,            \
+							 __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {    \
+				if (retry_count) {                                                 \
+					retry_count--;                                             \
+					goto retry;                                                \
+				} else                                                             \
+					return 0;                                                  \
+			}                                                                          \
+			/* Update and check it again for the room */                               \
+			if (unlikely(newval < 0))                                                  \
+				return 0;                                                          \
+		}                                                                                  \
+	} while (0)
+
+#define NIX_XMIT_FC_CHECK_RETURN(txq, pkts)                                                        \
+	do {                                                                                       \
+		if (unlikely((txq)->flag))                                                         \
+			NIX_XMIT_FC_OR_RETURN_MTS(txq, pkts);                                      \
+		else {                                                                             \
+			NIX_XMIT_FC_OR_RETURN(txq, pkts);                                          \
+			/* Reduce the cached count */                                              \
+			txq->fc_cache_pkts -= pkts;                                                \
+		}                                                                                  \
+	} while (0)
+
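
The conversion in these macros from free SQBs to a packet budget,
(avail << sqes_per_sqb_log2) - avail, equals avail * (sqes_per_sqb - 1);
the minus one reflects one SQE per SQB being reserved for chaining,
assuming the usual cnxk SQB layout where the last SQE points to the
next SQB. Worked example:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            int64_t nb_sqb_bufs_adj = 512, fc_mem = 100; /* illustrative */
            uint16_t sqes_per_sqb_log2 = 5;              /* 32 SQEs/SQB */
            int64_t avail = nb_sqb_bufs_adj - fc_mem;    /* 412 free SQBs */
            int64_t pkts = (avail << sqes_per_sqb_log2) - avail;

            /* 412 * 31 = 12772 */
            printf("room for %lld single-seg packets\n", (long long)pkts);
            return 0;
    }
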
+/* Macro mapping an encoded number of segments to a number of dwords;
+ * each value of nb_segs is encoded as 4 bits.
+ */
+#define NIX_SEGDW_MAGIC 0x76654432210ULL
+
+#define NIX_NB_SEGS_TO_SEGDW(x) ((NIX_SEGDW_MAGIC >> ((x) << 2)) & 0xF)
+
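
Decoding the magic constant: nibble n of NIX_SEGDW_MAGIC holds the
number of 16B dwords a scatter list of n segments occupies, given one
SG header per up to three pointers, rounded up to 16B. Quick
self-check:

    #include <stdint.h>
    #include <stdio.h>

    #define MAGIC 0x76654432210ULL

    int main(void)
    {
            unsigned n;

            for (n = 1; n <= 6; n++)
                    printf("nb_segs=%u -> segdw=%llu\n", n,
                           (unsigned long long)((MAGIC >> (n << 2)) & 0xF));
            /* Prints 1, 2, 2, 3, 4, 4: SG word plus pointers, each
             * group rounded up to a 16B dword.
             */
            return 0;
    }
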
+static __plt_always_inline uint8_t
+cn20k_nix_mbuf_sg_dwords(struct rte_mbuf *m)
+{
+	uint32_t nb_segs = m->nb_segs;
+	uint16_t aura0, aura;
+	int segw, sg_segs;
+
+	aura0 = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+
+	nb_segs--;
+	segw = 2;
+	sg_segs = 1;
+	while (nb_segs) {
+		m = m->next;
+		aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+		if (aura != aura0) {
+			segw += 2 + (sg_segs == 2);
+			sg_segs = 0;
+		} else {
+			segw += (sg_segs == 0); /* SUBDC */
+			segw += 1;		/* IOVA */
+			sg_segs += 1;
+			sg_segs %= 3;
+		}
+		nb_segs--;
+	}
+
+	return (segw + 1) / 2;
+}
+
+static __plt_always_inline void
+cn20k_nix_tx_mbuf_validate(struct rte_mbuf *m, const uint32_t flags)
+{
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+	uint16_t segdw;
+
+	segdw = cn20k_nix_mbuf_sg_dwords(m);
+	segdw += 1 + !!(flags & NIX_TX_NEED_EXT_HDR) + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+
+	PLT_ASSERT(segdw <= 8);
+#else
+	RTE_SET_USED(m);
+	RTE_SET_USED(flags);
+#endif
+}
+
+static __plt_always_inline void
+cn20k_nix_vwqe_wait_fc(struct cn20k_eth_txq *txq, uint16_t req)
+{
+	int64_t cached, refill;
+	int64_t pkts;
+
+retry:
+#ifdef RTE_ARCH_ARM64
+
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[pkts], [%[addr]]			\n"
+		     "		tbz %[pkts], 63, .Ldne%=		\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[pkts], [%[addr]]			\n"
+		     "		tbnz %[pkts], 63, .Lrty%=		\n"
+		     ".Ldne%=:						\n"
+		     : [pkts] "=&r"(pkts)
+		     : [addr] "r"(&txq->fc_cache_pkts)
+		     : "memory");
+#else
+	RTE_SET_USED(pkts);
+	while (__atomic_load_n(&txq->fc_cache_pkts, __ATOMIC_RELAXED) < 0)
+		;
+#endif
+	cached = __atomic_fetch_sub(&txq->fc_cache_pkts, req, __ATOMIC_ACQUIRE) - req;
+	/* Check if there is enough space, else update and retry. */
+	if (cached >= 0)
+		return;
+
+	/* Check if we have space else retry. */
+#ifdef RTE_ARCH_ARM64
+	int64_t val;
+
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[val], [%[addr]]			\n"
+		     "		sub %[val], %[adj], %[val]		\n"
+		     "		lsl %[refill], %[val], %[shft]		\n"
+		     "		sub %[refill], %[refill], %[val]	\n"
+		     "		sub %[refill], %[refill], %[sub]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.ge .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[val], [%[addr]]			\n"
+		     "		sub %[val], %[adj], %[val]		\n"
+		     "		lsl %[refill], %[val], %[shft]		\n"
+		     "		sub %[refill], %[refill], %[val]	\n"
+		     "		sub %[refill], %[refill], %[sub]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.lt .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [refill] "=&r"(refill), [val] "=&r" (val)
+		     : [addr] "r"(txq->fc_mem), [adj] "r"(txq->nb_sqb_bufs_adj),
+		       [shft] "r"(txq->sqes_per_sqb_log2), [sub] "r"(req)
+		     : "memory");
+#else
+	do {
+		refill = (txq->nb_sqb_bufs_adj - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED));
+		refill = (refill << txq->sqes_per_sqb_log2) - refill;
+		refill -= req;
+	} while (refill < 0);
+#endif
+	if (!__atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill, 0, __ATOMIC_RELEASE,
+				       __ATOMIC_RELAXED))
+		goto retry;
+}
+
+/* Function to determine the number of Tx sub-descriptors required when
+ * the ext sub-descriptor is enabled.
+ */
+static __rte_always_inline int
+cn20k_nix_tx_ext_subs(const uint16_t flags)
+{
+	return (flags & NIX_TX_OFFLOAD_TSTAMP_F) ?
+		       2 :
+		       ((flags & (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)) ? 1 : 0);
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_tx_steor_data(const uint16_t flags)
+{
+	const uint64_t dw_m1 = cn20k_nix_tx_ext_subs(flags) + 1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1;
+	/* 15 vector sizes for single seg */
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
+static __rte_always_inline void
+cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t flags,
+		      const uint16_t static_sz)
+{
+	if (static_sz)
+		cmd[0] = txq->send_hdr_w0;
+	else
+		cmd[0] = (txq->send_hdr_w0 & 0xFFFFF00000000000) |
+			 ((uint64_t)(cn20k_nix_tx_ext_subs(flags) + 1) << 40);
+	cmd[1] = 0;
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F)
+			cmd[2] = (NIX_SUBDC_EXT << 60) | BIT_ULL(15);
+		else
+			cmd[2] = NIX_SUBDC_EXT << 60;
+		cmd[3] = 0;
+		if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+			cmd[4] = (NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) | BIT_ULL(48);
+		else
+			cmd[4] = (NIX_SUBDC_SG << 60) | BIT_ULL(48);
+	} else {
+		if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+			cmd[2] = (NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) | BIT_ULL(48);
+		else
+			cmd[2] = (NIX_SUBDC_SG << 60) | BIT_ULL(48);
+	}
+}
+
+static __rte_always_inline void
+cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
+{
+	int32_t nb_desc, val, newval;
+	int32_t *fc_sw;
+	uint64_t *fc;
+
+	/* Check if there is any CPT instruction to submit */
+	if (!nb_pkts)
+		return;
+
+again:
+	fc_sw = txq->cpt_fc_sw;
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %w[pkts], [%[addr]]		\n"
+		     "		tbz %w[pkts], 31, .Ldne%=		\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %w[pkts], [%[addr]]		\n"
+		     "		tbnz %w[pkts], 31, .Lrty%=		\n"
+		     ".Ldne%=:						\n"
+		     : [pkts] "=&r"(val)
+		     : [addr] "r"(fc_sw)
+		     : "memory");
+#else
+	/* Wait for primary core to refill FC. */
+	while (__atomic_load_n(fc_sw, __ATOMIC_RELAXED) < 0)
+		;
+#endif
+
+	val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_ACQUIRE) - nb_pkts;
+	if (likely(val >= 0))
+		return;
+
+	nb_desc = txq->cpt_desc;
+	fc = txq->cpt_fc;
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[refill], [%[addr]]		\n"
+		     "		sub %[refill], %[desc], %[refill]	\n"
+		     "		sub %[refill], %[refill], %[pkts]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.ge .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[refill], [%[addr]]		\n"
+		     "		sub %[refill], %[desc], %[refill]	\n"
+		     "		sub %[refill], %[refill], %[pkts]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.lt .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [refill] "=&r"(newval)
+		     : [addr] "r"(fc), [desc] "r"(nb_desc), [pkts] "r"(nb_pkts)
+		     : "memory");
+#else
+	while (true) {
+		newval = nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED);
+		newval -= nb_pkts;
+		if (newval >= 0)
+			break;
+	}
+#endif
+
+	if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, __ATOMIC_RELEASE,
+					 __ATOMIC_RELAXED))
+		goto again;
+}
+
+#if defined(RTE_ARCH_ARM64)
+
+static __rte_always_inline void
+cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
+		   uint8_t *lnum, uint8_t *loff, uint8_t *shft, uint64_t sa_base,
+		   const uint16_t flags)
+{
+	struct cn20k_sec_sess_priv sess_priv;
+	uint32_t pkt_len, dlen_adj, rlen;
+	struct nix_send_hdr_s *send_hdr;
+	uint8_t l3l4type, chksum;
+	uint64x2_t cmd01, cmd23;
+	union nix_send_sg_s *sg;
+	uint8_t l2_len, l3_len;
+	uintptr_t dptr, nixtx;
+	uint64_t ucode_cmd[4];
+	uint64_t *laddr, w0;
+	uint16_t tag;
+	uint64_t sa;
+
+	/* Move to our line from base */
+	sess_priv.u64 = *rte_security_dynfield(m);
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+	if (flags & NIX_TX_NEED_EXT_HDR)
+		sg = (union nix_send_sg_s *)&cmd[4];
+	else
+		sg = (union nix_send_sg_s *)&cmd[2];
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1) {
+		/* Extract l3l4type either from il3il4type or ol3ol4type */
+		if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+			l2_len = (cmd[1] >> 16) & 0xFF;
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = ((cmd[1] >> 24) & 0xFF) - l2_len;
+			l3l4type = (cmd[1] >> 40) & 0xFF;
+		} else {
+			l2_len = cmd[1] & 0xFF;
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = ((cmd[1] >> 8) & 0xFF) - l2_len;
+			l3l4type = (cmd[1] >> 32) & 0xFF;
+		}
+
+		chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30);
+		chksum = ~chksum;
+		sess_priv.chksum = sess_priv.chksum & chksum;
+		/* Clear SEND header flags */
+		cmd[1] &= ~(0xFFFFUL << 32);
+	} else {
+		l2_len = m->l2_len;
+		l3_len = m->l3_len;
+	}
+
+	/* Retrieve DPTR */
+	dptr = *(uint64_t *)(sg + 1);
+	pkt_len = send_hdr->w0.total;
+
+	/* Calculate dlen adj */
+	dlen_adj = pkt_len - l2_len;
+	/* Exclude l3 len from roundup for transport mode */
+	dlen_adj -= sess_priv.mode ? 0 : l3_len;
+	rlen = (dlen_adj + sess_priv.roundup_len) + (sess_priv.roundup_byte - 1);
+	rlen &= ~(uint64_t)(sess_priv.roundup_byte - 1);
+	rlen += sess_priv.partial_len;
+	dlen_adj = rlen - dlen_adj;
+
+	/* Update send descriptors. Security is single segment only */
+	send_hdr->w0.total = pkt_len + dlen_adj;
+
+	/* CPT word 5 and word 6 */
+	w0 = 0;
+	ucode_cmd[2] = 0;
+	if (flags & NIX_TX_MULTI_SEG_F && m->nb_segs > 1) {
+		struct rte_mbuf *last = rte_pktmbuf_lastseg(m);
+
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = rte_pktmbuf_mtod_offset(last, uintptr_t, last->data_len + dlen_adj);
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		dptr = nixtx + ((flags & NIX_TX_NEED_EXT_HDR) ? 32 : 16);
+
+		/* Set l2 length as data offset */
+		w0 = (uint64_t)l2_len << 16;
+		w0 |= cn20k_nix_tx_ext_subs(flags) + NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+		ucode_cmd[1] = dptr | ((uint64_t)m->nb_segs << 60);
+	} else {
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = dptr + pkt_len + dlen_adj;
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		w0 |= cn20k_nix_tx_ext_subs(flags) + 1ULL;
+		dptr += l2_len;
+		ucode_cmd[1] = dptr;
+		sg->seg1_size = pkt_len + dlen_adj;
+		pkt_len -= l2_len;
+	}
+	w0 |= ((((int64_t)nixtx - (int64_t)dptr) & 0xFFFFF) << 32);
+	/* CPT word 0 and 1 */
+	cmd01 = vdupq_n_u64(0);
+	cmd01 = vsetq_lane_u64(w0, cmd01, 0);
+	/* CPT_RES_S is 16B above NIXTX */
+	cmd01 = vsetq_lane_u64(nixtx - 16, cmd01, 1);
+
+	/* Return nixtx addr */
+	*nixtx_addr = nixtx;
+
+	/* CPT Word 4 and Word 7 */
+	tag = sa_base & 0xFFFFUL;
+	sa_base &= ~0xFFFFUL;
+	sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx);
+	ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa);
+	ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | 1UL << 54 |
+			((uint64_t)sess_priv.chksum) << 32 | ((uint64_t)sess_priv.dec_ttl) << 34 |
+			pkt_len);
+
+	/* CPT word 2 and 3 */
+	cmd23 = vdupq_n_u64(0);
+	cmd23 = vsetq_lane_u64(
+		(((uint64_t)RTE_EVENT_TYPE_CPU << 28) | tag | CNXK_ETHDEV_SEC_OUTB_EV_SUB << 20),
+		cmd23, 0);
+	cmd23 = vsetq_lane_u64((uintptr_t)m | 1, cmd23, 1);
+
+	/* Move to our line */
+	laddr = LMT_OFF(lbase, *lnum, *loff ? 64 : 0);
+
+	/* Write CPT instruction to lmt line */
+	vst1q_u64(laddr, cmd01);
+	vst1q_u64((laddr + 2), cmd23);
+
+	*(__uint128_t *)(laddr + 4) = *(__uint128_t *)ucode_cmd;
+	*(__uint128_t *)(laddr + 6) = *(__uint128_t *)(ucode_cmd + 2);
+
+	/* Move to next line for every other CPT inst */
+	*loff = !(*loff);
+	*lnum = *lnum + (*loff ? 0 : 1);
+	*shft = *shft + (*loff ? 0 : 3);
+}
+
+#else
+
+static __rte_always_inline void
+cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
+		   uint8_t *lnum, uint8_t *loff, uint8_t *shft, uint64_t sa_base,
+		   const uint16_t flags)
+{
+	RTE_SET_USED(m);
+	RTE_SET_USED(cmd);
+	RTE_SET_USED(nixtx_addr);
+	RTE_SET_USED(lbase);
+	RTE_SET_USED(lnum);
+	RTE_SET_USED(loff);
+	RTE_SET_USED(shft);
+	RTE_SET_USED(sa_base);
+	RTE_SET_USED(flags);
+}
+#endif
+
+static inline void
+cn20k_nix_free_extmbuf(struct rte_mbuf *m)
+{
+	struct rte_mbuf *m_next;
+	while (m != NULL) {
+		m_next = m->next;
+		rte_pktmbuf_free_seg(m);
+		m = m_next;
+	}
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **extm, struct cn20k_eth_txq *txq,
+		      struct nix_send_hdr_s *send_hdr, uint64_t *aura)
+{
+	struct rte_mbuf *prev = NULL;
+	uint32_t sqe_id;
+
+	if (RTE_MBUF_HAS_EXTBUF(m)) {
+		if (unlikely(txq->tx_compl.ena == 0)) {
+			m->next = *extm;
+			*extm = m;
+			return 1;
+		}
+		if (send_hdr->w0.pnc) {
+			sqe_id = send_hdr->w1.sqe_id;
+			prev = txq->tx_compl.ptr[sqe_id];
+			m->next = prev;
+			txq->tx_compl.ptr[sqe_id] = m;
+		} else {
+			sqe_id = __atomic_fetch_add(&txq->tx_compl.sqe_id, 1, __ATOMIC_RELAXED);
+			send_hdr->w0.pnc = 1;
+			send_hdr->w1.sqe_id = sqe_id & txq->tx_compl.nb_desc_mask;
+			txq->tx_compl.ptr[send_hdr->w1.sqe_id] = m;
+		}
+		return 1;
+	} else {
+		return cnxk_nix_prefree_seg(m, aura);
+	}
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags)
+{
+	uint64_t mask, ol_flags = m->ol_flags;
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F && (ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
+		uintptr_t mdata = rte_pktmbuf_mtod(m, uintptr_t);
+		uint16_t *iplen, *oiplen, *oudplen;
+		uint16_t lso_sb, paylen;
+
+		mask = -!!(ol_flags & (RTE_MBUF_F_TX_OUTER_IPV4 | RTE_MBUF_F_TX_OUTER_IPV6));
+		lso_sb = (mask & (m->outer_l2_len + m->outer_l3_len)) + m->l2_len + m->l3_len +
+			 m->l4_len;
+
+		/* Reduce payload len from base headers */
+		paylen = m->pkt_len - lso_sb;
+
+		/* Get iplen position assuming no tunnel hdr */
+		iplen = (uint16_t *)(mdata + m->l2_len + (2 << !!(ol_flags & RTE_MBUF_F_TX_IPV6)));
+		/* Handle tunnel tso */
+		if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) &&
+		    (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+			const uint8_t is_udp_tun =
+				(CNXK_NIX_UDP_TUN_BITMASK >>
+				 ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+				0x1;
+
+			oiplen = (uint16_t *)(mdata + m->outer_l2_len +
+					      (2 << !!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)));
+			*oiplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*oiplen) - paylen);
+
+			/* Update format for UDP tunneled packet */
+			if (is_udp_tun) {
+				oudplen =
+					(uint16_t *)(mdata + m->outer_l2_len + m->outer_l3_len + 4);
+				*oudplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*oudplen) - paylen);
+			}
+
+			/* Update iplen position to inner ip hdr */
+			iplen = (uint16_t *)(mdata + lso_sb - m->l3_len - m->l4_len +
+					     (2 << !!(ol_flags & RTE_MBUF_F_TX_IPV6)));
+		}
+
+		*iplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*iplen) - paylen);
+	}
+}
+
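
To make the header fixup above concrete: for a plain TCPv4 TSO packet
with l2/l3/l4 = 14/20/20 and pkt_len = 3000, lso_sb = 54 and
paylen = 2946 is subtracted from the IP total-length field, leaving the
per-segment base that HW then tops up for each emitted segment.
Byte-order conversions are elided in this sketch:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint16_t l2 = 14, l3 = 20, l4 = 20;
            uint16_t pkt_len = 3000;
            uint16_t lso_sb = l2 + l3 + l4;         /* 54 */
            uint16_t paylen = pkt_len - lso_sb;     /* 2946 */
            uint16_t iplen = pkt_len - l2;          /* 2986 in the header */

            iplen -= paylen;                        /* 40 = l3 + l4 */
            printf("per-segment base iplen: %u\n", iplen);
            return 0;
    }
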
+static __rte_always_inline void
+cn20k_nix_xmit_prepare(struct cn20k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm,
+		       uint64_t *cmd, const uint16_t flags, const uint64_t lso_tun_fmt, bool *sec,
+		       uint8_t mark_flag, uint64_t mark_fmt)
+{
+	uint8_t mark_off = 0, mark_vlan = 0, markptr = 0;
+	struct nix_send_ext_s *send_hdr_ext;
+	struct nix_send_hdr_s *send_hdr;
+	uint64_t ol_flags = 0, mask;
+	union nix_send_hdr_w1_u w1;
+	union nix_send_sg_s *sg;
+	uint16_t mark_form = 0;
+
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		send_hdr_ext = (struct nix_send_ext_s *)(cmd + 2);
+		sg = (union nix_send_sg_s *)(cmd + 4);
+		/* Clear previous markings */
+		send_hdr_ext->w0.lso = 0;
+		send_hdr_ext->w0.mark_en = 0;
+		send_hdr_ext->w1.u = 0;
+		ol_flags = m->ol_flags;
+	} else {
+		sg = (union nix_send_sg_s *)(cmd + 2);
+	}
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F)
+		send_hdr->w0.pnc = 0;
+
+	if (flags & (NIX_TX_NEED_SEND_HDR_W1 | NIX_TX_OFFLOAD_SECURITY_F)) {
+		ol_flags = m->ol_flags;
+		w1.u = 0;
+	}
+
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		send_hdr->w0.total = m->data_len;
+	else
+		send_hdr->w0.total = m->pkt_len;
+	send_hdr->w0.aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+
+	/*
+	 * L3type:  2 => IPV4
+	 *          3 => IPV4 with csum
+	 *          4 => IPV6
+	 * L3type and L3ptr needs to be set for either
+	 * L3 csum or L4 csum or LSO
+	 *
+	 */
+
+	if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) && (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F)) {
+		const uint8_t csum = !!(ol_flags & RTE_MBUF_F_TX_OUTER_UDP_CKSUM);
+		const uint8_t ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV4)) << 1) +
+					((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)) << 2) +
+					!!(ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM);
+
+		/* Outer L3 */
+		w1.ol3type = ol3type;
+		mask = 0xffffull << ((!!ol3type) << 4);
+		w1.ol3ptr = ~mask & m->outer_l2_len;
+		w1.ol4ptr = ~mask & (w1.ol3ptr + m->outer_l3_len);
+
+		/* Outer L4 */
+		w1.ol4type = csum + (csum << 1);
+
+		/* Inner L3 */
+		w1.il3type = ((!!(ol_flags & RTE_MBUF_F_TX_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_IPV6)) << 2);
+		w1.il3ptr = w1.ol4ptr + m->l2_len;
+		w1.il4ptr = w1.il3ptr + m->l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.il3type = w1.il3type + !!(ol_flags & RTE_MBUF_F_TX_IP_CKSUM);
+
+		/* Inner L4 */
+		w1.il4type = (ol_flags & RTE_MBUF_F_TX_L4_MASK) >> 52;
+
+		/* In case of no tunnel header use only
+		 * shift IL3/IL4 fields a bit to use
+		 * OL3/OL4 for header checksum
+		 */
+		mask = !ol3type;
+		w1.u = ((w1.u & 0xFFFFFFFF00000000) >> (mask << 3)) |
+		       ((w1.u & 0X00000000FFFFFFFF) >> (mask << 4));
+
+	} else if (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+		const uint8_t csum = !!(ol_flags & RTE_MBUF_F_TX_OUTER_UDP_CKSUM);
+		const uint8_t outer_l2_len = m->outer_l2_len;
+
+		/* Outer L3 */
+		w1.ol3ptr = outer_l2_len;
+		w1.ol4ptr = outer_l2_len + m->outer_l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)) << 2) +
+			     !!(ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM);
+
+		/* Outer L4 */
+		w1.ol4type = csum + (csum << 1);
+
+	} else if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) {
+		const uint8_t l2_len = m->l2_len;
+
+		/* Always use OLXPTR and OLXTYPE when only
+		 * one header is present
+		 */
+
+		/* Inner L3 */
+		w1.ol3ptr = l2_len;
+		w1.ol4ptr = l2_len + m->l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_IPV6)) << 2) +
+			     !!(ol_flags & RTE_MBUF_F_TX_IP_CKSUM);
+
+		/* Inner L4 */
+		w1.ol4type = (ol_flags & RTE_MBUF_F_TX_L4_MASK) >> 52;
+	}
+
+	if (flags & NIX_TX_NEED_EXT_HDR && flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		const uint8_t ipv6 = !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+		const uint8_t ip = !!(ol_flags & (RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IPV6));
+
+		send_hdr_ext->w1.vlan1_ins_ena = !!(ol_flags & RTE_MBUF_F_TX_VLAN);
+		/* HW will update ptr after vlan0 update */
+		send_hdr_ext->w1.vlan1_ins_ptr = 12;
+		send_hdr_ext->w1.vlan1_ins_tci = m->vlan_tci;
+
+		send_hdr_ext->w1.vlan0_ins_ena = !!(ol_flags & RTE_MBUF_F_TX_QINQ);
+		/* 2B before end of l2 header */
+		send_hdr_ext->w1.vlan0_ins_ptr = 12;
+		send_hdr_ext->w1.vlan0_ins_tci = m->vlan_tci_outer;
+		/* Fill for VLAN marking only when VLAN insertion enabled */
+		mark_vlan = ((mark_flag & CNXK_TM_MARK_VLAN_DEI) &
+			     (send_hdr_ext->w1.vlan1_ins_ena || send_hdr_ext->w1.vlan0_ins_ena));
+
+		/* Mask requested flags with packet data information */
+		mark_off = mark_flag & ((ip << 2) | (ip << 1) | mark_vlan);
+		mark_off = ffs(mark_off & CNXK_TM_MARK_MASK);
+
+		mark_form = (mark_fmt >> ((mark_off - !!mark_off) << 4));
+		mark_form = (mark_form >> (ipv6 << 3)) & 0xFF;
+		markptr = m->l2_len + (mark_form >> 7) - (mark_vlan << 2);
+
+		send_hdr_ext->w0.mark_en = !!mark_off;
+		send_hdr_ext->w0.markform = mark_form & 0x7F;
+		send_hdr_ext->w0.markptr = markptr;
+	}
+
+	if (flags & NIX_TX_NEED_EXT_HDR && flags & NIX_TX_OFFLOAD_TSO_F &&
+	    (ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
+		uint16_t lso_sb;
+		uint64_t mask;
+
+		mask = -(!w1.il3type);
+		lso_sb = (mask & w1.ol4ptr) + (~mask & w1.il4ptr) + m->l4_len;
+
+		send_hdr_ext->w0.lso_sb = lso_sb;
+		send_hdr_ext->w0.lso = 1;
+		send_hdr_ext->w0.lso_mps = m->tso_segsz;
+		send_hdr_ext->w0.lso_format =
+			NIX_LSO_FORMAT_IDX_TSOV4 + !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+		w1.ol4type = NIX_SENDL4TYPE_TCP_CKSUM;
+
+		/* Handle tunnel tso */
+		if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) &&
+		    (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+			const uint8_t is_udp_tun =
+				(CNXK_NIX_UDP_TUN_BITMASK >>
+				 ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+				0x1;
+			uint8_t shift = is_udp_tun ? 32 : 0;
+
+			shift += (!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6) << 4);
+			shift += (!!(ol_flags & RTE_MBUF_F_TX_IPV6) << 3);
+
+			w1.il4type = NIX_SENDL4TYPE_TCP_CKSUM;
+			w1.ol4type = is_udp_tun ? NIX_SENDL4TYPE_UDP_CKSUM : 0;
+			/* Update format for UDP tunneled packet */
+			send_hdr_ext->w0.lso_format = (lso_tun_fmt >> shift);
+		}
+	}
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1)
+		send_hdr->w1.u = w1.u;
+
+	if (!(flags & NIX_TX_MULTI_SEG_F)) {
+		struct rte_mbuf *cookie;
+
+		sg->seg1_size = send_hdr->w0.total;
+		*(rte_iova_t *)(sg + 1) = rte_mbuf_data_iova(m);
+		cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+
+		if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+			uint64_t aura;
+
+			/* DF bit = 1 if refcount of current mbuf or parent mbuf
+			 *		is greater than 1
+			 * DF bit = 0 otherwise
+			 */
+			aura = send_hdr->w0.aura;
+			send_hdr->w0.df = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+			send_hdr->w0.aura = aura;
+		}
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+		/* Mark mempool object as "put" since it is freed by NIX */
+		if (!send_hdr->w0.df)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+#else
+		RTE_SET_USED(cookie);
+#endif
+	} else {
+		sg->seg1_size = m->data_len;
+		*(rte_iova_t *)(sg + 1) = rte_mbuf_data_iova(m);
+
+		/* NOFF is handled later for multi-seg */
+	}
+
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		*sec = !!(ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD);
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_mv_lmt_base(uintptr_t lmt_addr, uint64_t *cmd, const uint16_t flags)
+{
+	struct nix_send_ext_s *send_hdr_ext;
+	union nix_send_sg_s *sg;
+
+	/* With minimal offloads, 'cmd' being local could be optimized out to
+	 * registers. In other cases, 'cmd' will be on the stack. The intent is
+	 * that 'cmd' stores content from txq->cmd, which is copied only once.
+	 */
+	*((struct nix_send_hdr_s *)lmt_addr) = *(struct nix_send_hdr_s *)cmd;
+	lmt_addr += 16;
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		send_hdr_ext = (struct nix_send_ext_s *)(cmd + 2);
+		*((struct nix_send_ext_s *)lmt_addr) = *send_hdr_ext;
+		lmt_addr += 16;
+
+		sg = (union nix_send_sg_s *)(cmd + 4);
+	} else {
+		sg = (union nix_send_sg_s *)(cmd + 2);
+	}
+	/* In case of multi-seg, sg template is stored here */
+	*((union nix_send_sg_s *)lmt_addr) = *sg;
+	*(rte_iova_t *)(lmt_addr + 8) = *(rte_iova_t *)(sg + 1);
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_prepare_tstamp(struct cn20k_eth_txq *txq, uintptr_t lmt_addr,
+			      const uint64_t ol_flags, const uint16_t no_segdw,
+			      const uint16_t flags)
+{
+	if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+		const uint8_t is_ol_tstamp = !(ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST);
+		uint64_t *lmt = (uint64_t *)lmt_addr;
+		uint16_t off = (no_segdw - 1) << 1;
+		struct nix_send_mem_s *send_mem;
+
+		send_mem = (struct nix_send_mem_s *)(lmt + off);
+		/* For packets without RTE_MBUF_F_TX_IEEE1588_TMST set, the Tx
+		 * timestamp must not be recorded: change the alg type to
+		 * NIX_SENDMEMALG_SUB and move the send mem addr field to the
+		 * next 8 bytes, so that the registered Tx tstamp address is
+		 * not corrupted.
+		 */
+		send_mem->w0.subdc = NIX_SUBDC_MEM;
+		send_mem->w0.alg = NIX_SENDMEMALG_SETTSTMP + (is_ol_tstamp << 3);
+		send_mem->addr = (rte_iova_t)(((uint64_t *)txq->ts_mem) + is_ol_tstamp);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+		    uint64_t *cmd, const uint16_t flags)
+{
+	struct cn20k_eth_txq *txq = tx_queue;
+	const rte_iova_t io_addr = txq->io_addr;
+	uint8_t lnum, c_lnum, c_shft, c_loff;
+	uintptr_t pa, lbase = txq->lmt_base;
+	uint16_t lmt_id, burst, left, i;
+	struct rte_mbuf *extm = NULL;
+	uintptr_t c_lbase = lbase;
+	uint64_t lso_tun_fmt = 0;
+	uint64_t mark_fmt = 0;
+	uint8_t mark_flag = 0;
+	rte_iova_t c_io_addr;
+	uint16_t c_lmt_id;
+	uint64_t sa_base;
+	uintptr_t laddr;
+	uint64_t data;
+	bool sec;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F))
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+
+	/* Get cmd skeleton */
+	cn20k_nix_tx_skeleton(txq, cmd, flags, !(flags & NIX_TX_VWQE_F));
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F)
+		lso_tun_fmt = txq->lso_tun_fmt;
+
+	if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		mark_fmt = txq->mark_fmt;
+		mark_flag = txq->mark_flag;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(lbase, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_lbase, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	burst = left > 32 ? 32 : left;
+
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		c_lnum = 0;
+		c_loff = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i++) {
+		/* Perform header writes for TSO, barrier at
+		 * lmt steorl will suffice.
+		 */
+		if (flags & NIX_TX_OFFLOAD_TSO_F)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+
+		cn20k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, &sec,
+				       mark_flag, mark_fmt);
+
+		laddr = (uintptr_t)LMT_OFF(lbase, lnum, 0);
+
+		/* Prepare CPT instruction and get nixtx addr */
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F && sec)
+			cn20k_nix_prep_sec(tx_pkts[i], cmd, &laddr, c_lbase, &c_lnum, &c_loff,
+					   &c_shft, sa_base, flags);
+
+		/* Move NIX desc to LMT/NIXTX area */
+		cn20k_nix_xmit_mv_lmt_base(laddr, cmd, flags);
+		cn20k_nix_xmit_prepare_tstamp(txq, laddr, tx_pkts[i]->ol_flags, 4, flags);
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F) || !sec)
+			lnum++;
+	}
+
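+	/* For VWQE, ensure this work slot is at SSO HEAD (bit 35 of the tag
+	 * word) before the LMTST so that packet ordering is preserved.
+	 */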
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+	tx_pkts += burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = ((c_lnum << 1) + c_loff);
+
+		/* Reduce pkts to be sent to CPT */
+		burst -= sec_pkts;
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	/* Trigger LMTST */
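+	/* STEOR data word: LMT ID in the low bits, number of LMT lines minus
+	 * one at bits 18:12; the first line's size (low 3 bits of the size
+	 * vector) is folded into the I/O address instead.
+	 */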
+	if (burst > 16) {
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= (15ULL << 12);
+		data |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, 16);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data, pa);
+
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= ((uint64_t)(burst - 17)) << 12;
+		data |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst - 16);
+		/* STEOR1 */
+		roc_lmt_submit_steorl(data, pa);
+	} else if (burst) {
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= ((uint64_t)(burst - 1)) << 12;
+		data |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data, pa);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	return pkts;
+}
+
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -225,10 +1203,11 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[sz];                                                                  \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts(tx_queue, NULL, tx_pkts, pkts, cmd, flags);             \
 	}
 
 #define NIX_TX_XMIT_MSEG(fn, sz, flags)                                                            \
-- 
2.34.1



* [PATCH v2 16/18] net/cnxk: support Tx multi-seg in cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (14 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 15/18] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 17/18] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
                     ` (2 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Tx multi-seg support to the scalar Tx path for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 352 +++++++++++++++++++++++++++++++++++-
 1 file changed, 347 insertions(+), 5 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 610d64f21b..3f163285f0 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -863,6 +863,183 @@ cn20k_nix_xmit_prepare_tstamp(struct cn20k_eth_txq *txq, uintptr_t lmt_addr,
 	}
 }
 
+static __rte_always_inline uint16_t
+cn20k_nix_prepare_mseg(struct cn20k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm,
+		       uint64_t *cmd, const uint16_t flags)
+{
+	uint64_t prefree = 0, aura0, aura, nb_segs, segdw;
+	struct nix_send_hdr_s *send_hdr;
+	union nix_send_sg_s *sg, l_sg;
+	union nix_send_sg2_s l_sg2;
+	struct rte_mbuf *cookie;
+	struct rte_mbuf *m_next;
+	uint8_t off, is_sg2;
+	uint64_t len, dlen;
+	uint64_t ol_flags;
+	uint64_t *slist;
+
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+
+	if (flags & NIX_TX_NEED_EXT_HDR)
+		off = 2;
+	else
+		off = 0;
+
+	sg = (union nix_send_sg_s *)&cmd[2 + off];
+	len = send_hdr->w0.total;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		ol_flags = m->ol_flags;
+
+	/* Start from second segment, first segment is already there */
+	dlen = m->data_len;
+	is_sg2 = 0;
+	l_sg.u = sg->u;
+	/* Clear the first-seg length in l_sg.u, which might be stale from the vector path */
+	l_sg.u &= ~0xFFFFUL;
+	l_sg.u |= dlen;
+	len -= dlen;
+	nb_segs = m->nb_segs - 1;
+	m_next = m->next;
+	m->next = NULL;
+	m->nb_segs = 1;
+	slist = &cmd[3 + off + 1];
+
+	cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+	/* Set invert df if buffer is not to be freed by H/W */
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		aura = send_hdr->w0.aura;
+		prefree = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+		send_hdr->w0.aura = aura;
+		l_sg.i1 = prefree;
+	}
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+	/* Mark mempool object as "put" since it is freed by NIX */
+	if (!prefree)
+		RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	rte_io_wmb();
+#else
+	RTE_SET_USED(cookie);
+#endif
+
+	/* Quickly handle single segmented packets. With this if-condition,
+	 * the compiler will completely optimize out the below do-while loop
+	 * from the Tx handler when NIX_TX_MULTI_SEG_F offload is not set.
+	 */
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		goto done;
+
+	aura0 = send_hdr->w0.aura;
+	m = m_next;
+	if (!m)
+		goto done;
+
+	/* Fill mbuf segments */
+	do {
+		uint64_t iova;
+
+		/* Save the current mbuf properties. These can get cleared in
+		 * cn20k_nix_prefree_seg()
+		 */
+		m_next = m->next;
+		iova = rte_mbuf_data_iova(m);
+		dlen = m->data_len;
+		len -= dlen;
+
+		nb_segs--;
+		aura = aura0;
+		prefree = 0;
+
+		m->next = NULL;
+
+		cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+		if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+			aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+			prefree = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+			is_sg2 = aura != aura0 && !prefree;
+		}
+
+		if (unlikely(is_sg2)) {
+			/* This mbuf belongs to a different pool and
+			 * DF bit is not to be set, so use SG2 subdesc
+			 * so that it is freed to the appropriate pool.
+			 */
+
+			/* Write the previous descriptor out */
+			sg->u = l_sg.u;
+
+			/* If the current SG subdc does not have any
+			 * iovas in it, then the SG2 subdc can overwrite
+			 * that SG subdc.
+			 *
+			 * If the current SG subdc has 2 iovas in it, then
+			 * the current iova word should be left empty.
+			 */
+			slist += (-1 + (int)l_sg.segs);
+			sg = (union nix_send_sg_s *)slist;
+
+			l_sg2.u = l_sg.u & 0xC00000000000000; /* LD_TYPE */
+			l_sg2.subdc = NIX_SUBDC_SG2;
+			l_sg2.aura = aura;
+			l_sg2.seg1_size = dlen;
+			l_sg.u = l_sg2.u;
+
+			slist++;
+			*slist = iova;
+			slist++;
+		} else {
+			*slist = iova;
+			/* Set invert df if buffer is not to be freed by H/W */
+			l_sg.u |= (prefree << (l_sg.segs + 55));
+			/* Set the segment length */
+			l_sg.u |= ((uint64_t)dlen << (l_sg.segs << 4));
+			l_sg.segs += 1;
+			slist++;
+		}
+
+		if ((is_sg2 || l_sg.segs > 2) && nb_segs) {
+			sg->u = l_sg.u;
+			/* Next SG subdesc */
+			sg = (union nix_send_sg_s *)slist;
+			l_sg.u &= 0xC00000000000000; /* LD_TYPE */
+			l_sg.subdc = NIX_SUBDC_SG;
+			slist++;
+		}
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+		/* Mark mempool object as "put" since it is freed by NIX */
+		if (!prefree)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+#else
+		RTE_SET_USED(cookie);
+#endif
+		m = m_next;
+	} while (nb_segs);
+
+done:
+	/* Add remaining bytes of security data to last seg */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && len) {
+		uint8_t shft = (l_sg.subdc == NIX_SUBDC_SG) ? ((l_sg.segs - 1) << 4) : 0;
+
+		dlen = ((l_sg.u >> shft) & 0xFFFFULL) + len;
+		l_sg.u = l_sg.u & ~(0xFFFFULL << shft);
+		l_sg.u |= dlen << shft;
+	}
+
+	/* Write the last subdc out */
+	sg->u = l_sg.u;
+
+	segdw = (uint64_t *)slist - (uint64_t *)&cmd[2 + off];
+	/* Roundup extra dwords to multiple of 2 */
+	segdw = (segdw >> 1) + (segdw & 0x1);
+	/* Default dwords */
+	segdw += (off >> 1) + 1 + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+	send_hdr->w0.sizem1 = segdw - 1;
+
+	return segdw;
+}
+
 static __rte_always_inline uint16_t
 cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
 		    uint64_t *cmd, const uint16_t flags)
@@ -1010,6 +1187,170 @@ cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uin
 	return pkts;
 }
 
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			 uint64_t *cmd, const uint16_t flags)
+{
+	struct cn20k_eth_txq *txq = tx_queue;
+	uintptr_t pa0, pa1, lbase = txq->lmt_base;
+	const rte_iova_t io_addr = txq->io_addr;
+	uint16_t segdw, lmt_id, burst, left, i;
+	struct rte_mbuf *extm = NULL;
+	uint8_t lnum, c_lnum, c_loff;
+	uintptr_t c_lbase = lbase;
+	uint64_t lso_tun_fmt = 0;
+	uint64_t mark_fmt = 0;
+	uint8_t mark_flag = 0;
+	uint64_t data0, data1;
+	rte_iova_t c_io_addr;
+	uint8_t shft, c_shft;
+	__uint128_t data128;
+	uint16_t c_lmt_id;
+	uint64_t sa_base;
+	uintptr_t laddr;
+	bool sec;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F))
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+
+	/* Get cmd skeleton */
+	cn20k_nix_tx_skeleton(txq, cmd, flags, !(flags & NIX_TX_VWQE_F));
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F)
+		lso_tun_fmt = txq->lso_tun_fmt;
+
+	if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		mark_fmt = txq->mark_fmt;
+		mark_flag = txq->mark_flag;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(lbase, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_lbase, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	burst = left > 32 ? 32 : left;
+	shft = 16;
+	data128 = 0;
+
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		c_lnum = 0;
+		c_loff = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i++) {
+		cn20k_nix_tx_mbuf_validate(tx_pkts[i], flags);
+
+		/* Perform header writes for TSO, barrier at
+		 * lmt steorl will suffice.
+		 */
+		if (flags & NIX_TX_OFFLOAD_TSO_F)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+
+		cn20k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, &sec,
+				       mark_flag, mark_fmt);
+
+		laddr = (uintptr_t)LMT_OFF(lbase, lnum, 0);
+
+		/* Prepare CPT instruction and get nixtx addr */
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F && sec)
+			cn20k_nix_prep_sec(tx_pkts[i], cmd, &laddr, c_lbase, &c_lnum, &c_loff,
+					   &c_shft, sa_base, flags);
+
+		/* Move NIX desc to LMT/NIXTX area */
+		cn20k_nix_xmit_mv_lmt_base(laddr, cmd, flags);
+		/* Store sg list directly on lmt line */
+		segdw = cn20k_nix_prepare_mseg(txq, tx_pkts[i], &extm, (uint64_t *)laddr, flags);
+		cn20k_nix_xmit_prepare_tstamp(txq, laddr, tx_pkts[i]->ol_flags, segdw, flags);
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F) || !sec) {
+			lnum++;
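+			/* Pack this packet's dword count minus one, 3 bits
+			 * per LMT line starting at bit 16, into the 128-bit
+			 * STEOR size vector.
+			 */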
+			data128 |= (((__uint128_t)(segdw - 1)) << shft);
+			shft += 3;
+		}
+	}
+
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+	tx_pkts += burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = ((c_lnum << 1) + c_loff);
+
+		/* Reduce pkts to be sent to CPT */
+		burst -= sec_pkts;
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	data0 = (uint64_t)data128;
+	data1 = (uint64_t)(data128 >> 64);
+	/* Make data0 similar to data1 */
+	data0 >>= 16;
+	/* Trigger LMTST */
+	if (burst > 16) {
+		pa0 = io_addr | (data0 & 0x7) << 4;
+		data0 &= ~0x7ULL;
+		/* Move lmtst1..15 sz to bits 63:19 */
+		data0 <<= 16;
+		data0 |= (15ULL << 12);
+		data0 |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, 16);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data0, pa0);
+
+		pa1 = io_addr | (data1 & 0x7) << 4;
+		data1 &= ~0x7ULL;
+		data1 <<= 16;
+		data1 |= ((uint64_t)(burst - 17)) << 12;
+		data1 |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst - 16);
+		/* STEOR1 */
+		roc_lmt_submit_steorl(data1, pa1);
+	} else if (burst) {
+		pa0 = io_addr | (data0 & 0x7) << 4;
+		data0 &= ~0x7ULL;
+		/* Move lmtst1..15 sz to bits 63:19 */
+		data0 <<= 16;
+		data0 |= ((burst - 1ULL) << 12);
+		data0 |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data0, pa0);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	return pkts;
+}
+
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -1214,10 +1555,12 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[(sz) + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];                               \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_mseg(tx_queue, NULL, tx_pkts, pkts, cmd,                \
+						flags | NIX_TX_MULTI_SEG_F);                       \
 	}
 
 #define NIX_TX_XMIT_VEC(fn, sz, flags)                                                             \
@@ -1247,5 +1590,4 @@ uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue
 uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue,
 								      struct rte_mbuf **tx_pkts,
 								      uint16_t pkts);
-
 #endif /* __CN20K_TX_H__ */
-- 
2.34.1



* [PATCH v2 17/18] net/cnxk: support Tx burst vector for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (15 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 16/18] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-09-26 16:01   ` [PATCH v2 18/18] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
  2024-10-01 11:01   ` [PATCH v2 00/18] add Marvell cn20k SOC support for mempool and net Jerin Jacob
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Tx burst vector support for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 1445 ++++++++++++++++++++++++++++++++++-
 1 file changed, 1441 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 3f163285f0..05c8b80fcb 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -219,6 +219,28 @@ cn20k_nix_tx_ext_subs(const uint16_t flags)
 		       ((flags & (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)) ? 1 : 0);
 }
 
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords(const uint16_t flags, const uint8_t segdw)
+{
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		return cn20k_nix_tx_ext_subs(flags) + 2;
+
+	/* Already everything is accounted for in segdw */
+	return segdw;
+}
+
+static __rte_always_inline uint8_t
+cn20k_nix_pkts_per_vec_brst(const uint16_t flags)
+{
+	return ((flags & NIX_TX_NEED_EXT_HDR) ? 2 : 4) << ROC_LMT_LINES_PER_CORE_LOG2;
+}
+
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords_per_line(const uint16_t flags)
+{
+	return (flags & NIX_TX_NEED_EXT_HDR) ? ((flags & NIX_TX_OFFLOAD_TSTAMP_F) ? 8 : 6) : 8;
+}
+
 static __rte_always_inline uint64_t
 cn20k_nix_tx_steor_data(const uint16_t flags)
 {
@@ -247,6 +269,40 @@ cn20k_nix_tx_steor_data(const uint16_t flags)
 	return data;
 }
 
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords_per_line_seg(const uint16_t flags)
+{
+	return ((flags & NIX_TX_NEED_EXT_HDR) ? (flags & NIX_TX_OFFLOAD_TSTAMP_F) ? 8 : 6 : 4);
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_tx_steor_vec_data(const uint16_t flags)
+{
+	const uint64_t dw_m1 = cn20k_nix_tx_dwords_per_line(flags) - 1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1;
+	/* 15 vector sizes for single seg */
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
 static __rte_always_inline void
 cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t flags,
 		      const uint16_t static_sz)
@@ -276,6 +332,33 @@ cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t f
 	}
 }
 
+static __rte_always_inline void
+cn20k_nix_sec_fc_wait_one(struct cn20k_eth_txq *txq)
+{
+	uint64_t nb_desc = txq->cpt_desc;
+	uint64_t fc;
+
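+	/* Spin until the CPT flow-control count drops below the CPT ring
+	 * depth; on arm64, LDXR+WFE lets the core sleep until the FC line
+	 * is updated instead of busy-polling.
+	 */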
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[space], [%[addr]]		\n"
+		     "		cmp %[nb_desc], %[space]		\n"
+		     "		b.hi .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[space], [%[addr]]		\n"
+		     "		cmp %[nb_desc], %[space]		\n"
+		     "		b.ls .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [space] "=&r"(fc)
+		     : [nb_desc] "r"(nb_desc), [addr] "r"(txq->cpt_fc)
+		     : "memory");
+#else
+	RTE_SET_USED(fc);
+	while (nb_desc <= __atomic_load_n(txq->cpt_fc, __ATOMIC_RELAXED))
+		;
+#endif
+}
+
 static __rte_always_inline void
 cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
 {
@@ -346,6 +429,137 @@ cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
 }
 
 #if defined(RTE_ARCH_ARM64)
+static __rte_always_inline void
+cn20k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1,
+		       uintptr_t *nixtx_addr, uintptr_t lbase, uint8_t *lnum, uint8_t *loff,
+		       uint8_t *shft, uint64_t sa_base, const uint16_t flags)
+{
+	struct cn20k_sec_sess_priv sess_priv;
+	uint32_t pkt_len, dlen_adj, rlen;
+	uint8_t l3l4type, chksum;
+	uint64x2_t cmd01, cmd23;
+	uint8_t l2_len, l3_len;
+	uintptr_t dptr, nixtx;
+	uint64_t ucode_cmd[4];
+	uint64_t *laddr, w0;
+	uint16_t tag;
+	uint64_t sa;
+
+	sess_priv.u64 = *rte_security_dynfield(m);
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1) {
+		/* Extract l3l4type either from il3il4type or ol3ol4type */
+		if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+			l2_len = vgetq_lane_u8(*cmd0, 10);
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = vgetq_lane_u8(*cmd0, 11) - l2_len;
+			l3l4type = vgetq_lane_u8(*cmd0, 13);
+		} else {
+			l2_len = vgetq_lane_u8(*cmd0, 8);
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = vgetq_lane_u8(*cmd0, 9) - l2_len;
+			l3l4type = vgetq_lane_u8(*cmd0, 12);
+		}
+
+		chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30);
+		chksum = ~chksum;
+		sess_priv.chksum = sess_priv.chksum & chksum;
+		/* Clear SEND header flags */
+		*cmd0 = vsetq_lane_u16(0, *cmd0, 6);
+	} else {
+		l2_len = m->l2_len;
+		l3_len = m->l3_len;
+	}
+
+	/* Retrieve DPTR */
+	dptr = vgetq_lane_u64(*cmd1, 1);
+	pkt_len = vgetq_lane_u16(*cmd0, 0);
+
+	/* Calculate dlen adj */
+	dlen_adj = pkt_len - l2_len;
+	/* Exclude l3 len from roundup for transport mode */
+	dlen_adj -= sess_priv.mode ? 0 : l3_len;
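+	/* Round (payload + roundup_len) up to a multiple of roundup_byte and
+	 * add partial_len; dlen_adj then holds the number of bytes the
+	 * packet grows by after IPsec processing.
+	 */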
+	rlen = (dlen_adj + sess_priv.roundup_len) + (sess_priv.roundup_byte - 1);
+	rlen &= ~(uint64_t)(sess_priv.roundup_byte - 1);
+	rlen += sess_priv.partial_len;
+	dlen_adj = rlen - dlen_adj;
+
+	/* Update send descriptors. Security is single segment only */
+	*cmd0 = vsetq_lane_u16(pkt_len + dlen_adj, *cmd0, 0);
+
+	/* CPT word 5 and word 6 */
+	w0 = 0;
+	ucode_cmd[2] = 0;
+	if (flags & NIX_TX_MULTI_SEG_F && m->nb_segs > 1) {
+		struct rte_mbuf *last = rte_pktmbuf_lastseg(m);
+
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = rte_pktmbuf_mtod_offset(last, uintptr_t, last->data_len + dlen_adj);
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		dptr = nixtx + ((flags & NIX_TX_NEED_EXT_HDR) ? 32 : 16);
+
+		/* Set l2 length as data offset */
+		w0 = (uint64_t)l2_len << 16;
+		w0 |= cn20k_nix_tx_ext_subs(flags) + NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+		ucode_cmd[1] = dptr | ((uint64_t)m->nb_segs << 60);
+	} else {
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = dptr + pkt_len + dlen_adj;
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
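+		/* nixtx now points 16B past a 128B-aligned boundary beyond
+		 * the packet end; the 16B just below it hold CPT_RES_S.
+		 */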
+
+		w0 |= cn20k_nix_tx_ext_subs(flags) + 1ULL;
+		dptr += l2_len;
+		ucode_cmd[1] = dptr;
+		*cmd1 = vsetq_lane_u16(pkt_len + dlen_adj, *cmd1, 0);
+		/* DLEN passed is excluding L2 HDR */
+		pkt_len -= l2_len;
+	}
+	w0 |= ((((int64_t)nixtx - (int64_t)dptr) & 0xFFFFF) << 32);
+	/* CPT word 0 and 1 */
+	cmd01 = vdupq_n_u64(0);
+	cmd01 = vsetq_lane_u64(w0, cmd01, 0);
+	/* CPT_RES_S is 16B above NIXTX */
+	cmd01 = vsetq_lane_u64(nixtx - 16, cmd01, 1);
+
+	/* Return nixtx addr */
+	*nixtx_addr = nixtx;
+
+	/* CPT Word 4 and Word 7 */
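+	/* The low 16 bits of sa_base carry the SSO tag for the CPT
+	 * completion event; mask them off before deriving the SA pointer.
+	 */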
+	tag = sa_base & 0xFFFFUL;
+	sa_base &= ~0xFFFFUL;
+	sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx);
+	ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa);
+	ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | 1UL << 54 |
+			((uint64_t)sess_priv.chksum) << 32 | ((uint64_t)sess_priv.dec_ttl) << 34 |
+			pkt_len);
+
+	/* CPT word 2 and 3 */
+	cmd23 = vdupq_n_u64(0);
+	cmd23 = vsetq_lane_u64(
+		(((uint64_t)RTE_EVENT_TYPE_CPU << 28) | tag | CNXK_ETHDEV_SEC_OUTB_EV_SUB << 20),
+		cmd23, 0);
+	cmd23 = vsetq_lane_u64((uintptr_t)m | 1, cmd23, 1);
+
+	/* Move to our line */
+	laddr = LMT_OFF(lbase, *lnum, *loff ? 64 : 0);
+
+	/* Write CPT instruction to lmt line */
+	vst1q_u64(laddr, cmd01);
+	vst1q_u64((laddr + 2), cmd23);
+
+	*(__uint128_t *)(laddr + 4) = *(__uint128_t *)ucode_cmd;
+	*(__uint128_t *)(laddr + 6) = *(__uint128_t *)(ucode_cmd + 2);
+
+	/* Move to next line for every other CPT inst */
+	*loff = !(*loff);
+	*lnum = *lnum + (*loff ? 0 : 1);
+	*shft = *shft + (*loff ? 0 : 3);
+}
 
 static __rte_always_inline void
 cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
@@ -546,6 +760,156 @@ cn20k_nix_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **extm, struct cn20k_e
 	}
 }
 
+#if defined(RTE_ARCH_ARM64)
+/* Only called for first segments of single segmented mbufs */
+static __rte_always_inline void
+cn20k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct rte_mbuf **extm,
+			  struct cn20k_eth_txq *txq, uint64x2_t *senddesc01_w0,
+			  uint64x2_t *senddesc23_w0, uint64x2_t *senddesc01_w1,
+			  uint64x2_t *senddesc23_w1)
+{
+	struct rte_mbuf **tx_compl_ptr = txq->tx_compl.ptr;
+	uint32_t nb_desc_mask = txq->tx_compl.nb_desc_mask;
+	bool tx_compl_ena = txq->tx_compl.ena;
+	struct rte_mbuf *m0, *m1, *m2, *m3;
+	struct rte_mbuf *cookie;
+	uint64_t w0, w1, aura;
+	uint64_t sqe_id;
+
+	m0 = mbufs[0];
+	m1 = mbufs[1];
+	m2 = mbufs[2];
+	m3 = mbufs[3];
+
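+	/* SEND_HDR_S fields touched below: W0 bit 19 is DF (don't free),
+	 * W0 bits 39:20 carry the aura, W0 bit 43 is PNC, and W1 bits
+	 * 63:48 carry the Tx completion SQE id.
+	 */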
+	/* mbuf 0 */
+	w0 = vgetq_lane_u64(*senddesc01_w0, 0);
+	if (RTE_MBUF_HAS_EXTBUF(m0)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc01_w1, 0);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m0->next = *extm;
+			*extm = m0;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m0;
+			*senddesc01_w1 = vsetq_lane_u64(w1, *senddesc01_w1, 0);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m0) ? m0 : rte_mbuf_from_indirect(m0);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m0, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc01_w0 = vsetq_lane_u64(w0, *senddesc01_w0, 0);
+
+	/* mbuf1 */
+	w0 = vgetq_lane_u64(*senddesc01_w0, 1);
+	if (RTE_MBUF_HAS_EXTBUF(m1)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc01_w1, 1);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m1->next = *extm;
+			*extm = m1;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m1;
+			*senddesc01_w1 = vsetq_lane_u64(w1, *senddesc01_w1, 1);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m1) ? m1 : rte_mbuf_from_indirect(m1);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m1, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc01_w0 = vsetq_lane_u64(w0, *senddesc01_w0, 1);
+
+	/* mbuf 2 */
+	w0 = vgetq_lane_u64(*senddesc23_w0, 0);
+	if (RTE_MBUF_HAS_EXTBUF(m2)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc23_w1, 0);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m2->next = *extm;
+			*extm = m2;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m2;
+			*senddesc23_w1 = vsetq_lane_u64(w1, *senddesc23_w1, 0);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m2) ? m2 : rte_mbuf_from_indirect(m2);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m2, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc23_w0 = vsetq_lane_u64(w0, *senddesc23_w0, 0);
+
+	/* mbuf3 */
+	w0 = vgetq_lane_u64(*senddesc23_w0, 1);
+	if (RTE_MBUF_HAS_EXTBUF(m3)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc23_w1, 1);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m3->next = *extm;
+			*extm = m3;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m3;
+			*senddesc23_w1 = vsetq_lane_u64(w1, *senddesc23_w1, 1);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m3) ? m3 : rte_mbuf_from_indirect(m3);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m3, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc23_w0 = vsetq_lane_u64(w0, *senddesc23_w0, 1);
+#ifndef RTE_LIBRTE_MEMPOOL_DEBUG
+	RTE_SET_USED(cookie);
+#endif
+}
+#endif
+
 static __rte_always_inline void
 cn20k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags)
 {
@@ -1351,6 +1715,1078 @@ cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts
 	return pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
+#define NIX_DESCS_PER_LOOP 4
+
+static __rte_always_inline void
+cn20k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, uint8_t *shift,
+		   __uint128_t *data128, uintptr_t *next)
+{
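+	/* 'dw' is in 16B units; an LMT line holds 128 bytes. */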
+	/* Go to next line if we are out of space */
+	if ((*loff + (dw << 4)) > 128) {
+		*data128 = *data128 | (((__uint128_t)((*loff >> 4) - 1)) << *shift);
+		*shift = *shift + 3;
+		*loff = 0;
+		*lnum = *lnum + 1;
+	}
+
+	*next = (uintptr_t)LMT_OFF(laddr, *lnum, *loff);
+	*loff = *loff + (dw << 4);
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_store(struct cn20k_eth_txq *txq, struct rte_mbuf *mbuf, struct rte_mbuf **extm,
+		     uint8_t segdw, uintptr_t laddr, uint64x2_t cmd0, uint64x2_t cmd1,
+		     uint64x2_t cmd2, uint64x2_t cmd3, const uint16_t flags)
+{
+	RTE_SET_USED(txq);
+	RTE_SET_USED(mbuf);
+	RTE_SET_USED(extm);
+	RTE_SET_USED(segdw);
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		/* Store the prepared send desc to LMT lines */
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+			vst1q_u64(LMT_OFF(laddr, 0, 48), cmd3);
+		} else {
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+		}
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 0);
+	} else {
+		/* Store the prepared send desc to LMT lines */
+		vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+		vst1q_u64(LMT_OFF(laddr, 0, 16), cmd1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 0);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			   uint64_t *cmd, const uint16_t flags)
+{
+	uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3;
+	uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3;
+	uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP], cmd2[NIX_DESCS_PER_LOOP],
+		cmd3[NIX_DESCS_PER_LOOP];
+	uint16_t left, scalar, burst, i, lmt_id, c_lmt_id;
+	uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3, pa;
+	uint64x2_t senddesc01_w0, senddesc23_w0;
+	uint64x2_t senddesc01_w1, senddesc23_w1;
+	uint64x2_t sendext01_w0, sendext23_w0;
+	uint64x2_t sendext01_w1, sendext23_w1;
+	uint64x2_t sendmem01_w0, sendmem23_w0;
+	uint64x2_t sendmem01_w1, sendmem23_w1;
+	uint8_t segdw[NIX_DESCS_PER_LOOP + 1];
+	uint64x2_t sgdesc01_w0, sgdesc23_w0;
+	uint64x2_t sgdesc01_w1, sgdesc23_w1;
+	struct cn20k_eth_txq *txq = tx_queue;
+	rte_iova_t io_addr = txq->io_addr;
+	uint8_t lnum, shift = 0, loff = 0;
+	uintptr_t laddr = txq->lmt_base;
+	uint8_t c_lnum, c_shft, c_loff;
+	uint64x2_t ltypes01, ltypes23;
+	uint64x2_t xtmp128, ytmp128;
+	uint64x2_t xmask01, xmask23;
+	uintptr_t c_laddr = laddr;
+	rte_iova_t c_io_addr;
+	uint64_t sa_base;
+	union wdata {
+		__uint128_t data128;
+		uint64_t data[2];
+	} wd;
+	struct rte_mbuf *extm = NULL;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F)) {
+		scalar = pkts & (NIX_DESCS_PER_LOOP - 1);
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+	} else {
+		scalar = pkts & (NIX_DESCS_PER_LOOP - 1);
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+	}
+
+	if (!(flags & NIX_TX_VWQE_F)) {
+		senddesc01_w0 = vld1q_dup_u64(&txq->send_hdr_w0);
+	} else {
+		uint64_t w0 = (txq->send_hdr_w0 & 0xFFFFF00000000000) |
+			      ((uint64_t)(cn20k_nix_tx_ext_subs(flags) + 1) << 40);
+
+		senddesc01_w0 = vdupq_n_u64(w0);
+	}
+	senddesc23_w0 = senddesc01_w0;
+
+	senddesc01_w1 = vdupq_n_u64(0);
+	senddesc23_w1 = senddesc01_w1;
+	if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+		sgdesc01_w0 = vdupq_n_u64((NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) |
+					  BIT_ULL(48));
+	else
+		sgdesc01_w0 = vdupq_n_u64((NIX_SUBDC_SG << 60) | BIT_ULL(48));
+	sgdesc23_w0 = sgdesc01_w0;
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			sendext01_w0 = vdupq_n_u64((NIX_SUBDC_EXT << 60) | BIT_ULL(15));
+			sendmem01_w0 = vdupq_n_u64((NIX_SUBDC_MEM << 60) |
+						   (NIX_SENDMEMALG_SETTSTMP << 56));
+			sendmem23_w0 = sendmem01_w0;
+			sendmem01_w1 = vdupq_n_u64(txq->ts_mem);
+			sendmem23_w1 = sendmem01_w1;
+		} else {
+			sendext01_w0 = vdupq_n_u64((NIX_SUBDC_EXT << 60));
+		}
+		sendext23_w0 = sendext01_w0;
+
+		if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F)
+			sendext01_w1 = vdupq_n_u64(12 | 12U << 24);
+		else
+			sendext01_w1 = vdupq_n_u64(0);
+		sendext23_w1 = sendext01_w1;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(laddr, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_laddr, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	/* Number of packets to prepare depends on offloads enabled. */
+	burst = left > cn20k_nix_pkts_per_vec_brst(flags) ? cn20k_nix_pkts_per_vec_brst(flags) :
+							    left;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		wd.data128 = 0;
+		shift = 16;
+	}
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		loff = 0;
+		c_loff = 0;
+		c_lnum = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i += NIX_DESCS_PER_LOOP) {
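+		/* Bail out of the vector loop early if fewer than 4 CPT LMT
+		 * slots remain (two CPT instructions fit per 128B LMT line).
+		 */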
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F &&
+		    (((int)((16 - c_lnum) << 1) - c_loff) < 4)) {
+			burst = i;
+			break;
+		}
+
+		/* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */
+		senddesc01_w0 = vbicq_u64(senddesc01_w0, vdupq_n_u64(0x800FFFFFFFF));
+		sgdesc01_w0 = vbicq_u64(sgdesc01_w0, vdupq_n_u64(0xFFFFFFFF));
+
+		senddesc23_w0 = senddesc01_w0;
+		sgdesc23_w0 = sgdesc01_w0;
+
+		/* Clear vlan enables. */
+		if (flags & NIX_TX_NEED_EXT_HDR) {
+			sendext01_w1 = vbicq_u64(sendext01_w1, vdupq_n_u64(0x3FFFF00FFFF00));
+			sendext23_w1 = sendext01_w1;
+		}
+
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Reset send mem alg to SETTSTMP from SUB */
+			sendmem01_w0 = vbicq_u64(sendmem01_w0, vdupq_n_u64(BIT_ULL(59)));
+			/* Reset send mem address to default. */
+			sendmem01_w1 = vbicq_u64(sendmem01_w1, vdupq_n_u64(0xF));
+			sendmem23_w0 = sendmem01_w0;
+			sendmem23_w1 = sendmem01_w1;
+		}
+
+		/* Move mbufs to iova */
+		mbuf0 = (uint64_t *)tx_pkts[0];
+		mbuf1 = (uint64_t *)tx_pkts[1];
+		mbuf2 = (uint64_t *)tx_pkts[2];
+		mbuf3 = (uint64_t *)tx_pkts[3];
+
+		/*
+		 * Get each mbuf's ol_flags, iova, pkt_len and data_off:
+		 * dataoff_iovaX.D[0] = iova,
+		 * dataoff_iovaX.D[1](15:0) = mbuf->data_off
+		 * len_olflagsX.D[0] = ol_flags,
+		 * len_olflagsX.D[1](63:32) = mbuf->pkt_len
+		 */
+		dataoff_iova0 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf0)->data_off, vld1q_u64(mbuf0), 1);
+		len_olflags0 = vld1q_u64(mbuf0 + 3);
+		dataoff_iova1 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf1)->data_off, vld1q_u64(mbuf1), 1);
+		len_olflags1 = vld1q_u64(mbuf1 + 3);
+		dataoff_iova2 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf2)->data_off, vld1q_u64(mbuf2), 1);
+		len_olflags2 = vld1q_u64(mbuf2 + 3);
+		dataoff_iova3 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf3)->data_off, vld1q_u64(mbuf3), 1);
+		len_olflags3 = vld1q_u64(mbuf3 + 3);
+
+		/* Move mbuf pointers to their pool field */
+		mbuf0 = (uint64_t *)((uintptr_t)mbuf0 + offsetof(struct rte_mbuf, pool));
+		mbuf1 = (uint64_t *)((uintptr_t)mbuf1 + offsetof(struct rte_mbuf, pool));
+		mbuf2 = (uint64_t *)((uintptr_t)mbuf2 + offsetof(struct rte_mbuf, pool));
+		mbuf3 = (uint64_t *)((uintptr_t)mbuf3 + offsetof(struct rte_mbuf, pool));
+
+		if (flags & (NIX_TX_OFFLOAD_OL3_OL4_CSUM_F | NIX_TX_OFFLOAD_L3_L4_CSUM_F)) {
+			/* Get tx_offload for ol2, ol3, l2, l3 lengths */
+			/*
+			 * E(8):OL2_LEN(7):OL3_LEN(9):E(24):L3_LEN(9):L2_LEN(7)
+			 * E(8):OL2_LEN(7):OL3_LEN(9):E(24):L3_LEN(9):L2_LEN(7)
+			 */
+
+			asm volatile("LD1 {%[a].D}[0],[%[in]]\n\t"
+				     : [a] "+w"(senddesc01_w1)
+				     : [in] "r"(mbuf0 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[a].D}[1],[%[in]]\n\t"
+				     : [a] "+w"(senddesc01_w1)
+				     : [in] "r"(mbuf1 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[b].D}[0],[%[in]]\n\t"
+				     : [b] "+w"(senddesc23_w1)
+				     : [in] "r"(mbuf2 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[b].D}[1],[%[in]]\n\t"
+				     : [b] "+w"(senddesc23_w1)
+				     : [in] "r"(mbuf3 + 2)
+				     : "memory");
+
+			/* Get pool pointer alone */
+			mbuf0 = (uint64_t *)*mbuf0;
+			mbuf1 = (uint64_t *)*mbuf1;
+			mbuf2 = (uint64_t *)*mbuf2;
+			mbuf3 = (uint64_t *)*mbuf3;
+		} else {
+			/* Get pool pointer alone */
+			mbuf0 = (uint64_t *)*mbuf0;
+			mbuf1 = (uint64_t *)*mbuf1;
+			mbuf2 = (uint64_t *)*mbuf2;
+			mbuf3 = (uint64_t *)*mbuf3;
+		}
+
+		const uint8x16_t shuf_mask2 = {
+			0x4, 0x5, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+			0xc, 0xd, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+		};
+		xtmp128 = vzip2q_u64(len_olflags0, len_olflags1);
+		ytmp128 = vzip2q_u64(len_olflags2, len_olflags3);
+
+		/*
+		 * Pick only 16 bits of pktlen preset at bits 63:32
+		 * and place them at bits 15:0.
+		 */
+		xtmp128 = vqtbl1q_u8(xtmp128, shuf_mask2);
+		ytmp128 = vqtbl1q_u8(ytmp128, shuf_mask2);
+
+		/* Add pairwise to get dataoff + iova in sgdesc_w1 */
+		sgdesc01_w1 = vpaddq_u64(dataoff_iova0, dataoff_iova1);
+		sgdesc23_w1 = vpaddq_u64(dataoff_iova2, dataoff_iova3);
+
+		/* Orr both sgdesc_w0 and senddesc_w0 with 16 bits of
+		 * pktlen at 15:0 position.
+		 */
+		sgdesc01_w0 = vorrq_u64(sgdesc01_w0, xtmp128);
+		sgdesc23_w0 = vorrq_u64(sgdesc23_w0, ytmp128);
+		senddesc01_w0 = vorrq_u64(senddesc01_w0, xtmp128);
+		senddesc23_w0 = vorrq_u64(senddesc23_w0, ytmp128);
+
+		/* Move mbuf to point to pool_id. */
+		mbuf0 = (uint64_t *)((uintptr_t)mbuf0 + offsetof(struct rte_mempool, pool_id));
+		mbuf1 = (uint64_t *)((uintptr_t)mbuf1 + offsetof(struct rte_mempool, pool_id));
+		mbuf2 = (uint64_t *)((uintptr_t)mbuf2 + offsetof(struct rte_mempool, pool_id));
+		mbuf3 = (uint64_t *)((uintptr_t)mbuf3 + offsetof(struct rte_mempool, pool_id));
+
+		if ((flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+		    !(flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/*
+			 * Lookup table to translate ol_flags to
+			 * il3/il4 types. But we still use ol3/ol4 types in
+			 * senddesc_w1 as only one header processing is enabled.
+			 */
+			const uint8x16_t tbl = {
+				/* [0-15] = il4type:il3type */
+				0x04, /* none (IPv6 assumed) */
+				0x14, /* RTE_MBUF_F_TX_TCP_CKSUM (IPv6 assumed) */
+				0x24, /* RTE_MBUF_F_TX_SCTP_CKSUM (IPv6 assumed) */
+				0x34, /* RTE_MBUF_F_TX_UDP_CKSUM (IPv6 assumed) */
+				0x03, /* RTE_MBUF_F_TX_IP_CKSUM */
+				0x13, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM */
+				0x23, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_SCTP_CKSUM */
+				0x33, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM */
+				0x02, /* RTE_MBUF_F_TX_IPV4  */
+				0x12, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_TCP_CKSUM */
+				0x22, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_SCTP_CKSUM */
+				0x32, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_UDP_CKSUM */
+				0x03, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM */
+				0x13, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_TCP_CKSUM
+				       */
+				0x23, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_SCTP_CKSUM
+				       */
+				0x33, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_UDP_CKSUM
+				       */
+			};
+
+			/* Extract olflags to translate to iltypes */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(47):L3_LEN(9):L2_LEN(7+z)
+			 * E(47):L3_LEN(9):L2_LEN(7+z)
+			 */
+			senddesc01_w1 = vshlq_n_u64(senddesc01_w1, 1);
+			senddesc23_w1 = vshlq_n_u64(senddesc23_w1, 1);
+
+			/* Move OLFLAGS bits 55:52 to 51:48 with zeros
+			 * prepended on the byte; the rest are don't-care.
+			 */
+			xtmp128 = vshrq_n_u8(xtmp128, 4);
+			ytmp128 = vshrq_n_u8(ytmp128, 4);
+			/*
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, 8, 8, 8, 8, 8, 8,
+				-1, 0, 8, 8, 8, 8, 8, 8,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl1q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl1q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields i.e Bit 48:55 of iltype
+			 * and place it in ol3/ol4type of senddesc_w1
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x6, 0xFF, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xE, 0xFF, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare ol4ptr, ol3ptr from ol3len, ol2len.
+			 * a [E(32):E(16):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E(32):E(16):(OL3+OL2):OL2]
+			 * => E(32):E(16)::OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u16(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u16(senddesc23_w1, 8));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		} else if (!(flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+			   (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/*
+			 * Lookup table to translate ol_flags to
+			 * ol3/ol4 types.
+			 */
+
+			const uint8x16_t tbl = {
+				/* [0-15] = ol4type:ol3type */
+				0x00, /* none */
+				0x03, /* OUTER_IP_CKSUM */
+				0x02, /* OUTER_IPV4 */
+				0x03, /* OUTER_IPV4 | OUTER_IP_CKSUM */
+				0x04, /* OUTER_IPV6 */
+				0x00, /* OUTER_IPV6 | OUTER_IP_CKSUM */
+				0x00, /* OUTER_IPV6 | OUTER_IPV4 */
+				0x00, /* OUTER_IPV6 | OUTER_IPV4 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x00, /* OUTER_UDP_CKSUM */
+				0x33, /* OUTER_UDP_CKSUM | OUTER_IP_CKSUM */
+				0x32, /* OUTER_UDP_CKSUM | OUTER_IPV4 */
+				0x33, /* OUTER_UDP_CKSUM | OUTER_IPV4 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x34, /* OUTER_UDP_CKSUM | OUTER_IPV6 */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IPV4
+				       */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IPV4 | OUTER_IP_CKSUM
+				       */
+			};
+
+			/* Extract olflags to translate to iltypes */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(47):OL3_LEN(9):OL2_LEN(7+z)
+			 * E(47):OL3_LEN(9):OL2_LEN(7+z)
+			 */
+			const uint8x16_t shuf_mask5 = {
+				0x6, 0x5, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+				0xE, 0xD, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+			};
+			senddesc01_w1 = vqtbl1q_u8(senddesc01_w1, shuf_mask5);
+			senddesc23_w1 = vqtbl1q_u8(senddesc23_w1, shuf_mask5);
+
+			/* Extract outer ol flags only */
+			const uint64x2_t o_cksum_mask = {
+				0x1C00020000000000,
+				0x1C00020000000000,
+			};
+
+			xtmp128 = vandq_u64(xtmp128, o_cksum_mask);
+			ytmp128 = vandq_u64(ytmp128, o_cksum_mask);
+
+			/* Extract OUTER_UDP_CKSUM bit 41 and
+			 * move it to bit 61
+			 */
+
+			xtmp128 = xtmp128 | vshlq_n_u64(xtmp128, 20);
+			ytmp128 = ytmp128 | vshlq_n_u64(ytmp128, 20);
+
+			/* Shift oltype by 2 to start nibble from BIT(56)
+			 * instead of BIT(58)
+			 */
+			xtmp128 = vshrq_n_u8(xtmp128, 2);
+			ytmp128 = vshrq_n_u8(ytmp128, 2);
+			/*
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, 8, 8, 8, 8, 8, 8,
+				-1, 0, 8, 8, 8, 8, 8, 8,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl1q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl1q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields i.e Bit 56:63 of oltype
+			 * and place it in ol3/ol4type of senddesc_w1
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x7, 0xFF, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xF, 0xFF, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare ol4ptr, ol3ptr from ol3len, ol2len.
+			 * a [E(32):E(16):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E(32):E(16):(OL3+OL2):OL2]
+			 * => E(32):E(16)::OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u16(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u16(senddesc23_w1, 8));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		} else if ((flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+			   (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/* Lookup table to translate ol_flags to
+			 * ol4type, ol3type, il4type, il3type of senddesc_w1
+			 */
+			const uint8x16x2_t tbl = {{
+				{
+					/* [0-15] = il4type:il3type */
+					0x04, /* none (IPv6) */
+					0x14, /* RTE_MBUF_F_TX_TCP_CKSUM (IPv6) */
+					0x24, /* RTE_MBUF_F_TX_SCTP_CKSUM (IPv6) */
+					0x34, /* RTE_MBUF_F_TX_UDP_CKSUM (IPv6) */
+					0x03, /* RTE_MBUF_F_TX_IP_CKSUM */
+					0x13, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x23, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x33, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+					0x02, /* RTE_MBUF_F_TX_IPV4 */
+					0x12, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x22, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x32, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+					0x03, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_IP_CKSUM
+					       */
+					0x13, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x23, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x33, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+				},
+
+				{
+					/* [16-31] = ol4type:ol3type */
+					0x00, /* none */
+					0x03, /* OUTER_IP_CKSUM */
+					0x02, /* OUTER_IPV4 */
+					0x03, /* OUTER_IPV4 | OUTER_IP_CKSUM */
+					0x04, /* OUTER_IPV6 */
+					0x00, /* OUTER_IPV6 | OUTER_IP_CKSUM */
+					0x00, /* OUTER_IPV6 | OUTER_IPV4 */
+					0x00, /* OUTER_IPV6 | OUTER_IPV4 |
+					       * OUTER_IP_CKSUM
+					       */
+					0x00, /* OUTER_UDP_CKSUM */
+					0x33, /* OUTER_UDP_CKSUM |
+					       * OUTER_IP_CKSUM
+					       */
+					0x32, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV4
+					       */
+					0x33, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV4 | OUTER_IP_CKSUM
+					       */
+					0x34, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV6
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IP_CKSUM
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IPV4
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IPV4 | OUTER_IP_CKSUM
+					       */
+				},
+			}};
+
+			/* Extract olflags to translate to oltype & iltype */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(8):OL2_LN(7):OL3_LN(9):E(23):L3_LN(9):L2_LN(7+z)
+			 * E(8):OL2_LN(7):OL3_LN(9):E(23):L3_LN(9):L2_LN(7+z)
+			 */
+			const uint32x4_t tshft_4 = {
+				1,
+				0,
+				1,
+				0,
+			};
+			senddesc01_w1 = vshlq_u32(senddesc01_w1, tshft_4);
+			senddesc23_w1 = vshlq_u32(senddesc23_w1, tshft_4);
+
+			/*
+			 * E(32):L3_LEN(8):L2_LEN(7+Z):OL3_LEN(8):OL2_LEN(7+Z)
+			 * E(32):L3_LEN(8):L2_LEN(7+Z):OL3_LEN(8):OL2_LEN(7+Z)
+			 */
+			const uint8x16_t shuf_mask5 = {
+				0x6, 0x5, 0x0, 0x1, 0xFF, 0xFF, 0xFF, 0xFF,
+				0xE, 0xD, 0x8, 0x9, 0xFF, 0xFF, 0xFF, 0xFF,
+			};
+			senddesc01_w1 = vqtbl1q_u8(senddesc01_w1, shuf_mask5);
+			senddesc23_w1 = vqtbl1q_u8(senddesc23_w1, shuf_mask5);
+
+			/* Extract outer and inner header ol_flags */
+			const uint64x2_t oi_cksum_mask = {
+				0x1CF0020000000000,
+				0x1CF0020000000000,
+			};
+
+			xtmp128 = vandq_u64(xtmp128, oi_cksum_mask);
+			ytmp128 = vandq_u64(ytmp128, oi_cksum_mask);
+
+			/* Extract OUTER_UDP_CKSUM bit 41 and
+			 * move it to bit 61
+			 */
+
+			xtmp128 = xtmp128 | vshlq_n_u64(xtmp128, 20);
+			ytmp128 = ytmp128 | vshlq_n_u64(ytmp128, 20);
+
+			/* Shift right oltype by 2 and iltype by 4
+			 * to start oltype nibble from BIT(58)
+			 * instead of BIT(56) and iltype nibble from BIT(48)
+			 * instead of BIT(52).
+			 */
+			const int8x16_t tshft5 = {
+				8, 8, 8, 8, 8, 8, -4, -2,
+				8, 8, 8, 8, 8, 8, -4, -2,
+			};
+
+			xtmp128 = vshlq_u8(xtmp128, tshft5);
+			ytmp128 = vshlq_u8(ytmp128, tshft5);
+			/*
+			 * E(32):L3_LEN(8):L2_LEN(8):OL3_LEN(8):OL2_LEN(8)
+			 * E(32):L3_LEN(8):L2_LEN(8):OL3_LEN(8):OL2_LEN(8)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, -1, 0, 0, 0, 0, 0,
+				-1, 0, -1, 0, 0, 0, 0, 0,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Mark Bit(4) of oltype */
+			const uint64x2_t oi_cksum_mask2 = {
+				0x1000000000000000,
+				0x1000000000000000,
+			};
+
+			xtmp128 = vorrq_u64(xtmp128, oi_cksum_mask2);
+			ytmp128 = vorrq_u64(ytmp128, oi_cksum_mask2);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl2q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl2q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields i.e Bit 48:55 of iltype and
+			 * Bit 56:63 of oltype and place it in corresponding
+			 * place in senddesc_w1.
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x7, 0x6, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xF, 0xE, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare l4ptr, l3ptr, ol4ptr, ol3ptr from
+			 * l3len, l2len, ol3len, ol2len.
+			 * a [E(32):L3(8):L2(8):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E:(L3+L2):(L2+OL3):(OL3+OL2):OL2]
+			 * a = a + (a << 16)
+			 * a [E:(L3+L2+OL3+OL2):(L2+OL3+OL2):(OL3+OL2):OL2]
+			 * => E(32):IL4PTR(8):IL3PTR(8):OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u32(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u32(senddesc23_w1, 8));
+
+			/* Continue preparing l4ptr, l3ptr, ol4ptr, ol3ptr */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u32(senddesc01_w1, 16));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u32(senddesc23_w1, 16));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		}
+
+		xmask01 = vdupq_n_u64(0);
+		xmask23 = xmask01;
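+		/* Load the low 16 bits of each mbuf's pool_id (the aura)
+		 * into per-mbuf lanes, then shift them into the SEND_HDR_S
+		 * W0 aura field at bit 20.
+		 */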
+		asm volatile("LD1 {%[a].H}[0],[%[in]]\n\t"
+			     : [a] "+w"(xmask01)
+			     : [in] "r"(mbuf0)
+			     : "memory");
+
+		asm volatile("LD1 {%[a].H}[4],[%[in]]\n\t"
+			     : [a] "+w"(xmask01)
+			     : [in] "r"(mbuf1)
+			     : "memory");
+
+		asm volatile("LD1 {%[b].H}[0],[%[in]]\n\t"
+			     : [b] "+w"(xmask23)
+			     : [in] "r"(mbuf2)
+			     : "memory");
+
+		asm volatile("LD1 {%[b].H}[4],[%[in]]\n\t"
+			     : [b] "+w"(xmask23)
+			     : [in] "r"(mbuf3)
+			     : "memory");
+		xmask01 = vshlq_n_u64(xmask01, 20);
+		xmask23 = vshlq_n_u64(xmask23, 20);
+
+		senddesc01_w0 = vorrq_u64(senddesc01_w0, xmask01);
+		senddesc23_w0 = vorrq_u64(senddesc23_w0, xmask23);
+
+		if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+			/* Tx ol_flag for vlan. */
+			const uint64x2_t olv = {RTE_MBUF_F_TX_VLAN, RTE_MBUF_F_TX_VLAN};
+			/* Bit enable for VLAN1 */
+			const uint64x2_t mlv = {BIT_ULL(49), BIT_ULL(49)};
+			/* Tx ol_flag for QnQ. */
+			const uint64x2_t olq = {RTE_MBUF_F_TX_QINQ, RTE_MBUF_F_TX_QINQ};
+			/* Bit enable for VLAN0 */
+			const uint64x2_t mlq = {BIT_ULL(48), BIT_ULL(48)};
+			/* Load vlan values from packet. outer is VLAN 0 */
+			uint64x2_t ext01 = {
+				((uint32_t)tx_pkts[0]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[0]->vlan_tci) << 32,
+				((uint32_t)tx_pkts[1]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[1]->vlan_tci) << 32,
+			};
+			uint64x2_t ext23 = {
+				((uint32_t)tx_pkts[2]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[2]->vlan_tci) << 32,
+				((uint32_t)tx_pkts[3]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[3]->vlan_tci) << 32,
+			};
+
+			/* Get ol_flags of the packets. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* ORR vlan outer/inner values into cmd. */
+			sendext01_w1 = vorrq_u64(sendext01_w1, ext01);
+			sendext23_w1 = vorrq_u64(sendext23_w1, ext23);
+
+			/* Test for offload enable bits and generate masks. */
+			xtmp128 = vorrq_u64(vandq_u64(vtstq_u64(xtmp128, olv), mlv),
+					    vandq_u64(vtstq_u64(xtmp128, olq), mlq));
+			ytmp128 = vorrq_u64(vandq_u64(vtstq_u64(ytmp128, olv), mlv),
+					    vandq_u64(vtstq_u64(ytmp128, olq), mlq));
+
+			/* Set vlan enable bits into cmd based on mask. */
+			sendext01_w1 = vorrq_u64(sendext01_w1, xtmp128);
+			sendext23_w1 = vorrq_u64(sendext23_w1, ytmp128);
+		}
+
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Tx ol_flag for timestamp. */
+			const uint64x2_t olf = {RTE_MBUF_F_TX_IEEE1588_TMST,
+						RTE_MBUF_F_TX_IEEE1588_TMST};
+			/* Set send mem alg to SUB. */
+			const uint64x2_t alg = {BIT_ULL(59), BIT_ULL(59)};
+			/* Increment send mem address by 8. */
+			const uint64x2_t addr = {0x8, 0x8};
+
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* Check if timestamp is requested and generate an
+			 * inverted mask, since the default cmd value needs
+			 * no changes when it is requested.
+			 */
+			xtmp128 = vmvnq_u32(vtstq_u64(olf, xtmp128));
+			ytmp128 = vmvnq_u32(vtstq_u64(olf, ytmp128));
+
+			/* Change send mem address to an 8 byte offset when
+			 * TSTMP is disabled.
+			 */
+			sendmem01_w1 = vaddq_u64(sendmem01_w1, vandq_u64(xtmp128, addr));
+			sendmem23_w1 = vaddq_u64(sendmem23_w1, vandq_u64(ytmp128, addr));
+			/* Change send mem alg to SUB when TSTMP is disabled. */
+			sendmem01_w0 = vorrq_u64(sendmem01_w0, vandq_u64(xtmp128, alg));
+			sendmem23_w0 = vorrq_u64(sendmem23_w0, vandq_u64(ytmp128, alg));
+
+			cmd3[0] = vzip1q_u64(sendmem01_w0, sendmem01_w1);
+			cmd3[1] = vzip2q_u64(sendmem01_w0, sendmem01_w1);
+			cmd3[2] = vzip1q_u64(sendmem23_w0, sendmem23_w1);
+			cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1);
+		}
+
+		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+			/* Set don't free bit if reference count > 1 */
+			cn20k_nix_prefree_seg_vec(tx_pkts, &extm, txq, &senddesc01_w0,
+						  &senddesc23_w0, &senddesc01_w1, &senddesc23_w1);
+		} else if (!(flags & NIX_TX_MULTI_SEG_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+			/* Move mbufs to iova */
+			mbuf0 = (uint64_t *)tx_pkts[0];
+			mbuf1 = (uint64_t *)tx_pkts[1];
+			mbuf2 = (uint64_t *)tx_pkts[2];
+			mbuf3 = (uint64_t *)tx_pkts[3];
+
+			/* Mark mempool object as "put" since
+			 * it is freed by NIX
+			 */
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf0)->pool, (void **)&mbuf0,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf1)->pool, (void **)&mbuf1,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf2)->pool, (void **)&mbuf2,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf3)->pool, (void **)&mbuf3,
+						  1, 0);
+		}
+
+		/* Create 4W cmd for 4 mbufs (sendhdr, sgdesc) */
+		cmd0[0] = vzip1q_u64(senddesc01_w0, senddesc01_w1);
+		cmd0[1] = vzip2q_u64(senddesc01_w0, senddesc01_w1);
+		cmd0[2] = vzip1q_u64(senddesc23_w0, senddesc23_w1);
+		cmd0[3] = vzip2q_u64(senddesc23_w0, senddesc23_w1);
+
+		cmd1[0] = vzip1q_u64(sgdesc01_w0, sgdesc01_w1);
+		cmd1[1] = vzip2q_u64(sgdesc01_w0, sgdesc01_w1);
+		cmd1[2] = vzip1q_u64(sgdesc23_w0, sgdesc23_w1);
+		cmd1[3] = vzip2q_u64(sgdesc23_w0, sgdesc23_w1);
+
+		if (flags & NIX_TX_NEED_EXT_HDR) {
+			cmd2[0] = vzip1q_u64(sendext01_w0, sendext01_w1);
+			cmd2[1] = vzip2q_u64(sendext01_w0, sendext01_w1);
+			cmd2[2] = vzip1q_u64(sendext23_w0, sendext23_w1);
+			cmd2[3] = vzip2q_u64(sendext23_w0, sendext23_w1);
+		}
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+			const uint64x2_t olf = {RTE_MBUF_F_TX_SEC_OFFLOAD,
+						RTE_MBUF_F_TX_SEC_OFFLOAD};
+			uintptr_t next;
+			uint8_t dw;
+
+			/* Extract ol_flags. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			xtmp128 = vtstq_u64(olf, xtmp128);
+			ytmp128 = vtstq_u64(olf, ytmp128);
+
+			/* Process mbuf0 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[0]);
+			if (vgetq_lane_u64(xtmp128, 0))
+				cn20k_nix_prep_sec_vec(tx_pkts[0], &cmd0[0], &cmd1[0], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf0 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[0], &extm, segdw[0], next, cmd0[0],
+					     cmd1[0], cmd2[0], cmd3[0], flags);
+
+			/* Process mbuf1 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[1]);
+			if (vgetq_lane_u64(xtmp128, 1))
+				cn20k_nix_prep_sec_vec(tx_pkts[1], &cmd0[1], &cmd1[1], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf1 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[1], &extm, segdw[1], next, cmd0[1],
+					     cmd1[1], cmd2[1], cmd3[1], flags);
+
+			/* Process mbuf2 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[2]);
+			if (vgetq_lane_u64(ytmp128, 0))
+				cn20k_nix_prep_sec_vec(tx_pkts[2], &cmd0[2], &cmd1[2], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf2 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[2], &extm, segdw[2], next, cmd0[2],
+					     cmd1[2], cmd2[2], cmd3[2], flags);
+
+			/* Process mbuf3 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[3]);
+			if (vgetq_lane_u64(ytmp128, 1))
+				cn20k_nix_prep_sec_vec(tx_pkts[3], &cmd0[3], &cmd1[3], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf3 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[3], &extm, segdw[3], next, cmd0[3],
+					     cmd1[3], cmd2[3], cmd3[3], flags);
+
+		} else if (flags & NIX_TX_NEED_EXT_HDR) {
+			/* Store the prepared send desc to LMT lines */
+			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd3[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd2[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd1[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd3[1]);
+				lnum += 1;
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd3[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd2[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd1[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd3[3]);
+			} else {
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[1]);
+				lnum += 1;
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[3]);
+			}
+			lnum += 1;
+		} else {
+			/* Store the prepared send desc to LMT lines */
+			vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd1[0]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd0[1]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd1[1]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[2]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[2]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd0[3]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd1[3]);
+			lnum += 1;
+		}
+
+		tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP;
+	}
+
+	/* Round lnum up to include the last line if it is partial */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		lnum = lnum + !!loff;
+		wd.data128 = wd.data128 | (((__uint128_t)(((loff >> 4) - 1) & 0x7) << shift));
+	}
+
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		wd.data[0] >>= 16;
+
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = (c_lnum << 1) + c_loff;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	/* Trigger LMTST */
+	if (lnum > 16) {
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[0] & 0x7) << 4;
+		wd.data[0] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[0] <<= 16;
+
+		wd.data[0] |= (15ULL << 12);
+		wd.data[0] |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, cn20k_nix_pkts_per_vec_brst(flags) >> 1);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(wd.data[0], pa);
+
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[1] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[1] & 0x7) << 4;
+		wd.data[1] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[1] <<= 16;
+
+		wd.data[1] |= ((uint64_t)(lnum - 17)) << 12;
+		wd.data[1] |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F) {
+			cn20k_nix_vwqe_wait_fc(txq,
+					       burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+		}
+		/* STEOR1 */
+		roc_lmt_submit_steorl(wd.data[1], pa);
+	} else if (lnum) {
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[0] & 0x7) << 4;
+		wd.data[0] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[0] <<= 16;
+
+		wd.data[0] |= ((uint64_t)(lnum - 1)) << 12;
+		wd.data[0] |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(wd.data[0], pa);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	if (unlikely(scalar))
+		pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	return pkts;
+}
+
+#else
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			   uint64_t *cmd, const uint16_t flags)
+{
+	RTE_SET_USED(ws);
+	RTE_SET_USED(tx_queue);
+	RTE_SET_USED(tx_pkts);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(cmd);
+	RTE_SET_USED(flags);
+	return 0;
+}
+#endif
+
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -1567,10 +3003,11 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[sz];                                                                  \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_vector(tx_queue, NULL, tx_pkts, pkts, cmd, (flags));    \
 	}
 
 #define NIX_TX_XMIT_VEC_MSEG(fn, sz, flags)                                                        \
-- 
2.34.1
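
For readers tracing the LMTST submission above, here is a minimal sketch of
how the final data word handed to roc_lmt_submit_steorl() appears to be
assembled. Bit positions are inferred from the code in this patch; the helper
name is illustrative and not part of the driver:

  #include <stdint.h>

  /* Sketch of the tail of cn20k_nix_xmit_pkts_vector(): 'data' may already
   * carry per-line dword counts packed 3 bits apiece.
   */
  static inline uint64_t
  steor_word(uint64_t data, uint64_t io_addr, uint16_t lmt_id, uint8_t lnum,
             uint64_t *pa_out)
  {
          /* The low 3 bits of the data word select bits [6:4] of the I/O
           * address used for the store.
           */
          *pa_out = io_addr | (data & 0x7) << 4;
          data &= ~0x7ULL;
          /* Number of LMT lines minus one, starting at bit 12; a single
           * STEOR covers at most 16 lines, which is why lnum > 16 in the
           * code above is split into STEOR0 and STEOR1.
           */
          data |= ((uint64_t)(lnum - 1)) << 12;
          /* The starting LMT line id occupies the low bits. */
          data |= (uint64_t)lmt_id;
          return data;
  }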


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v2 18/18] net/cnxk: support Tx multi-seg in vector for cn20k
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (16 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 17/18] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
@ 2024-09-26 16:01   ` Nithin Dabilpuram
  2024-10-01 11:01   ` [PATCH v2 00/18] add Marvell cn20k SOC support for mempool and net Jerin Jacob
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-09-26 16:01 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Tx multi-seg support to the vector Tx routines for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 485 ++++++++++++++++++++++++++++++++++--
 1 file changed, 463 insertions(+), 22 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 05c8b80fcb..9b6a2e62bd 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -1717,8 +1717,301 @@ cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts
 
 #if defined(RTE_ARCH_ARM64)
 
+static __rte_always_inline void
+cn20k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, union nix_send_ext_w0_u *w0,
+		      uint64_t ol_flags, const uint64_t flags, const uint64_t lso_tun_fmt)
+{
+	uint16_t lso_sb;
+	uint64_t mask;
+
+	if (!(ol_flags & RTE_MBUF_F_TX_TCP_SEG))
+		return;
+
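+	/* Branchless select of the L4 header start: when there is no inner
+	 * L3 (il3type == 0), mask is all ones and the outer L4 pointer is
+	 * used; otherwise the inner L4 pointer is used.
+	 */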
+	mask = -(!w1->il3type);
+	lso_sb = (mask & w1->ol4ptr) + (~mask & w1->il4ptr) + m->l4_len;
+
+	w0->u |= BIT(14);
+	w0->lso_sb = lso_sb;
+	w0->lso_mps = m->tso_segsz;
+	w0->lso_format = NIX_LSO_FORMAT_IDX_TSOV4 + !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+	w1->ol4type = NIX_SENDL4TYPE_TCP_CKSUM;
+
+	/* Handle tunnel tso */
+	if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) && (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+		const uint8_t is_udp_tun = (CNXK_NIX_UDP_TUN_BITMASK >>
+					    ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+					   0x1;
+		uint8_t shift = is_udp_tun ? 32 : 0;
+
+		shift += (!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6) << 4);
+		shift += (!!(ol_flags & RTE_MBUF_F_TX_IPV6) << 3);
+
+		w1->il4type = NIX_SENDL4TYPE_TCP_CKSUM;
+		w1->ol4type = is_udp_tun ? NIX_SENDL4TYPE_UDP_CKSUM : 0;
+		/* Update format for UDP tunneled packet */
+		w0->lso_format = (lso_tun_fmt >> shift);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_prepare_mseg_vec_noff(struct cn20k_eth_txq *txq, struct rte_mbuf *m,
+				struct rte_mbuf **extm, uint64_t *cmd, uint64x2_t *cmd0,
+				uint64x2_t *cmd1, uint64x2_t *cmd2, uint64x2_t *cmd3,
+				const uint32_t flags)
+{
+	uint16_t segdw;
+
+	vst1q_u64(cmd, *cmd0); /* Send hdr */
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		vst1q_u64(cmd + 2, *cmd2); /* ext hdr */
+		vst1q_u64(cmd + 4, *cmd1); /* sg */
+	} else {
+		vst1q_u64(cmd + 2, *cmd1); /* sg */
+	}
+
+	segdw = cn20k_nix_prepare_mseg(txq, m, extm, cmd, flags);
+
+	if (flags & NIX_TX_OFFLOAD_TSTAMP_F)
+		vst1q_u64(cmd + segdw * 2 - 2, *cmd3);
+
+	return segdw;
+}
+
+static __rte_always_inline void
+cn20k_nix_prepare_mseg_vec_list(struct rte_mbuf *m, uint64_t *cmd, union nix_send_hdr_w0_u *sh,
+				union nix_send_sg_s *sg, const uint32_t flags)
+{
+	struct rte_mbuf *m_next;
+	uint64_t ol_flags, len;
+	uint64_t *slist, sg_u;
+	uint16_t nb_segs;
+	uint64_t dlen;
+	int i = 1;
+
+	len = m->pkt_len;
+	ol_flags = m->ol_flags;
+	/* For security we would have already populated the right length */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
+		len = sh->total;
+	sh->total = len;
+	/* Clear sg->u header before use */
+	sg->u &= 0xFC00000000000000;
+	sg_u = sg->u;
+	slist = &cmd[0];
+
+	dlen = m->data_len;
+	len -= dlen;
+	sg_u = sg_u | ((uint64_t)dlen);
+
+	/* Mark mempool object as "put" since it is freed by NIX */
+	RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+
+	nb_segs = m->nb_segs - 1;
+	m_next = m->next;
+	m->next = NULL;
+	m->nb_segs = 1;
+	m = m_next;
+	/* Fill mbuf segments */
+	do {
+		m_next = m->next;
+		dlen = m->data_len;
+		len -= dlen;
+		sg_u = sg_u | ((uint64_t)dlen << (i << 4));
+		*slist = rte_mbuf_data_iova(m);
+		slist++;
+		i++;
+		nb_segs--;
+		if (i > 2 && nb_segs) {
+			i = 0;
+			/* Next SG subdesc */
+			*(uint64_t *)slist = sg_u & 0xFC00000000000000;
+			sg->u = sg_u;
+			sg->segs = 3;
+			sg = (union nix_send_sg_s *)slist;
+			sg_u = sg->u;
+			slist++;
+		}
+		m->next = NULL;
+		/* Mark mempool object as "put" since it is freed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+
+		m = m_next;
+	} while (nb_segs);
+
+	/* Add remaining bytes of security data to last seg */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && len) {
+		uint8_t shft = ((i - 1) << 4);
+
+		dlen = ((sg_u >> shft) & 0xFFFF) + len;
+		sg_u = sg_u & ~(0xFFFFULL << shft);
+		sg_u |= dlen << shft;
+	}
+	sg->u = sg_u;
+	sg->segs = i;
+}
+
+static __rte_always_inline void
+cn20k_nix_prepare_mseg_vec(struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, uint64x2_t *cmd1,
+			   const uint8_t segdw, const uint32_t flags)
+{
+	union nix_send_hdr_w0_u sh;
+	union nix_send_sg_s sg;
+
+	if (m->nb_segs == 1) {
+		/* Mark mempool object as "put" since it is freed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+		return;
+	}
+
+	sh.u = vgetq_lane_u64(cmd0[0], 0);
+	sg.u = vgetq_lane_u64(cmd1[0], 0);
+
+	cn20k_nix_prepare_mseg_vec_list(m, cmd, &sh, &sg, flags);
+
+	sh.sizem1 = segdw - 1;
+	cmd0[0] = vsetq_lane_u64(sh.u, cmd0[0], 0);
+	cmd1[0] = vsetq_lane_u64(sg.u, cmd1[0], 0);
+}
+
 #define NIX_DESCS_PER_LOOP 4
 
+static __rte_always_inline uint8_t
+cn20k_nix_prep_lmt_mseg_vector(struct cn20k_eth_txq *txq, struct rte_mbuf **mbufs,
+			       struct rte_mbuf **extm, uint64x2_t *cmd0, uint64x2_t *cmd1,
+			       uint64x2_t *cmd2, uint64x2_t *cmd3, uint8_t *segdw,
+			       uint64_t *lmt_addr, __uint128_t *data128, uint8_t *shift,
+			       const uint16_t flags)
+{
+	uint8_t j, off, lmt_used = 0;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		off = 0;
+		for (j = 0; j < NIX_DESCS_PER_LOOP; j++) {
+			if (off + segdw[j] > 8) {
+				*data128 |= ((__uint128_t)off - 1) << *shift;
+				*shift += 3;
+				lmt_used++;
+				lmt_addr += 16;
+				off = 0;
+			}
+			off += cn20k_nix_prepare_mseg_vec_noff(txq, mbufs[j], extm,
+							       lmt_addr + off * 2, &cmd0[j],
+							       &cmd1[j], &cmd2[j], &cmd3[j], flags);
+		}
+		*data128 |= ((__uint128_t)off - 1) << *shift;
+		*shift += 3;
+		lmt_used++;
+		return lmt_used;
+	}
+
+	if (!(flags & NIX_TX_NEED_EXT_HDR) && !(flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+		/* All 4 consecutive packets are single-segment. */
+		if ((segdw[0] + segdw[1] + segdw[2] + segdw[3]) <= 8) {
+			vst1q_u64(lmt_addr, cmd0[0]);
+			vst1q_u64(lmt_addr + 2, cmd1[0]);
+			vst1q_u64(lmt_addr + 4, cmd0[1]);
+			vst1q_u64(lmt_addr + 6, cmd1[1]);
+			vst1q_u64(lmt_addr + 8, cmd0[2]);
+			vst1q_u64(lmt_addr + 10, cmd1[2]);
+			vst1q_u64(lmt_addr + 12, cmd0[3]);
+			vst1q_u64(lmt_addr + 14, cmd1[3]);
+
+			*data128 |= ((__uint128_t)7) << *shift;
+			*shift += 3;
+
+			/* Mark mempool object as "put" since it is freed by NIX */
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[0]->pool, (void **)&mbufs[0], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[1]->pool, (void **)&mbufs[1], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[2]->pool, (void **)&mbufs[2], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[3]->pool, (void **)&mbufs[3], 1, 0);
+			return 1;
+		}
+	}
+
+	for (j = 0; j < NIX_DESCS_PER_LOOP;) {
+		/* Fit consecutive packets in same LMTLINE. */
+		if ((segdw[j] + segdw[j + 1]) <= 8) {
+			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+				/* TSTAMP packets take 4 dwords each; no room for extra segs. */
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				vst1q_u64(lmt_addr + 6, cmd3[j]);
+
+				vst1q_u64(lmt_addr + 8, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 10, cmd2[j + 1]);
+				vst1q_u64(lmt_addr + 12, cmd1[j + 1]);
+				vst1q_u64(lmt_addr + 14, cmd3[j + 1]);
+
+				/* Mark mempool object as "put" since it is freed by NIX */
+				RTE_MEMPOOL_CHECK_COOKIES(mbufs[j]->pool, (void **)&mbufs[j], 1, 0);
+				RTE_MEMPOOL_CHECK_COOKIES(mbufs[j + 1]->pool,
+							  (void **)&mbufs[j + 1], 1, 0);
+			} else if (flags & NIX_TX_NEED_EXT_HDR) {
+				/* EXT header takes 3 dwords each, with space for 2 segs. */
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				off = segdw[j] - 3;
+				off <<= 1;
+				cn20k_nix_prepare_mseg_vec(mbufs[j + 1], lmt_addr + 12 + off,
+							   &cmd0[j + 1], &cmd1[j + 1], segdw[j + 1],
+							   flags);
+				vst1q_u64(lmt_addr + 6 + off, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 8 + off, cmd2[j + 1]);
+				vst1q_u64(lmt_addr + 10 + off, cmd1[j + 1]);
+			} else {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 4, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd1[j]);
+				off = segdw[j] - 2;
+				off <<= 1;
+				cn20k_nix_prepare_mseg_vec(mbufs[j + 1], lmt_addr + 8 + off,
+							   &cmd0[j + 1], &cmd1[j + 1], segdw[j + 1],
+							   flags);
+				vst1q_u64(lmt_addr + 4 + off, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 6 + off, cmd1[j + 1]);
+			}
+			*data128 |= ((__uint128_t)(segdw[j] + segdw[j + 1]) - 1) << *shift;
+			*shift += 3;
+			j += 2;
+		} else {
+			if ((flags & NIX_TX_NEED_EXT_HDR) && (flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				off = segdw[j] - 4;
+				off <<= 1;
+				vst1q_u64(lmt_addr + 6 + off, cmd3[j]);
+			} else if (flags & NIX_TX_NEED_EXT_HDR) {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+			} else {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 4, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd1[j]);
+			}
+			*data128 |= ((__uint128_t)(segdw[j]) - 1) << *shift;
+			*shift += 3;
+			j++;
+		}
+		lmt_used++;
+		lmt_addr += 16;
+	}
+
+	return lmt_used;
+}
+
 static __rte_always_inline void
 cn20k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, uint8_t *shift,
 		   __uint128_t *data128, uintptr_t *next)
@@ -1740,12 +2033,36 @@ cn20k_nix_xmit_store(struct cn20k_eth_txq *txq, struct rte_mbuf *mbuf, struct rt
 		     uint8_t segdw, uintptr_t laddr, uint64x2_t cmd0, uint64x2_t cmd1,
 		     uint64x2_t cmd2, uint64x2_t cmd3, const uint16_t flags)
 {
-	RTE_SET_USED(txq);
-	RTE_SET_USED(mbuf);
-	RTE_SET_USED(extm);
-	RTE_SET_USED(segdw);
+	uint8_t off;
 
-	if (flags & NIX_TX_NEED_EXT_HDR) {
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		cn20k_nix_prepare_mseg_vec_noff(txq, mbuf, extm, LMT_OFF(laddr, 0, 0), &cmd0, &cmd1,
+						&cmd2, &cmd3, flags);
+		return;
+	}
+	if (flags & NIX_TX_MULTI_SEG_F) {
+		if ((flags & NIX_TX_NEED_EXT_HDR) && (flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 48), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+			off = segdw - 4;
+			off <<= 4;
+			vst1q_u64(LMT_OFF(laddr, 0, 48 + off), cmd3);
+		} else if (flags & NIX_TX_NEED_EXT_HDR) {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 48), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+		} else {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 32), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd1);
+		}
+	} else if (flags & NIX_TX_NEED_EXT_HDR) {
 		/* Store the prepared send desc to LMT lines */
 		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
 			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
@@ -1814,6 +2131,12 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
 	}
 
+	/* Perform header writes before barrier for TSO */
+	if (flags & NIX_TX_OFFLOAD_TSO_F) {
+		for (i = 0; i < pkts; i++)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+	}
+
 	if (!(flags & NIX_TX_VWQE_F)) {
 		senddesc01_w0 = vld1q_dup_u64(&txq->send_hdr_w0);
 	} else {
@@ -1866,7 +2189,7 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 	/* Number of packets to prepare depends on offloads enabled. */
 	burst = left > cn20k_nix_pkts_per_vec_brst(flags) ? cn20k_nix_pkts_per_vec_brst(flags) :
 							    left;
-	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+	if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)) {
 		wd.data128 = 0;
 		shift = 16;
 	}
@@ -1885,6 +2208,54 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			break;
 		}
 
+		if (flags & NIX_TX_MULTI_SEG_F) {
+			uint8_t j;
+
+			for (j = 0; j < NIX_DESCS_PER_LOOP; j++) {
+				struct rte_mbuf *m = tx_pkts[j];
+
+				cn20k_nix_tx_mbuf_validate(m, flags);
+
+				/* Get dwords based on nb_segs. */
+				if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F &&
+				      flags & NIX_TX_MULTI_SEG_F))
+					segdw[j] = NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+				else
+					segdw[j] = cn20k_nix_mbuf_sg_dwords(m);
+
+				/* Add dwords based on offloads. */
+				segdw[j] += 1 + /* SEND HDR */
+					    !!(flags & NIX_TX_NEED_EXT_HDR) +
+					    !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+			}
+
+			/* Check if there are enough LMTLINES for this loop.
+			 * Consider previous line to be partial.
+			 */
+			if (lnum + 4 >= 32) {
+				uint8_t ldwords_con = 0, lneeded = 0;
+
+				if ((loff >> 4) + segdw[0] > 8) {
+					lneeded += 1;
+					ldwords_con = segdw[0];
+				} else {
+					ldwords_con = (loff >> 4) + segdw[0];
+				}
+
+				for (j = 1; j < NIX_DESCS_PER_LOOP; j++) {
+					ldwords_con += segdw[j];
+					if (ldwords_con > 8) {
+						lneeded += 1;
+						ldwords_con = segdw[j];
+					}
+				}
+				lneeded += 1;
+				if (lnum + lneeded > 32) {
+					burst = i;
+					break;
+				}
+			}
+		}
 		/* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */
 		senddesc01_w0 = vbicq_u64(senddesc01_w0, vdupq_n_u64(0x800FFFFFFFF));
 		sgdesc01_w0 = vbicq_u64(sgdesc01_w0, vdupq_n_u64(0xFFFFFFFF));
@@ -1907,6 +2278,12 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			sendmem23_w1 = sendmem01_w1;
 		}
 
+		if (flags & NIX_TX_OFFLOAD_TSO_F) {
+			/* Clear the LSO enable bit. */
+			sendext01_w0 = vbicq_u64(sendext01_w0, vdupq_n_u64(BIT_ULL(14)));
+			sendext23_w0 = sendext01_w0;
+		}
+
 		/* Move mbufs to iova */
 		mbuf0 = (uint64_t *)tx_pkts[0];
 		mbuf1 = (uint64_t *)tx_pkts[1];
@@ -2512,7 +2889,49 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1);
 		}
 
-		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+		if (flags & NIX_TX_OFFLOAD_TSO_F) {
+			const uint64_t lso_fmt = txq->lso_tun_fmt;
+			uint64_t sx_w0[NIX_DESCS_PER_LOOP];
+			uint64_t sd_w1[NIX_DESCS_PER_LOOP];
+
+			/* Extract SD W1 as we need to set L4 types. */
+			vst1q_u64(sd_w1, senddesc01_w1);
+			vst1q_u64(sd_w1 + 2, senddesc23_w1);
+
+			/* Extract SX W0 as we need to set LSO fields. */
+			vst1q_u64(sx_w0, sendext01_w0);
+			vst1q_u64(sx_w0 + 2, sendext23_w0);
+
+			/* Extract ol_flags. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* Prepare individual mbufs. */
+			cn20k_nix_prepare_tso(tx_pkts[0], (union nix_send_hdr_w1_u *)&sd_w1[0],
+					      (union nix_send_ext_w0_u *)&sx_w0[0],
+					      vgetq_lane_u64(xtmp128, 0), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[1], (union nix_send_hdr_w1_u *)&sd_w1[1],
+					      (union nix_send_ext_w0_u *)&sx_w0[1],
+					      vgetq_lane_u64(xtmp128, 1), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[2], (union nix_send_hdr_w1_u *)&sd_w1[2],
+					      (union nix_send_ext_w0_u *)&sx_w0[2],
+					      vgetq_lane_u64(ytmp128, 0), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[3], (union nix_send_hdr_w1_u *)&sd_w1[3],
+					      (union nix_send_ext_w0_u *)&sx_w0[3],
+					      vgetq_lane_u64(ytmp128, 1), flags, lso_fmt);
+
+			senddesc01_w1 = vld1q_u64(sd_w1);
+			senddesc23_w1 = vld1q_u64(sd_w1 + 2);
+
+			sendext01_w0 = vld1q_u64(sx_w0);
+			sendext23_w0 = vld1q_u64(sx_w0 + 2);
+		}
+
+		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_MULTI_SEG_F) &&
+		    !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
 			/* Set don't free bit if reference count > 1 */
 			cn20k_nix_prefree_seg_vec(tx_pkts, &extm, txq, &senddesc01_w0,
 						  &senddesc23_w0, &senddesc01_w1, &senddesc23_w1);
@@ -2626,6 +3045,15 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			cn20k_nix_xmit_store(txq, tx_pkts[3], &extm, segdw[3], next, cmd0[3],
 					     cmd1[3], cmd2[3], cmd3[3], flags);
 
+		} else if (flags & NIX_TX_MULTI_SEG_F) {
+			uint8_t j;
+
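+			/* segdw[4] is a sentinel: segdw[3] + 8 can never fit
+			 * in one LMT line, so the pairing loop in
+			 * cn20k_nix_prep_lmt_mseg_vector() always takes the
+			 * single-packet path for the last mbuf.
+			 */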
+			segdw[4] = 8;
+			j = cn20k_nix_prep_lmt_mseg_vector(txq, tx_pkts, &extm, cmd0, cmd1, cmd2,
+							   cmd3, segdw,
+							   (uint64_t *)LMT_OFF(laddr, lnum, 0),
+							   &wd.data128, &shift, flags);
+			lnum += j;
 		} else if (flags & NIX_TX_NEED_EXT_HDR) {
 			/* Store the prepared send desc to LMT lines */
 			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
@@ -2684,7 +3112,7 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		wd.data128 = wd.data128 | (((__uint128_t)(((loff >> 4) - 1) & 0x7) << shift));
 	}
 
-	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+	if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 		wd.data[0] >>= 16;
 
 	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
@@ -2704,13 +3132,13 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 
 	/* Trigger LMTST */
 	if (lnum > 16) {
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[0] & 0x7) << 4;
 		wd.data[0] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[0] <<= 16;
 
 		wd.data[0] |= (15ULL << 12);
@@ -2721,32 +3149,38 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		/* STEOR0 */
 		roc_lmt_submit_steorl(wd.data[0], pa);
 
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[1] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[1] & 0x7) << 4;
 		wd.data[1] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[1] <<= 16;
 
 		wd.data[1] |= ((uint64_t)(lnum - 17)) << 12;
 		wd.data[1] |= (uint64_t)(lmt_id + 16);
 
 		if (flags & NIX_TX_VWQE_F) {
-			cn20k_nix_vwqe_wait_fc(txq,
-					       burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			if (flags & NIX_TX_MULTI_SEG_F) {
+				if (burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1) > 0)
+					cn20k_nix_vwqe_wait_fc(txq,
+						burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			} else {
+				cn20k_nix_vwqe_wait_fc(txq,
+						burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			}
 		}
 		/* STEOR1 */
 		roc_lmt_submit_steorl(wd.data[1], pa);
 	} else if (lnum) {
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[0] & 0x7) << 4;
 		wd.data[0] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[0] <<= 16;
 
 		wd.data[0] |= ((uint64_t)(lnum - 1)) << 12;
@@ -2767,8 +3201,13 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 	if (left)
 		goto again;
 
-	if (unlikely(scalar))
-		pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	if (unlikely(scalar)) {
+		if (flags & NIX_TX_MULTI_SEG_F)
+			pkts += cn20k_nix_xmit_pkts_mseg(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+		else
+			pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	}
+
 	return pkts;
 }
 
@@ -3014,10 +3453,12 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[(sz) + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];                               \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_vector(tx_queue, NULL, tx_pkts, pkts, cmd,              \
+						  (flags) | NIX_TX_MULTI_SEG_F);                   \
 	}
 
 uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue,
-- 
2.34.1
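
For reference, the scatter-gather packing in cn20k_nix_prepare_mseg_vec_list()
above stores up to three 16-bit segment lengths per 64-bit SEND_SG word and
starts a new sub-descriptor once three are consumed. A minimal standalone
sketch of the length packing, with an illustrative helper name:

  #include <stdint.h>

  /* Pack a segment byte count into SG word slot 0..2; mirrors
   * sg_u |= (uint64_t)dlen << (i << 4) from the patch above.
   */
  static inline uint64_t
  sg_pack_len(uint64_t sg_u, unsigned int slot, uint16_t dlen)
  {
          return sg_u | ((uint64_t)dlen << (slot << 4));
  }

In the real code, the upper bits of the SG word (preserved by the
0xFC00000000000000 mask) hold the sub-descriptor header, including the
segment count written via sg->segs.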


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v2 00/18] add Marvell cn20k SOC support for mempool and net
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
                     ` (17 preceding siblings ...)
  2024-09-26 16:01   ` [PATCH v2 18/18] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
@ 2024-10-01 11:01   ` Jerin Jacob
  18 siblings, 0 replies; 75+ messages in thread
From: Jerin Jacob @ 2024-10-01 11:01 UTC (permalink / raw)
  To: Nithin Dabilpuram; +Cc: jerinj, dev

On Fri, Sep 27, 2024 at 9:19 AM Nithin Dabilpuram
<ndabilpuram@marvell.com> wrote:
>
> This series adds support for Marvell cn20k SOC for mempool and
> net PMD's.
>
> This series also adds few net/cnxk PMD updates to expose IPsec
> features supported by HW that are very custom in nature and
> some enhancements for cn10k.
>
>
> Ashwin Sekhar T K (4):
>   mempool/cnxk: add cn20k PCI device ids
>   common/cnxk: accommodate change in aura field width
>   common/cnxk: use new NPA aq enq mbox for cn20k
>   mempool/cnxk: initialize mempool ops for cn20k
>
> Nithin Dabilpuram (9):
>   net/cnxk: add cn20k base control path support
>   net/cnxk: support Rx function select for cn20k
>   net/cnxk: support Tx function select for cn20k
>   net/cnxk: support Rx burst scalar for cn20k
>   net/cnxk: support Rx burst vector for cn20k
>   net/cnxk: support Tx burst scalar for cn20k
>   net/cnxk: support Tx multi-seg in cn20k
>   net/cnxk: support Tx burst vector for cn20k
>   net/cnxk: support Tx multi-seg in vector for cn20k
>
> Satha Rao (5):
>   common/cnxk: add cn20k NIX register definitions
>   common/cnxk: support NIX queue config for cn20k
>   common/cnxk: support bandwidth profile for cn20k
>   common/cnxk: support NIX debug for cn20k
>   common/cnxk: add RSS support for cn20k
>
> v2:
> - Moved out 15 patches to a separate series
> - Updated release notes

Good to merge next version.

1) Please consider changing the release note update as follows:
+* **Updated Marvell cnxk mempool driver.**
+
+  * Added mempool driver support for CN20K SoC.
+
+* **Updated Marvell cnxk net driver.**
+
+  * Added ethdev driver support for CN20K SoC.
+

2) Please fix the following genuine checkpatch issues:



CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#109: FILE: drivers/net/cnxk/cn20k_ethdev.c:75:
+       txq->tx_compl.ptr = (struct rte_mbuf **)plt_zmalloc(

WARNING:CONSTANT_COMPARISON: Comparisons should place the constant on the right side of the test
#183: FILE: drivers/net/cnxk/cn20k_ethdev.c:149:
+               PLT_STATIC_ASSERT(ROC_NIX_INL_SA_BASE_ALIGN == BIT_ULL(16));




WARNING:MACRO_ARG_UNUSED: Argument 'flags' is not used in function-like macro
#341: FILE: drivers/net/cnxk/cn20k_rx.h:219:
+#define NIX_RX_RECV(fn, flags)                                                                     \
+       uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
+                                            uint16_t pkts)                                        \
+       {                                                                                          \
+               RTE_SET_USED(rx_queue);                                                            \
+               RTE_SET_USED(rx_pkts);                                                             \
+               RTE_SET_USED(pkts);                                                                \
+               return 0;                                                                          \
+       }

WARNING:MACRO_WITH_FLOW_CONTROL: Macros with flow control statements should be avoided
#341: FILE: drivers/net/cnxk/cn20k_rx.h:219:
+#define NIX_RX_RECV(fn, flags)                                                                     \
+       uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
+                                            uint16_t pkts)                                        \
+       {                                                                                          \
+               RTE_SET_USED(rx_queue);                                                            \
+               RTE_SET_USED(rx_pkts);                                                             \
+               RTE_SET_USED(pkts);                                                                \
+               return 0;                                                                          \
+       }

CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'flags' may be better as '(flags)' to avoid precedence issues
#351: FILE: drivers/net/cnxk/cn20k_rx.h:229:
+#define NIX_RX_RECV_MSEG(fn, flags) NIX_RX_RECV(fn, flags | NIX_RX_MULTI_SEG_F)




CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#1470: FILE: drivers/net/cnxk/rx/cn20k/rx_all_offload.c:27:
+       return cn20k_nix_recv_pkts_vector(

CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#1482: FILE: drivers/net/cnxk/rx/cn20k/rx_all_offload.c:39:
+       return cn20k_nix_recv_pkts(

CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#1492: FILE: drivers/net/cnxk/rx/cn20k/rx_all_offload.c:49:
+       return cn20k_nix_recv_pkts_vector(

WARNING:MACRO_ARG_UNUSED: Argument 'sz' is not used in function-like macro
#343: FILE: drivers/net/cnxk/cn20k_tx.h:211:
+#define T(name, sz, flags)                                                                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_##name(                              \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_mseg_##name(                         \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_##name(                          \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_mseg_##name(                     \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);

WARNING:MACRO_ARG_UNUSED: Argument 'flags' is not used in function-like macro
#343: FILE: drivers/net/cnxk/cn20k_tx.h:211:
+#define T(name, sz, flags)                                                                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_##name(                              \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_mseg_##name(                         \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_##name(                          \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_mseg_##name(                     \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);

WARNING:TRAILING_SEMICOLON: macros should not use a trailing semicolon
#343: FILE: drivers/net/cnxk/cn20k_tx.h:211:
+#define T(name, sz, flags)                                                                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_##name(                              \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_mseg_##name(                         \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_##name(                          \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+       uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_mseg_##name(                     \
+               void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);

WARNING:MACRO_ARG_UNUSED: Argument 'sz' is not used in function-like macro
#356: FILE: drivers/net/cnxk/cn20k_tx.h:224:
+#define NIX_TX_XMIT(fn, sz, flags)                                                                 \
+       uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+                                            uint16_t pkts)                                        \
+       {                                                                                          \
+               RTE_SET_USED(tx_queue);                                                            \
+               RTE_SET_USED(tx_pkts);                                                             \
+               RTE_SET_USED(pkts);                                                                \
+               return 0;                                                                          \
+       }

WARNING:MACRO_ARG_UNUSED: Argument 'flags' is not used in function-like macro
#356: FILE: drivers/net/cnxk/cn20k_tx.h:224:
+#define NIX_TX_XMIT(fn, sz, flags)                                                                 \
+       uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+                                            uint16_t pkts)                                        \
+       {                                                                                          \
+               RTE_SET_USED(tx_queue);                                                            \
+               RTE_SET_USED(tx_pkts);                                                             \
+               RTE_SET_USED(pkts);                                                                \
+               return 0;                                                                          \
+       }




CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#1382: FILE: drivers/net/cnxk/tx/cn20k/tx_all_offload.c:18:
+       return cn20k_nix_xmit_pkts_mseg(

CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#1395: FILE: drivers/net/cnxk/tx/cn20k/tx_all_offload.c:31:
+       return cn20k_nix_xmit_pkts_vector(




CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#145: FILE: drivers/net/cnxk/cn20k_rx.h:547:
+                                       rte_prefetch0(

CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#147: FILE: drivers/net/cnxk/cn20k_rx.h:549:
+                                       rte_prefetch0(

CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#149: FILE: drivers/net/cnxk/cn20k_rx.h:551:
+                                       rte_prefetch0(

CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#151: FILE: drivers/net/cnxk/cn20k_rx.h:553:
+                                       rte_prefetch0(

total: 0 errors, 0 warnings, 6 checks, 475 lines checked

### [PATCH] net/cnxk: support Tx burst scalar for cn20k




CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#234: FILE: drivers/net/cnxk/cn20k_tx.h:543:
+       cmd23 = vsetq_lane_u64(
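
For the MACRO_ARG_PRECEDENCE and OPEN_ENDED_LINE classes above, the fixes are
mechanical; a sketch of the expected shape (argument names are illustrative
and not the exact v3 diff):

  /* Parenthesize the macro argument to avoid precedence surprises. */
  #define NIX_RX_RECV_MSEG(fn, flags) NIX_RX_RECV(fn, (flags) | NIX_RX_MULTI_SEG_F)

  /* Keep the first argument on the same line as the call instead of
   * ending the line with '('.
   */
  txq->tx_compl.ptr = (struct rte_mbuf **)plt_zmalloc(compl_sz, ROC_ALIGN);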

^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 00/18] add Marvell cn20k SOC support for mempool and net
  2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
                   ` (34 preceding siblings ...)
  2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
@ 2024-10-01 12:40 ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 01/18] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
                     ` (18 more replies)
  35 siblings, 19 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj; +Cc: dev, Nithin Dabilpuram

This series adds support for the Marvell cn20k SOC to the mempool and
net PMDs.

This series also adds a few net/cnxk PMD updates to expose IPsec
features supported by HW that are very custom in nature, and
some enhancements for cn10k.


Ashwin Sekhar T K (4):
  mempool/cnxk: add cn20k PCI device ids
  common/cnxk: accommodate change in aura field width
  common/cnxk: use new NPA aq enq mbox for cn20k
  mempool/cnxk: initialize mempool ops for cn20k

Nithin Dabilpuram (9):
  net/cnxk: add cn20k base control path support
  net/cnxk: support Rx function select for cn20k
  net/cnxk: support Tx function select for cn20k
  net/cnxk: support Rx burst scalar for cn20k
  net/cnxk: support Rx burst vector for cn20k
  net/cnxk: support Tx burst scalar for cn20k
  net/cnxk: support Tx multi-seg in cn20k
  net/cnxk: support Tx burst vector for cn20k
  net/cnxk: support Tx multi-seg in vector for cn20k

Satha Rao (5):
  common/cnxk: add cn20k NIX register definitions
  common/cnxk: support NIX queue config for cn20k
  common/cnxk: support bandwidth profile for cn20k
  common/cnxk: support NIX debug for cn20k
  common/cnxk: add RSS support for cn20k

v3:
- Updated release notes
- Fixed checkpatch issues that are not false positives.

v2:
- Moved out 15 patches to a separate series
- Updated release notes

 doc/guides/rel_notes/release_24_11.rst        |    8 +
 drivers/common/cnxk/cnxk_telemetry_nix.c      |  260 +-
 drivers/common/cnxk/hw/nix.h                  |  524 ++-
 drivers/common/cnxk/hw/npa.h                  |  164 +-
 drivers/common/cnxk/hw/rvu.h                  |    7 +-
 drivers/common/cnxk/roc_mbox.h                |   84 +
 drivers/common/cnxk/roc_nix.c                 |   15 +-
 drivers/common/cnxk/roc_nix_bpf.c             |  528 ++-
 drivers/common/cnxk/roc_nix_debug.c           |  243 +-
 drivers/common/cnxk/roc_nix_fc.c              |  106 +-
 drivers/common/cnxk/roc_nix_inl.c             |    2 +
 drivers/common/cnxk/roc_nix_priv.h            |    4 +-
 drivers/common/cnxk/roc_nix_queue.c           |  638 ++-
 drivers/common/cnxk/roc_nix_rss.c             |   74 +-
 drivers/common/cnxk/roc_nix_stats.c           |   55 +-
 drivers/common/cnxk/roc_nix_tm.c              |   22 +-
 drivers/common/cnxk/roc_nix_tm_ops.c          |   29 +-
 drivers/common/cnxk/roc_npa.c                 |  100 +-
 drivers/common/cnxk/roc_npa.h                 |   24 +-
 drivers/common/cnxk/roc_npa_debug.c           |   17 +-
 drivers/mempool/cnxk/cnxk_mempool.c           |    2 +
 drivers/mempool/cnxk/cnxk_mempool_ops.c       |    2 +-
 drivers/net/cnxk/cn20k_ethdev.c               |  943 +++++
 drivers/net/cnxk/cn20k_ethdev.h               |   15 +
 drivers/net/cnxk/cn20k_rx.h                   | 1100 ++++++
 drivers/net/cnxk/cn20k_rx_select.c            |  160 +
 drivers/net/cnxk/cn20k_rxtx.h                 |  245 ++
 drivers/net/cnxk/cn20k_tx.h                   | 3469 +++++++++++++++++
 drivers/net/cnxk/cn20k_tx_select.c            |  122 +
 drivers/net/cnxk/cnxk_ethdev_dp.h             |    3 +
 drivers/net/cnxk/meson.build                  |   92 +-
 drivers/net/cnxk/rx/cn20k/rx_0_15.c           |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c       |   20 +
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c  |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127.c        |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c   |   20 +
 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c    |   20 +
 .../net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c   |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95.c          |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c     |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c      |   20 +
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111.c         |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c    |   20 +
 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c     |   20 +
 .../net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c    |   20 +
 drivers/net/cnxk/rx/cn20k/rx_all_offload.c    |   55 +
 drivers/net/cnxk/tx/cn20k/tx_0_15.c           |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c       |   18 +
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c  |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127.c        |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c   |   18 +
 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c    |   18 +
 .../net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c   |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95.c          |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c     |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c      |   18 +
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111.c         |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c    |   18 +
 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c     |   18 +
 .../net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c    |   18 +
 drivers/net/cnxk/tx/cn20k/tx_all_offload.c    |   37 +
 97 files changed, 9973 insertions(+), 392 deletions(-)
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.c
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.h
 create mode 100644 drivers/net/cnxk/cn20k_rx.h
 create mode 100644 drivers/net/cnxk/cn20k_rx_select.c
 create mode 100644 drivers/net/cnxk/cn20k_rxtx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx_select.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_all_offload.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_all_offload.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 01/18] mempool/cnxk: add cn20k PCI device ids
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 02/18] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
                     ` (17 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Ashwin Sekhar T K, Pavan Nikhilesh; +Cc: dev

From: Ashwin Sekhar T K <asekhar@marvell.com>

Add cn20k PCI device ids.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 doc/guides/rel_notes/release_24_11.rst | 4 ++++
 drivers/mempool/cnxk/cnxk_mempool.c    | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..edcfcaa25a 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Updated Marvell cnxk mempool driver.**
+
+  * Added mempool driver support for CN20K SoC.
+
 
 Removed Items
 -------------
diff --git a/drivers/mempool/cnxk/cnxk_mempool.c b/drivers/mempool/cnxk/cnxk_mempool.c
index 1181b6f265..6ff11d8004 100644
--- a/drivers/mempool/cnxk/cnxk_mempool.c
+++ b/drivers/mempool/cnxk/cnxk_mempool.c
@@ -161,6 +161,7 @@ npa_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 }
 
 static const struct rte_pci_id npa_pci_map[] = {
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KB, PCI_DEVID_CNXK_RVU_NPA_PF),
@@ -172,6 +173,7 @@ static const struct rte_pci_id npa_pci_map[] = {
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN9KD, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN9KE, PCI_DEVID_CNXK_RVU_NPA_PF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CNF9KA, PCI_DEVID_CNXK_RVU_NPA_PF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KA, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KAS, PCI_DEVID_CNXK_RVU_NPA_VF),
 	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN10KB, PCI_DEVID_CNXK_RVU_NPA_VF),
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 02/18] common/cnxk: accommodate change in aura field width
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 01/18] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 03/18] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
                     ` (16 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Ashwin Sekhar T K

From: Ashwin Sekhar T K <asekhar@marvell.com>

Aura field width has changed from 20 bits to 17 bits on
cn20k. Adjust the shift used to position the aura id in
register reads/writes accordingly.
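
Since the aura id occupies the most significant bits of the
64-bit atomic-op word, the shift is 64 minus the field width:
64 - 20 = 44 on cn9k/cn10k and 64 - 17 = 47 on cn20k. A minimal
sketch of the selection (the helper name is illustrative and is
not part of this patch):

	/* Pick the aura-id shift for the NPA LF atomic-op word. */
	static inline uint64_t
	npa_aura_id_shift(void)
	{
		/* Field narrowed from 20 to 17 bits on cn20k, so it
		 * starts 3 bits higher: 64 - 17 = 47 vs 64 - 20 = 44.
		 */
		return roc_model_is_cn20k() ? 47 : 44;
	}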

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/common/cnxk/roc_npa.h | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/common/cnxk/roc_npa.h b/drivers/common/cnxk/roc_npa.h
index 4ad5f044b5..fbf75b2fca 100644
--- a/drivers/common/cnxk/roc_npa.h
+++ b/drivers/common/cnxk/roc_npa.h
@@ -16,6 +16,7 @@
 #else
 #include "roc_io_generic.h"
 #endif
+#include "roc_model.h"
 #include "roc_npa_dp.h"
 
 #define ROC_AURA_OP_LIMIT_MASK (BIT_ULL(36) - 1)
@@ -68,11 +69,12 @@ roc_npa_aura_op_alloc(uint64_t aura_handle, const int drop)
 static inline uint64_t
 roc_npa_aura_op_cnt_get(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	int64_t *addr;
 	uint64_t reg;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_AURA_OP_CNT);
 	reg = roc_atomic64_add_nosync(wdata, addr);
@@ -87,11 +89,13 @@ static inline void
 roc_npa_aura_op_cnt_set(uint64_t aura_handle, const int sign, uint64_t count)
 {
 	uint64_t reg = count & (BIT_ULL(36) - 1);
+	uint64_t shift;
 
 	if (sign)
 		reg |= BIT_ULL(43); /* CNT_ADD */
 
-	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << 44);
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << shift);
 
 	plt_write64(reg, roc_npa_aura_handle_to_base(aura_handle) +
 				 NPA_LF_AURA_OP_CNT);
@@ -100,11 +104,12 @@ roc_npa_aura_op_cnt_set(uint64_t aura_handle, const int sign, uint64_t count)
 static inline uint64_t
 roc_npa_aura_op_limit_get(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	int64_t *addr;
 	uint64_t reg;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_AURA_OP_LIMIT);
 	reg = roc_atomic64_add_nosync(wdata, addr);
@@ -119,8 +124,10 @@ static inline void
 roc_npa_aura_op_limit_set(uint64_t aura_handle, uint64_t limit)
 {
 	uint64_t reg = limit & ROC_AURA_OP_LIMIT_MASK;
+	uint64_t shift;
 
-	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << 44);
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	reg |= (roc_npa_aura_handle_to_aura(aura_handle) << shift);
 
 	plt_write64(reg, roc_npa_aura_handle_to_base(aura_handle) +
 				 NPA_LF_AURA_OP_LIMIT);
@@ -129,11 +136,12 @@ roc_npa_aura_op_limit_set(uint64_t aura_handle, uint64_t limit)
 static inline uint64_t
 roc_npa_aura_op_available(uint64_t aura_handle)
 {
-	uint64_t wdata;
+	uint64_t wdata, shift;
 	uint64_t reg;
 	int64_t *addr;
 
-	wdata = roc_npa_aura_handle_to_aura(aura_handle) << 44;
+	shift = roc_model_is_cn20k() ? 47 : 44;
+	wdata = roc_npa_aura_handle_to_aura(aura_handle) << shift;
 	addr = (int64_t *)(roc_npa_aura_handle_to_base(aura_handle) +
 			   NPA_LF_POOL_OP_AVAILABLE);
 	reg = roc_atomic64_add_nosync(wdata, addr);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 03/18] common/cnxk: use new NPA aq enq mbox for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 01/18] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 02/18] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 04/18] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
                     ` (15 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Ashwin Sekhar T K

From: Ashwin Sekhar T K <asekhar@marvell.com>

A new mbox npa_cn20k_aq_enq_req has been added
for cn20k. Use this mbox for NPA configurations.

Note that the new mbox request and response are
the same size as the older mboxes. Likewise, the
new cn20k contexts npa_cn20k_aura_s/
npa_cn20k_pool_s are the same size as the older
npa_aura_s/npa_pool_s. So the structures can be
typecast into each other in most cases; only the
fields whose width or position has changed need
special handling.
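
Since the sizes match, the casts used throughout this patch
are safe. A compile-time guard along these lines would catch
future divergence (a sketch; it assumes the PLT_STATIC_ASSERT
wrapper from roc_platform.h and is not part of this patch):

	/* Guard the typecast rationale at build time. */
	PLT_STATIC_ASSERT(sizeof(struct npa_cn20k_aura_s) ==
			  sizeof(struct npa_aura_s));
	PLT_STATIC_ASSERT(sizeof(struct npa_cn20k_pool_s) ==
			  sizeof(struct npa_pool_s));
	PLT_STATIC_ASSERT(sizeof(struct npa_cn20k_aq_enq_req) ==
			  sizeof(struct npa_aq_enq_req));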

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/common/cnxk/hw/npa.h         | 164 ++++++++++++++++++++++++---
 drivers/common/cnxk/roc_mbox.h       |  32 ++++++
 drivers/common/cnxk/roc_nix_debug.c  |   9 +-
 drivers/common/cnxk/roc_nix_fc.c     |  54 ++++++---
 drivers/common/cnxk/roc_nix_tm_ops.c |  15 ++-
 drivers/common/cnxk/roc_npa.c        | 101 ++++++++++++++--
 drivers/common/cnxk/roc_npa_debug.c  |  17 ++-
 7 files changed, 339 insertions(+), 53 deletions(-)

diff --git a/drivers/common/cnxk/hw/npa.h b/drivers/common/cnxk/hw/npa.h
index 891a1b2b5f..4fd1f9a64b 100644
--- a/drivers/common/cnxk/hw/npa.h
+++ b/drivers/common/cnxk/hw/npa.h
@@ -216,10 +216,10 @@ struct npa_aura_op_wdata_s {
 	uint64_t drop : 1;
 };
 
-/* NPA aura context structure */
+/* NPA aura context structure [CN9K, CN10K] */
 struct npa_aura_s {
 	uint64_t pool_addr : 64; /* W0 */
-	uint64_t ena : 1;
+	uint64_t ena : 1; /* W1 */
 	uint64_t rsvd_66_65 : 2;
 	uint64_t pool_caching : 1;
 	uint64_t pool_way_mask : 16;
@@ -233,24 +233,24 @@ struct npa_aura_s {
 	uint64_t shift : 6;
 	uint64_t rsvd_119_118 : 2;
 	uint64_t avg_level : 8;
-	uint64_t count : 36;
+	uint64_t count : 36; /* W2 */
 	uint64_t rsvd_167_164 : 4;
 	uint64_t nix0_bpid : 9;
 	uint64_t rsvd_179_177 : 3;
 	uint64_t nix1_bpid : 9;
 	uint64_t rsvd_191_189 : 3;
-	uint64_t limit : 36;
+	uint64_t limit : 36; /* W3 */
 	uint64_t rsvd_231_228 : 4;
 	uint64_t bp : 8;
 	uint64_t rsvd_242_240 : 3;
-	uint64_t fc_be : 1; /* [CN10K, .) */
+	uint64_t fc_be : 1; /* [CN10K] */
 	uint64_t fc_ena : 1;
 	uint64_t fc_up_crossing : 1;
 	uint64_t fc_stype : 2;
 	uint64_t fc_hyst_bits : 4;
 	uint64_t rsvd_255_252 : 4;
 	uint64_t fc_addr : 64; /* W4 */
-	uint64_t pool_drop : 8;
+	uint64_t pool_drop : 8; /* W5 */
 	uint64_t update_time : 16;
 	uint64_t err_int : 8;
 	uint64_t err_int_ena : 8;
@@ -262,17 +262,17 @@ struct npa_aura_s {
 	uint64_t rsvd_371 : 1;
 	uint64_t err_qint_idx : 7;
 	uint64_t rsvd_383_379 : 5;
-	uint64_t thresh : 36;
+	uint64_t thresh : 36; /* W6 */
 	uint64_t rsvd_423_420 : 4;
-	uint64_t fc_msh_dst : 11; /* [CN10K, .) */
+	uint64_t fc_msh_dst : 11; /* [CN10K] */
 	uint64_t rsvd_447_435 : 13;
 	uint64_t rsvd_511_448 : 64; /* W7 */
 };
 
-/* NPA pool context structure */
+/* NPA pool context structure [CN9K, CN10K] */
 struct npa_pool_s {
 	uint64_t stack_base : 64; /* W0 */
-	uint64_t ena : 1;
+	uint64_t ena : 1; /* W1 */
 	uint64_t nat_align : 1;
 	uint64_t rsvd_67_66 : 2;
 	uint64_t stack_caching : 1;
@@ -282,11 +282,11 @@ struct npa_pool_s {
 	uint64_t rsvd_103_100 : 4;
 	uint64_t buf_size : 11;
 	uint64_t rsvd_127_115 : 13;
-	uint64_t stack_max_pages : 32;
+	uint64_t stack_max_pages : 32; /* W2 */
 	uint64_t stack_pages : 32;
-	uint64_t op_pc : 48;
+	uint64_t op_pc : 48; /* W3 */
 	uint64_t rsvd_255_240 : 16;
-	uint64_t stack_offset : 4;
+	uint64_t stack_offset : 4; /* W4 */
 	uint64_t rsvd_263_260 : 4;
 	uint64_t shift : 6;
 	uint64_t rsvd_271_270 : 2;
@@ -296,14 +296,14 @@ struct npa_pool_s {
 	uint64_t fc_stype : 2;
 	uint64_t fc_hyst_bits : 4;
 	uint64_t fc_up_crossing : 1;
-	uint64_t fc_be : 1; /* [CN10K, .) */
+	uint64_t fc_be : 1; /* [CN10K] */
 	uint64_t rsvd_299_298 : 2;
 	uint64_t update_time : 16;
 	uint64_t rsvd_319_316 : 4;
 	uint64_t fc_addr : 64;	 /* W5 */
 	uint64_t ptr_start : 64; /* W6 */
 	uint64_t ptr_end : 64;	 /* W7 */
-	uint64_t rsvd_535_512 : 24;
+	uint64_t rsvd_535_512 : 24; /* W8 */
 	uint64_t err_int : 8;
 	uint64_t err_int_ena : 8;
 	uint64_t thresh_int : 1;
@@ -314,9 +314,9 @@ struct npa_pool_s {
 	uint64_t rsvd_563 : 1;
 	uint64_t err_qint_idx : 7;
 	uint64_t rsvd_575_571 : 5;
-	uint64_t thresh : 36;
+	uint64_t thresh : 36; /* W9 */
 	uint64_t rsvd_615_612 : 4;
-	uint64_t fc_msh_dst : 11; /* [CN10K, .) */
+	uint64_t fc_msh_dst : 11; /* [CN10K] */
 	uint64_t rsvd_639_627 : 13;
 	uint64_t rsvd_703_640 : 64;  /* W10 */
 	uint64_t rsvd_767_704 : 64;  /* W11 */
@@ -326,6 +326,136 @@ struct npa_pool_s {
 	uint64_t rsvd_1023_960 : 64; /* W15 */
 };
 
+/* NPA aura context structure [CN20K] */
+struct npa_cn20k_aura_s {
+	uint64_t pool_addr : 64; /* W0 */
+	uint64_t ena : 1;   /* W1 */
+	uint64_t rsvd_66_65 : 2;
+	uint64_t pool_caching : 1;
+	uint64_t rsvd_68 : 16;
+	uint64_t avg_con : 9;
+	uint64_t rsvd_93 : 1;
+	uint64_t pool_drop_ena : 1;
+	uint64_t aura_drop_ena : 1;
+	uint64_t bp_ena : 1;
+	uint64_t rsvd_103_97 : 7;
+	uint64_t aura_drop : 8;
+	uint64_t shift : 6;
+	uint64_t rsvd_119_118 : 2;
+	uint64_t avg_level : 8;
+	uint64_t count : 36; /* W2 */
+	uint64_t rsvd_167_164 : 4;
+	uint64_t bpid : 12;
+	uint64_t rsvd_191_180 : 12;
+	uint64_t limit : 36; /* W3 */
+	uint64_t rsvd_231_228 : 4;
+	uint64_t bp : 7;
+	uint64_t rsvd_243_239 : 5;
+	uint64_t fc_ena : 1;
+	uint64_t fc_up_crossing : 1;
+	uint64_t fc_stype : 2;
+	uint64_t fc_hyst_bits : 4;
+	uint64_t rsvd_255_252 : 4;
+	uint64_t fc_addr : 64;  /* W4 */
+	uint64_t pool_drop : 8; /* W5 */
+	uint64_t update_time : 16;
+	uint64_t err_int : 8;
+	uint64_t err_int_ena : 8;
+	uint64_t thresh_int : 1;
+	uint64_t thresh_int_ena : 1;
+	uint64_t thresh_up : 1;
+	uint64_t rsvd_363 : 1;
+	uint64_t thresh_qint_idx : 7;
+	uint64_t rsvd_371 : 1;
+	uint64_t err_qint_idx : 7;
+	uint64_t rsvd_383_379 : 5;
+	uint64_t thresh : 36; /* W6 */
+	uint64_t rsvd_423_420 : 4;
+	uint64_t fc_msh_dst : 11;
+	uint64_t rsvd_438_435 : 4;
+	uint64_t op_dpc_ena : 1;
+	uint64_t op_dpc_set : 6;
+	uint64_t stream_ctx : 1;
+	uint64_t unified_ctx : 1;
+	uint64_t rsvd_511_448 : 64; /* W7 */
+};
+
+/* NPA pool context structure [CN20K] */
+struct npa_cn20k_pool_s {
+	uint64_t stack_base : 64; /* W0 */
+	uint64_t ena : 1; /* W1 */
+	uint64_t nat_align : 1;
+	uint64_t rsvd_67_66 : 2;
+	uint64_t stack_caching : 1;
+	uint64_t rsvd_87_69 : 19;
+	uint64_t buf_offset : 12;
+	uint64_t rsvd_103_100 : 4;
+	uint64_t buf_size : 12;
+	uint64_t rsvd_119_116 : 4;
+	uint64_t ref_cnt_prof : 3;
+	uint64_t rsvd_127_123 : 5;
+	uint64_t stack_max_pages : 32; /* W2 */
+	uint64_t stack_pages : 32;
+	uint64_t bp_0 : 7; /* W3 */
+	uint64_t bp_1 : 7;
+	uint64_t bp_2 : 7;
+	uint64_t bp_3 : 7;
+	uint64_t bp_4 : 7;
+	uint64_t bp_5 : 7;
+	uint64_t bp_6 : 7;
+	uint64_t bp_7 : 7;
+	uint64_t bp_ena_0 : 1;
+	uint64_t bp_ena_1 : 1;
+	uint64_t bp_ena_2 : 1;
+	uint64_t bp_ena_3 : 1;
+	uint64_t bp_ena_4 : 1;
+	uint64_t bp_ena_5 : 1;
+	uint64_t bp_ena_6 : 1;
+	uint64_t bp_ena_7 : 1;
+	uint64_t stack_offset : 4; /* W4 */
+	uint64_t rsvd_263_260 : 4;
+	uint64_t shift : 6;
+	uint64_t rsvd_271_270 : 2;
+	uint64_t avg_level : 8;
+	uint64_t avg_con : 9;
+	uint64_t fc_ena : 1;
+	uint64_t fc_stype : 2;
+	uint64_t fc_hyst_bits : 4;
+	uint64_t fc_up_crossing : 1;
+	uint64_t rsvd_299_297 : 3;
+	uint64_t update_time : 16;
+	uint64_t rsvd_319_316 : 4;
+	uint64_t fc_addr : 64;   /* W5 */
+	uint64_t ptr_start : 64; /* W6 */
+	uint64_t ptr_end : 64;   /* W7 */
+	uint64_t bpid_0 : 12; /* W8 */
+	uint64_t rsvd_535_524 : 12;
+	uint64_t err_int : 8;
+	uint64_t err_int_ena : 8;
+	uint64_t thresh_int : 1;
+	uint64_t thresh_int_ena : 1;
+	uint64_t thresh_up : 1;
+	uint64_t rsvd_555 : 1;
+	uint64_t thresh_qint_idx : 7;
+	uint64_t rsvd_563 : 1;
+	uint64_t err_qint_idx : 7;
+	uint64_t rsvd_575_571 : 5;
+	uint64_t thresh : 36; /* W9 */
+	uint64_t rsvd_615_612 : 4;
+	uint64_t fc_msh_dst : 11;
+	uint64_t rsvd_630_627 : 4;
+	uint64_t op_dpc_ena : 1;
+	uint64_t op_dpc_set : 6;
+	uint64_t stream_ctx : 1;
+	uint64_t rsvd_639 : 1;
+	uint64_t rsvd_703_640 : 64;  /* W10 */
+	uint64_t rsvd_767_704 : 64;  /* W11 */
+	uint64_t rsvd_831_768 : 64;  /* W12 */
+	uint64_t rsvd_895_832 : 64;  /* W13 */
+	uint64_t rsvd_959_896 : 64;  /* W14 */
+	uint64_t rsvd_1023_960 : 64; /* W15 */
+};
+
 /* NPA queue interrupt context hardware structure */
 struct npa_qint_hw_s {
 	uint32_t count : 22;
diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index f1a3371ef9..9a9dcbdbda 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -119,6 +119,8 @@ struct mbox_msghdr {
 	M(NPA_AQ_ENQ, 0x402, npa_aq_enq, npa_aq_enq_req, npa_aq_enq_rsp)       \
 	M(NPA_HWCTX_DISABLE, 0x403, npa_hwctx_disable, hwctx_disable_req,      \
 	  msg_rsp)                                                             \
+	M(NPA_CN20K_AQ_ENQ, 0x404, npa_cn20k_aq_enq, npa_cn20k_aq_enq_req,     \
+	  npa_cn20k_aq_enq_rsp)                                                \
 	/* SSO/SSOW mbox IDs (range 0x600 - 0x7FF) */                          \
 	M(SSO_LF_ALLOC, 0x600, sso_lf_alloc, sso_lf_alloc_req,                 \
 	  sso_lf_alloc_rsp)                                                    \
@@ -1325,6 +1327,36 @@ struct npa_aq_enq_rsp {
 	};
 };
 
+struct npa_cn20k_aq_enq_req {
+	struct mbox_msghdr hdr;
+	uint32_t __io aura_id;
+	uint8_t __io ctype;
+	uint8_t __io op;
+	union {
+		/* Valid when op == WRITE/INIT and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura;
+		/* Valid when op == WRITE/INIT and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool;
+	};
+	/* Mask data when op == WRITE (1=write, 0=don't write) */
+	union {
+		/* Valid when op == WRITE and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura_mask;
+		/* Valid when op == WRITE and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool_mask;
+	};
+};
+
+struct npa_cn20k_aq_enq_rsp {
+	struct mbox_msghdr hdr;
+	union {
+		/* Valid when op == READ and ctype == AURA */
+		__io struct npa_cn20k_aura_s aura;
+		/* Valid when op == READ and ctype == POOL */
+		__io struct npa_cn20k_pool_s pool;
+	};
+};
+
 /* Disable all contexts of type 'ctype' */
 struct hwctx_disable_req {
 	struct mbox_msghdr hdr;
diff --git a/drivers/common/cnxk/roc_nix_debug.c b/drivers/common/cnxk/roc_nix_debug.c
index 26546f9297..2e91470c09 100644
--- a/drivers/common/cnxk/roc_nix_debug.c
+++ b/drivers/common/cnxk/roc_nix_debug.c
@@ -690,6 +690,7 @@ int
 roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct npa_cn20k_aq_enq_req *npa_aq_cn20k;
 	int rc = -1, q, rq = nix->nb_rx_queues;
 	struct npa_aq_enq_rsp *npa_rsp;
 	struct npa_aq_enq_req *npa_aq;
@@ -772,8 +773,12 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			continue;
 		}
 
-		/* Dump SQB Aura minimal info */
-		npa_aq = mbox_alloc_msg_npa_aq_enq(mbox_get(npa_lf->mbox));
+		if (roc_model_is_cn20k()) {
+			npa_aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox_get(npa_lf->mbox));
+			npa_aq = (struct npa_aq_enq_req *)npa_aq_cn20k; /* Common fields */
+		} else {
+			npa_aq = mbox_alloc_msg_npa_aq_enq(mbox_get(npa_lf->mbox));
+		}
 		if (npa_aq == NULL) {
 			rc = -ENOSPC;
 			mbox_put(npa_lf->mbox);
diff --git a/drivers/common/cnxk/roc_nix_fc.c b/drivers/common/cnxk/roc_nix_fc.c
index 12bfb9816b..2f72e67993 100644
--- a/drivers/common/cnxk/roc_nix_fc.c
+++ b/drivers/common/cnxk/roc_nix_fc.c
@@ -158,6 +158,8 @@ static int
 nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct npa_cn20k_aq_enq_req *npa_req_cn20k;
+	struct npa_cn20k_aq_enq_rsp *npa_rsp_cn20k;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
 	struct nix_aq_enq_rsp *rsp;
@@ -195,24 +197,44 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 	if (rc)
 		goto exit;
 
-	npa_req = mbox_alloc_msg_npa_aq_enq(mbox);
-	if (!npa_req) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn20k()) {
+		npa_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		if (!npa_req_cn20k) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		npa_req_cn20k->aura_id = rsp->rq.lpb_aura;
+		npa_req_cn20k->ctype = NPA_AQ_CTYPE_AURA;
+		npa_req_cn20k->op = NPA_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&npa_rsp_cn20k);
+		if (rc)
+			goto exit;
+
+		fc_cfg->cq_cfg.cq_drop = npa_rsp_cn20k->aura.bp;
+		fc_cfg->cq_cfg.enable = npa_rsp_cn20k->aura.bp_ena;
+		fc_cfg->type = ROC_NIX_FC_RQ_CFG;
+	} else {
+		npa_req = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (!npa_req) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		npa_req->aura_id = rsp->rq.lpb_aura;
+		npa_req->ctype = NPA_AQ_CTYPE_AURA;
+		npa_req->op = NPA_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&npa_rsp);
+		if (rc)
+			goto exit;
+
+		fc_cfg->cq_cfg.cq_drop = npa_rsp->aura.bp;
+		fc_cfg->cq_cfg.enable = npa_rsp->aura.bp_ena;
+		fc_cfg->type = ROC_NIX_FC_RQ_CFG;
 	}
 
-	npa_req->aura_id = rsp->rq.lpb_aura;
-	npa_req->ctype = NPA_AQ_CTYPE_AURA;
-	npa_req->op = NPA_AQ_INSTOP_READ;
-
-	rc = mbox_process_msg(mbox, (void *)&npa_rsp);
-	if (rc)
-		goto exit;
-
-	fc_cfg->cq_cfg.cq_drop = npa_rsp->aura.bp;
-	fc_cfg->cq_cfg.enable = npa_rsp->aura.bp_ena;
-	fc_cfg->type = ROC_NIX_FC_RQ_CFG;
-
 exit:
 	mbox_put(mbox);
 	return rc;
diff --git a/drivers/common/cnxk/roc_nix_tm_ops.c b/drivers/common/cnxk/roc_nix_tm_ops.c
index 9f3870a311..8144675f89 100644
--- a/drivers/common/cnxk/roc_nix_tm_ops.c
+++ b/drivers/common/cnxk/roc_nix_tm_ops.c
@@ -8,6 +8,7 @@
 int
 roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 {
+	struct npa_cn20k_aq_enq_req *req_cn20k;
 	struct npa_aq_enq_req *req;
 	struct npa_aq_enq_rsp *rsp;
 	uint64_t aura_handle;
@@ -25,7 +26,12 @@ roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 	mbox = mbox_get(lf->mbox);
 	/* Set/clear sqb aura fc_ena */
 	aura_handle = sq->aura_handle;
-	req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		req = (struct npa_aq_enq_req *)req_cn20k;
+	} else {
+		req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (req == NULL)
 		goto exit;
 
@@ -52,7 +58,12 @@ roc_nix_tm_sq_aura_fc(struct roc_nix_sq *sq, bool enable)
 
 	/* Read back npa aura ctx */
 	if (enable) {
-		req = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			req = (struct npa_aq_enq_req *)req_cn20k;
+		} else {
+			req = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (req == NULL) {
 			rc = -ENOSPC;
 			goto exit;
diff --git a/drivers/common/cnxk/roc_npa.c b/drivers/common/cnxk/roc_npa.c
index 6c14c49901..934d7361a9 100644
--- a/drivers/common/cnxk/roc_npa.c
+++ b/drivers/common/cnxk/roc_npa.c
@@ -76,6 +76,7 @@ static int
 npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura,
 		   struct npa_pool_s *pool)
 {
+	struct npa_cn20k_aq_enq_req *aura_init_req_cn20k, *pool_init_req_cn20k;
 	struct npa_aq_enq_req *aura_init_req, *pool_init_req;
 	struct npa_aq_enq_rsp *aura_init_rsp, *pool_init_rsp;
 	struct mbox_dev *mdev = &m_box->dev[0];
@@ -83,7 +84,12 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 	struct mbox *mbox;
 
 	mbox = mbox_get(m_box);
-	aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_init_req = (struct npa_aq_enq_req *)aura_init_req_cn20k;
+	} else {
+		aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_init_req == NULL)
 		goto exit;
 	aura_init_req->aura_id = aura_id;
@@ -91,6 +97,11 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 	aura_init_req->op = NPA_AQ_INSTOP_INIT;
 	mbox_memcpy(&aura_init_req->aura, aura, sizeof(*aura));
 
+	if (roc_model_is_cn20k()) {
+		pool_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		pool_init_req = (struct npa_aq_enq_req *)pool_init_req_cn20k;
+	} else {
+		pool_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
-	pool_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
 	if (pool_init_req == NULL)
 		goto exit;
@@ -121,13 +133,19 @@ npa_aura_pool_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura
 static int
 npa_aura_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura)
 {
+	struct npa_cn20k_aq_enq_req *aura_init_req_cn20k;
 	struct npa_aq_enq_req *aura_init_req;
 	struct npa_aq_enq_rsp *aura_init_rsp;
 	struct mbox *mbox;
 	int rc = -ENOSPC;
 
 	mbox = mbox_get(m_box);
-	aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_init_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_init_req = (struct npa_aq_enq_req *)aura_init_req_cn20k;
+	} else {
+		aura_init_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_init_req == NULL)
 		goto exit;
 	aura_init_req->aura_id = aura_id;
@@ -151,6 +169,7 @@ npa_aura_init(struct mbox *m_box, uint32_t aura_id, struct npa_aura_s *aura)
 static int
 npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k, *pool_req_cn20k;
 	struct npa_aq_enq_req *aura_req, *pool_req;
 	struct npa_aq_enq_rsp *aura_rsp, *pool_rsp;
 	struct mbox_dev *mdev = &m_box->dev[0];
@@ -168,7 +187,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	} while (ptr);
 
 	mbox = mbox_get(m_box);
-	pool_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		pool_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		pool_req = (struct npa_aq_enq_req *)pool_req_cn20k;
+	} else {
+		pool_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (pool_req == NULL)
 		goto exit;
 	pool_req->aura_id = aura_id;
@@ -177,7 +201,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	pool_req->pool.ena = 0;
 	pool_req->pool_mask.ena = ~pool_req->pool_mask.ena;
 
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL)
 		goto exit;
 	aura_req->aura_id = aura_id;
@@ -185,8 +214,18 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 	aura_req->op = NPA_AQ_INSTOP_WRITE;
 	aura_req->aura.ena = 0;
 	aura_req->aura_mask.ena = ~aura_req->aura_mask.ena;
-	aura_req->aura.bp_ena = 0;
-	aura_req->aura_mask.bp_ena = ~aura_req->aura_mask.bp_ena;
+	if (roc_model_is_cn20k()) {
+		__io struct npa_cn20k_aura_s *aura_cn20k, *aura_mask_cn20k;
+
+		/* The bit positions/width of bp_ena has changed in cn20k */
+		aura_cn20k = (__io struct npa_cn20k_aura_s *)&aura_req->aura;
+		aura_cn20k->bp_ena = 0;
+		aura_mask_cn20k = (__io struct npa_cn20k_aura_s *)&aura_req->aura_mask;
+		aura_mask_cn20k->bp_ena = ~aura_mask_cn20k->bp_ena;
+	} else {
+		aura_req->aura.bp_ena = 0;
+		aura_req->aura_mask.bp_ena = ~aura_req->aura_mask.bp_ena;
+	}
 
 	rc = mbox_process(mbox);
 	if (rc < 0)
@@ -204,6 +243,12 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 		goto exit;
 	}
 
+	if (roc_model_is_cn20k()) {
+		/* In cn20k, NPA does not use NDC */
+		rc = 0;
+		goto exit;
+	}
+
 	/* Sync NDC-NPA for LF */
 	ndc_req = mbox_alloc_msg_ndc_sync_op(mbox);
 	if (ndc_req == NULL) {
@@ -226,6 +271,7 @@ npa_aura_pool_fini(struct mbox *m_box, uint32_t aura_id, uint64_t aura_handle)
 static int
 npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_aq_enq_rsp *aura_rsp;
 	struct ndc_sync_op *ndc_req;
@@ -236,7 +282,12 @@ npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 	plt_delay_us(10);
 
 	mbox = mbox_get(m_box);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL)
 		goto exit;
 	aura_req->aura_id = aura_id;
@@ -254,6 +305,12 @@ npa_aura_fini(struct mbox *m_box, uint32_t aura_id)
 		goto exit;
 	}
 
+	if (roc_model_is_cn20k()) {
+		/* In cn20k, NPA does not use NDC */
+		rc = 0;
+		goto exit;
+	}
+
 	/* Sync NDC-NPA for LF */
 	ndc_req = mbox_alloc_msg_ndc_sync_op(mbox);
 	if (ndc_req == NULL) {
@@ -335,6 +392,7 @@ roc_npa_pool_op_pc_reset(uint64_t aura_handle)
 int
 roc_npa_aura_drop_set(uint64_t aura_handle, uint64_t limit, bool ena)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_lf *lf;
 	struct mbox *mbox;
@@ -344,7 +402,12 @@ roc_npa_aura_drop_set(uint64_t aura_handle, uint64_t limit, bool ena)
 	if (lf == NULL)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
 	mbox = mbox_get(lf->mbox);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL) {
 		rc = -ENOMEM;
 		goto exit;
@@ -723,6 +786,7 @@ roc_npa_aura_create(uint64_t *aura_handle, uint32_t block_count,
 int
 roc_npa_aura_limit_modify(uint64_t aura_handle, uint16_t aura_limit)
 {
+	struct npa_cn20k_aq_enq_req *aura_req_cn20k;
 	struct npa_aq_enq_req *aura_req;
 	struct npa_lf *lf;
 	struct mbox *mbox;
@@ -733,7 +797,12 @@ roc_npa_aura_limit_modify(uint64_t aura_handle, uint16_t aura_limit)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
 
 	mbox = mbox_get(lf->mbox);
-	aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		aura_req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		aura_req = (struct npa_aq_enq_req *)aura_req_cn20k;
+	} else {
+		aura_req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (aura_req == NULL) {
 		rc = -ENOMEM;
 		goto exit;
@@ -834,12 +903,13 @@ int
 roc_npa_pool_range_update_check(uint64_t aura_handle)
 {
 	uint64_t aura_id = roc_npa_aura_handle_to_aura(aura_handle);
-	struct npa_lf *lf;
-	struct npa_aura_lim *lim;
+	struct npa_cn20k_aq_enq_req *req_cn20k;
 	__io struct npa_pool_s *pool;
 	struct npa_aq_enq_req *req;
 	struct npa_aq_enq_rsp *rsp;
+	struct npa_aura_lim *lim;
 	struct mbox *mbox;
+	struct npa_lf *lf;
 	int rc;
 
 	lf = idev_npa_obj_get();
@@ -849,7 +919,12 @@ roc_npa_pool_range_update_check(uint64_t aura_handle)
 	lim = lf->aura_lim;
 
 	mbox = mbox_get(lf->mbox);
-	req = mbox_alloc_msg_npa_aq_enq(mbox);
+	if (roc_model_is_cn20k()) {
+		req_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+		req = (struct npa_aq_enq_req *)req_cn20k;
+	} else {
+		req = mbox_alloc_msg_npa_aq_enq(mbox);
+	}
 	if (req == NULL) {
 		rc = -ENOSPC;
 		goto exit;
@@ -903,6 +978,7 @@ int
 roc_npa_aura_bp_configure(uint64_t aura_handle, uint16_t bpid, uint8_t bp_intf, uint8_t bp_thresh,
 			  bool enable)
 {
+	/* TODO: Add support for CN20K */
 	uint32_t aura_id = roc_npa_aura_handle_to_aura(aura_handle);
 	struct npa_lf *lf = idev_npa_obj_get();
 	struct npa_aq_enq_req *req;
diff --git a/drivers/common/cnxk/roc_npa_debug.c b/drivers/common/cnxk/roc_npa_debug.c
index 173d32cd9b..9a16f481a8 100644
--- a/drivers/common/cnxk/roc_npa_debug.c
+++ b/drivers/common/cnxk/roc_npa_debug.c
@@ -89,8 +89,9 @@ npa_aura_dump(__io struct npa_aura_s *aura)
 int
 roc_npa_ctx_dump(void)
 {
-	struct npa_aq_enq_req *aq;
+	struct npa_cn20k_aq_enq_req *aq_cn20k;
 	struct npa_aq_enq_rsp *rsp;
+	struct npa_aq_enq_req *aq;
 	struct mbox *mbox;
 	struct npa_lf *lf;
 	uint32_t q;
@@ -106,7 +107,12 @@ roc_npa_ctx_dump(void)
 		if (plt_bitmap_get(lf->npa_bmp, q))
 			continue;
 
-		aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			aq = (struct npa_aq_enq_req *)aq_cn20k;
+		} else {
+			aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (aq == NULL) {
 			rc = -ENOSPC;
 			goto exit;
@@ -129,7 +135,12 @@ roc_npa_ctx_dump(void)
 		if (plt_bitmap_get(lf->npa_bmp, q))
 			continue;
 
-		aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		if (roc_model_is_cn20k()) {
+			aq_cn20k = mbox_alloc_msg_npa_cn20k_aq_enq(mbox);
+			aq = (struct npa_aq_enq_req *)aq_cn20k;
+		} else {
+			aq = mbox_alloc_msg_npa_aq_enq(mbox);
+		}
 		if (aq == NULL) {
 			rc = -ENOSPC;
 			goto exit;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 04/18] mempool/cnxk: initialize mempool ops for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (2 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 03/18] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 05/18] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
                     ` (14 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Ashwin Sekhar T K, Pavan Nikhilesh; +Cc: dev

From: Ashwin Sekhar T K <asekhar@marvell.com>

Initialize mempool ops for cn20k.

Signed-off-by: Ashwin Sekhar T K <asekhar@marvell.com>
---
 drivers/mempool/cnxk/cnxk_mempool_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mempool/cnxk/cnxk_mempool_ops.c b/drivers/mempool/cnxk/cnxk_mempool_ops.c
index a1aeaee746..bb35e2d1d2 100644
--- a/drivers/mempool/cnxk/cnxk_mempool_ops.c
+++ b/drivers/mempool/cnxk/cnxk_mempool_ops.c
@@ -192,7 +192,7 @@ cnxk_mempool_plt_init(void)
 
 	if (roc_model_is_cn9k()) {
 		rte_mbuf_set_platform_mempool_ops("cn9k_mempool_ops");
-	} else if (roc_model_is_cn10k()) {
+	} else if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		rte_mbuf_set_platform_mempool_ops("cn10k_mempool_ops");
 		rc = cn10k_mempool_plt_init();
 	}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 05/18] common/cnxk: add cn20k NIX register definitions
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (3 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 04/18] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 06/18] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
                     ` (13 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add cn20k NIX register definitions.
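
As one worked example of the new definitions, the per-CPT-queue
registers move to a new base offset and a wider per-queue stride
on cn20k. The macro values below are copied from the diff; the
standalone printout is illustrative only:

	#include <inttypes.h>
	#include <stdio.h>

	#define NIX_AF_RX_CPTX_INST_QSEL(a)      (0x340ull | (uint64_t)(a) << 16) /* [CN20K, .) */
	#define NIX_AF_CN9K_RX_CPTX_INST_QSEL(a) (0x320ull | (uint64_t)(a) << 3)  /* [CN9K, CN20K) */

	int main(void)
	{
		/* Queue 1: 0x10340 on cn20k vs 0x328 on cn9k/cn10k. */
		printf("cn20k: 0x%" PRIx64 ", legacy: 0x%" PRIx64 "\n",
		       NIX_AF_RX_CPTX_INST_QSEL(1),
		       NIX_AF_CN9K_RX_CPTX_INST_QSEL(1));
		return 0;
	}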

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/common/cnxk/hw/nix.h   | 524 +++++++++++++++++++++++++++++----
 drivers/common/cnxk/hw/rvu.h   |   7 +-
 drivers/common/cnxk/roc_mbox.h |  52 ++++
 drivers/common/cnxk/roc_nix.c  |  15 +-
 4 files changed, 533 insertions(+), 65 deletions(-)

diff --git a/drivers/common/cnxk/hw/nix.h b/drivers/common/cnxk/hw/nix.h
index 1720eb3815..dd629a2080 100644
--- a/drivers/common/cnxk/hw/nix.h
+++ b/drivers/common/cnxk/hw/nix.h
@@ -32,7 +32,7 @@
 #define NIX_AF_RX_CFG			(0xd0ull)
 #define NIX_AF_AVG_DELAY		(0xe0ull)
 #define NIX_AF_CINT_DELAY		(0xf0ull)
-#define NIX_AF_VWQE_TIMER		(0xf8ull) /* [CN10K, .) */
+#define NIX_AF_VWQE_TIMER		(0xf8ull) /* [CN10K, CN20K) */
 #define NIX_AF_RX_MCAST_BASE		(0x100ull)
 #define NIX_AF_RX_MCAST_CFG		(0x110ull)
 #define NIX_AF_RX_MCAST_BUF_BASE	(0x120ull)
@@ -82,9 +82,11 @@
 #define NIX_AF_RX_DEF_IIP6_DSCP		(0x2f0ull) /* [CN10K, .) */
 #define NIX_AF_RX_DEF_OIP6_DSCP		(0x2f8ull) /* [CN10K, .) */
 #define NIX_AF_RX_IPSEC_GEN_CFG		(0x300ull)
-#define NIX_AF_RX_IPSEC_VWQE_GEN_CFG	(0x310ull) /* [CN10K, .) */
-#define NIX_AF_RX_CPTX_INST_QSEL(a)	(0x320ull | (uint64_t)(a) << 3)
-#define NIX_AF_RX_CPTX_CREDIT(a)	(0x360ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_IPSEC_VWQE_GEN_CFG	(0x310ull) /* [CN10K, CN20K) */
+#define NIX_AF_RX_CPTX_INST_QSEL(a)	(0x340ull | (uint64_t)(a) << 16) /* [CN20K, .) */
+#define NIX_AF_RX_CPTX_CREDIT(a)	(0x380ull | (uint64_t)(a) << 16) /* [CN20K, .) */
+#define NIX_AF_CN9K_RX_CPTX_INST_QSEL(a) (0x320ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
+#define NIX_AF_CN9K_RX_CPTX_CREDIT(a)	(0x360ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
 #define NIX_AF_NDC_RX_SYNC		(0x3e0ull)
 #define NIX_AF_NDC_TX_SYNC		(0x3f0ull)
 #define NIX_AF_AQ_CFG			(0x400ull)
@@ -100,12 +102,14 @@
 #define NIX_AF_RX_LINKX_CFG(a)		(0x540ull | (uint64_t)(a) << 16)
 #define NIX_AF_RX_SW_SYNC		(0x550ull)
 #define NIX_AF_RX_LINKX_WRR_CFG(a)	(0x560ull | (uint64_t)(a) << 16)
+#define NIX_AF_RQM_ECO                  (0x5a0ull)
 #define NIX_AF_SEB_CFG			(0x5f0ull) /* [CN10K, .) */
 #define NIX_AF_EXPR_TX_FIFO_STATUS	(0x640ull) /* [CN9K, CN10K) */
 #define NIX_AF_NORM_TX_FIFO_STATUS	(0x648ull)
 #define NIX_AF_SDP_TX_FIFO_STATUS	(0x650ull)
 #define NIX_AF_TX_NPC_CAPTURE_CONFIG	(0x660ull)
 #define NIX_AF_TX_NPC_CAPTURE_INFO	(0x668ull)
+#define NIX_AF_SEB_COALESCE_DBGX(a)             (0x670ull | (uint64_t)(a) << 3)
 #define NIX_AF_TX_NPC_CAPTURE_RESPX(a)	(0x680ull | (uint64_t)(a) << 3)
 #define NIX_AF_SEB_ACTIVE_CYCLES_PCX(a) (0x6c0ull | (uint64_t)(a) << 3)
 #define NIX_AF_SMQX_CFG(a)		(0x700ull | (uint64_t)(a) << 16)
@@ -115,6 +119,7 @@
 #define NIX_AF_SMQX_NXT_HEAD(a)		(0x740ull | (uint64_t)(a) << 16)
 #define NIX_AF_SQM_ACTIVE_CYCLES_PC	(0x770ull)
 #define NIX_AF_SQM_SCLK_CNT		(0x780ull) /* [CN10K, .) */
+#define NIX_AF_DWRR_MTUX(a)             (0x790ull | (uint64_t)(a) << 16)
 #define NIX_AF_DWRR_SDP_MTU		(0x790ull) /* [CN10K, .) */
 #define NIX_AF_DWRR_RPM_MTU		(0x7a0ull) /* [CN10K, .) */
 #define NIX_AF_PSE_CHANNEL_LEVEL	(0x800ull)
@@ -131,6 +136,7 @@
 #define NIX_AF_TX_LINKX_HW_XOFF(a)	(0xa30ull | (uint64_t)(a) << 16)
 #define NIX_AF_SDP_LINK_CREDIT		(0xa40ull)
 #define NIX_AF_SDP_LINK_CDT_ADJ		(0xa50ull) /* [CN10K, .) */
+#define NIX_AF_LINK_CDT_ADJ_ERR		(0xaa0ull) /* [CN10K, .) */
 /* [CN9K, CN10K) */
 #define NIX_AF_SDP_SW_XOFFX(a)	    (0xa60ull | (uint64_t)(a) << 3)
 #define NIX_AF_SDP_HW_XOFFX(a)	    (0xac0ull | (uint64_t)(a) << 3)
@@ -226,7 +232,7 @@
 #define NIX_AF_TL4X_CIR(a)		 (0x1220ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_PIR(a)		 (0x1230ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_SCHED_STATE(a)	 (0x1240ull | (uint64_t)(a) << 16)
-#define NIX_AF_TL4X_SHAPE_STATE(a)	 (0x1250ull | (uint64_t)(a) << 16)
+#define NIX_AF_TL4X_SHAPE_STATE_PIR(a)	 (0x1250ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_SW_XOFF(a)		 (0x1270ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_TOPOLOGY(a)		 (0x1280ull | (uint64_t)(a) << 16)
 #define NIX_AF_TL4X_PARENT(a)		 (0x1288ull | (uint64_t)(a) << 16)
@@ -272,6 +278,18 @@
 #define NIX_AF_CINT_TIMERX(a)	    (0x1a40ull | (uint64_t)(a) << 18)
 #define NIX_AF_LSO_FORMATX_FIELDX(a, b)                                        \
 	(0x1b00ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+/* [CN10K, .) */
+#define NIX_AF_SPI_TO_SA_KEYX_WAYX(a, b)    (0x1c00ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+#define NIX_AF_SPI_TO_SA_VALUEX_WAYX(a, b)  (0x1c40ull | (uint64_t)(a) << 16 | (uint64_t)(b) << 3)
+#define NIX_AF_SPI_TO_SA_CFG		    (0x1c80ull)
+#define NIX_AF_SPI_TO_SA_CFG1		    (0x1c88ull)
+#define NIX_AF_SPI_TO_SA_HASH_KEY	    (0x1c90ull)
+#define NIX_AF_SPI_TO_SA_HASH_VALUE	    (0x1ca0ull)
+/* [CN20K, .) */
+#define NIX_AF_RX_IPSEC_VLAN_CFGX(a)	    (0x1d00ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_IPSEC_QMAPX_DSCPX(a, b)   (0x1e00ull | (uint64_t)(a) << 6 | (uint64_t)(b) << 3)
+#define NIX_AF_RX_SSO_GRPX_BP_CFG(a)	    (0x2000ull | (uint64_t)(a) << 3)
+#define NIX_AF_RX_SSO_GRPX_BP_LEVEL(a)	    (0x3000ull | (uint64_t)(a) << 3)
 #define NIX_AF_LFX_CFG(a) (0x4000ull | (uint64_t)(a) << 17)
 /* [CN10K, .) */
 #define NIX_AF_LINKX_CFG(a)		 (0x4010ull | (uint64_t)(a) << 17)
@@ -348,6 +366,7 @@
 #define NIX_LF_TX_STATX(a)	 (0x300ull | (uint64_t)(a) << 3)
 #define NIX_LF_RX_STATX(a)	 (0x400ull | (uint64_t)(a) << 3)
 #define NIX_LF_OP_SENDX(a)	 (0x800ull | (uint64_t)(a) << 3)
+#define NIX_LF_PTP_CLOCK	 (0x8f8ull) /* [CN20K, .) */
 #define NIX_LF_RQ_OP_INT	 (0x900ull)
 #define NIX_LF_RQ_OP_OCTS	 (0x910ull)
 #define NIX_LF_RQ_OP_PKTS	 (0x920ull)
@@ -355,7 +374,7 @@
 #define NIX_LF_RQ_OP_DROP_PKTS	 (0x940ull)
 #define NIX_LF_RQ_OP_RE_PKTS	 (0x950ull)
 #define NIX_LF_OP_IPSEC_DYNO_CNT (0x980ull)
-#define NIX_LF_OP_VWQE_FLUSH	 (0x9a0ull) /* [CN10K, .) */
+#define NIX_LF_OP_VWQE_FLUSH	 (0x9a0ull) /* [CN10K, CN20K) */
 #define NIX_LF_PL_OP_BAND_PROF	 (0x9c0ull) /* [CN10K, .) */
 #define NIX_LF_SQ_OP_INT	 (0xa00ull)
 #define NIX_LF_SQ_OP_OCTS	 (0xa10ull)
@@ -368,6 +387,9 @@
 #define NIX_LF_CQ_OP_INT	 (0xb00ull)
 #define NIX_LF_CQ_OP_DOOR	 (0xb30ull)
 #define NIX_LF_CQ_OP_STATUS	 (0xb40ull)
+#define NIX_LF_SSO_BP_OP_DOOR	 (0xb50ull) /* [CN20K, .) */
+#define NIX_LF_SSO_BP_OP_LEVEL	 (0xb58ull) /* [CN20K, .) */
+#define NIX_LF_SSO_BP_OP_INT	 (0xb60ull) /* [CN20K, .) */
 #define NIX_LF_QINTX_CNT(a)	 (0xc00ull | (uint64_t)(a) << 12)
 #define NIX_LF_QINTX_INT(a)	 (0xc10ull | (uint64_t)(a) << 12)
 #define NIX_LF_QINTX_ENA_W1S(a)	 (0xc20ull | (uint64_t)(a) << 12)
@@ -389,6 +411,8 @@
 
 /* Enum offsets */
 
+#define NIX_SSOERRINT_DOOR_ERR	(0x0ull) /*[CN20K, .) */
+
 #define NIX_STAT_LF_TX_TX_UCAST (0x0ull)
 #define NIX_STAT_LF_TX_TX_BCAST (0x1ull)
 #define NIX_STAT_LF_TX_TX_MCAST (0x2ull)
@@ -572,6 +596,7 @@
 #define NIX_SEND_STATUS_NPC_VTAG_SIZE_ERR  (0x26ull)
 #define NIX_SEND_STATUS_SEND_MEM_FAULT	   (0x27ull)
 #define NIX_SEND_STATUS_SEND_STATS_ERR	   (0x28ull)
+#define NIX_SEND_STATUS_SEND_HDR_DROP	   (0x29ull) /* [CN20K, .) */
 
 #define NIX_SENDSTATSALG_NOP			     (0x0ull)
 #define NIX_SENDSTATSALG_ADD_PKT_CNT		     (0x1ull)
@@ -606,6 +631,7 @@
 #define NIX_SUBDC_WORK		(0x7ull)
 #define NIX_SUBDC_SG2		(0x8ull) /* [CN10K, .) */
 #define NIX_SUBDC_AGE_AND_STATS (0x9ull) /* [CN10K, .) */
+#define NIX_SUBDC_COMPID	(0xaull) /* [CN20K, .) */
 #define NIX_SUBDC_SOD		(0xfull)
 
 #define NIX_STYPE_STF (0x0ull)
@@ -644,6 +670,18 @@
 #define NIX_LSOALG_ADD_PAYLEN (0x2ull)
 #define NIX_LSOALG_ADD_OFFSET (0x3ull)
 #define NIX_LSOALG_TCP_FLAGS  (0x4ull)
+#define NIX_LSOALG_ALT_FLAGS  (0x5ull) /* [CN20K, .) */
+
+#define NIX_METER_CFG_RFC_2698 (0x0ull) /* [CN20K, .) */
+#define NIX_METER_CFG_RFC_2697 (0x1ull) /* [CN20K, .) */
+#define NIX_METER_CFG_RFC_4115 (0x2ull) /* [CN20K, .) */
+
+#define NIX_NDC_RX_PORT_AQ	(0x0ull)
+#define NIX_NDC_RX_PORT_C	(0x1ull)
+#define NIX_NDC_RX_PORT_CINT	(0x2ull)
+#define NIX_NDC_RX_PORT_MC	(0x3ull)
+#define NIX_NDC_RX_PORT_PKT	(0x4ull)
+#define NIX_NDC_RX_PORT_RQ	(0x5ull)
 
 #define NIX_MNQERR_SQ_CTX_FAULT	    (0x0ull)
 #define NIX_MNQERR_SQ_CTX_POISON    (0x1ull)
@@ -732,12 +770,14 @@
 #define NIX_RX_PERRCODE_IL4_PORT       (0x23ull)
 
 #define NIX_SA_ALG_NON_MS     (0x0ull) /* [CN10K, .) */
-#define NIX_SA_ALG_MS_CISCO   (0x1ull) /* [CN10K, .) */
-#define NIX_SA_ALG_MS_VIPTELA (0x2ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_31_28   (0x1ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_27_25   (0x2ull) /* [CN10K, .) */
+#define NIX_SA_ALG_MS_28_25   (0x3ull) /* [CN10K, .) */
 
 #define NIX_SENDCRCALG_CRC32  (0x0ull)
 #define NIX_SENDCRCALG_CRC32C (0x1ull)
 #define NIX_SENDCRCALG_ONES16 (0x2ull)
+#define NIX_SENDCRCALG_INVCRC (0x3ull) /* [CN10K, .) */
 
 #define NIX_SENDL3TYPE_NONE	 (0x0ull)
 #define NIX_SENDL3TYPE_IP4	 (0x2ull)
@@ -761,7 +801,7 @@
 #define NIX_XQE_TYPE_RX_IPSECS (0x2ull)
 #define NIX_XQE_TYPE_RX_IPSECH (0x3ull)
 #define NIX_XQE_TYPE_RX_IPSECD (0x4ull)
-#define NIX_XQE_TYPE_RX_VWQE   (0x5ull) /* [CN10K, .) */
+#define NIX_XQE_TYPE_RX_VWQE   (0x5ull) /* [CN10K, CN20K) */
 #define NIX_XQE_TYPE_RES_6     (0x6ull)
 #define NIX_XQE_TYPE_RES_7     (0x7ull)
 #define NIX_XQE_TYPE_SEND      (0x8ull)
@@ -825,6 +865,11 @@
 #define NIX_AQ_CTYPE_DYNO      (0x5ull)
 #define NIX_AQ_CTYPE_BAND_PROF (0x6ull) /* [CN10K, .) */
 
+#define NIX_CQERRINT_DOOR_ERR  (0x0ull)
+#define NIX_CQERRINT_WR_FULL   (0x1ull)
+#define NIX_CQERRINT_CQE_FAULT (0x2ull)
+#define NIX_CQERRINT_CPT_DROP  (0x3ull) /* [CN10KB, .) */
+
 #define NIX_COLORRESULT_GREEN	 (0x0ull)
 #define NIX_COLORRESULT_YELLOW	 (0x1ull)
 #define NIX_COLORRESULT_RED_SEND (0x2ull)
@@ -846,11 +891,6 @@
 #define NIX_CHAN_RPMX_LMACX_CHX(a, b, c)                                       \
 	(0x800ull | ((uint64_t)(a) << 8) | ((uint64_t)(b) << 4) | (uint64_t)(c))
 
-/* The mask is to extract lower 10-bits of channel number
- * which CPT will pass to X2P.
- */
-#define NIX_CHAN_CPT_X2P_MASK (0x3ffull)
-
 #define NIX_INTF_SDP  (0x4ull)
 #define NIX_INTF_CGX0 (0x0ull) /* [CN9K, CN10K) */
 #define NIX_INTF_CGX1 (0x1ull) /* [CN9K, CN10K) */
@@ -861,11 +901,6 @@
 #define NIX_INTF_LBK0 (0x3ull)
 #define NIX_INTF_CPT0 (0x5ull) /* [CN10K, .) */
 
-#define NIX_CQERRINT_DOOR_ERR  (0x0ull)
-#define NIX_CQERRINT_WR_FULL   (0x1ull)
-#define NIX_CQERRINT_CQE_FAULT (0x2ull)
-#define NIX_CQERRINT_CPT_DROP  (0x3ull) /* [CN10KB, .) */
-
 #define NIX_LINK_SDP (0xdull) /* [CN10K, .) */
 #define NIX_LINK_CPT (0xeull) /* [CN10K, .) */
 #define NIX_LINK_MC  (0xfull) /* [CN10K, .) */
@@ -894,7 +929,7 @@ struct nix_age_and_send_stats_s {
 	uint64_t threshold : 29;
 	uint64_t latency_drop : 1;
 	uint64_t aging : 1;
-	uint64_t wmem : 1;
+	uint64_t coas_en : 1;
 	uint64_t ooffset : 12;
 	uint64_t ioffset : 12;
 	uint64_t sel : 1;
@@ -907,8 +942,8 @@ struct nix_age_and_send_stats_s {
 struct nix_aq_inst_s {
 	uint64_t op : 4;
 	uint64_t ctype : 4;
-	uint64_t lf : 7;
-	uint64_t rsvd_23_15 : 9;
+	uint64_t lf : 9;
+	uint64_t rsvd_23_17 : 7;
 	uint64_t cindex : 20;
 	uint64_t rsvd_62_44 : 19;
 	uint64_t doneint : 1;
@@ -927,7 +962,7 @@ struct nix_aq_res_s {
 
 /* NIX bandwidth profile structure */
 struct nix_band_prof_s {
-	uint64_t pc_mode : 2;
+	uint64_t pc_mode : 2; /* W0 */
 	uint64_t icolor : 2;
 	uint64_t tnl_ena : 1;
 	uint64_t rsvd_7_5 : 3;
@@ -942,7 +977,7 @@ struct nix_band_prof_s {
 	uint64_t peir_mantissa : 8;
 	uint64_t pebs_mantissa : 8;
 	uint64_t cir_mantissa : 8;
-	uint64_t cbs_mantissa : 8;
+	uint64_t cbs_mantissa : 8; /* W1 */
 	uint64_t lmode : 1;
 	uint64_t l_sellect : 3;
 	uint64_t rdiv : 4;
@@ -953,37 +988,37 @@ struct nix_band_prof_s {
 	uint64_t yc_action : 2;
 	uint64_t rc_action : 2;
 	uint64_t meter_algo : 2;
-	uint64_t band_prof_id : 7;
-	uint64_t rsvd_118_111 : 8;
+	uint64_t band_prof_id : 11;
+	uint64_t rsvd_118_115 : 4;
 	uint64_t hl_en : 1;
 	uint64_t rsvd_127_120 : 8;
-	uint64_t ts : 48;
+	uint64_t ts : 48; /* W2 */
 	uint64_t rsvd_191_176 : 16;
-	uint64_t pe_accum : 32;
+	uint64_t pe_accum : 32; /* W3 */
 	uint64_t c_accum : 32;
-	uint64_t green_pkt_pass : 48;
+	uint64_t green_pkt_pass : 48; /* W4 */
 	uint64_t rsvd_319_304 : 16;
-	uint64_t yellow_pkt_pass : 48;
+	uint64_t yellow_pkt_pass : 48; /* W5 */
 	uint64_t rsvd_383_368 : 16;
-	uint64_t red_pkt_pass : 48;
+	uint64_t red_pkt_pass : 48; /* W6 */
 	uint64_t rsvd_447_432 : 16;
-	uint64_t green_octs_pass : 48;
+	uint64_t green_octs_pass : 48; /* W7 */
 	uint64_t rsvd_511_496 : 16;
-	uint64_t yellow_octs_pass : 48;
+	uint64_t yellow_octs_pass : 48; /* W8 */
 	uint64_t rsvd_575_560 : 16;
-	uint64_t red_octs_pass : 48;
+	uint64_t red_octs_pass : 48; /* W9 */
 	uint64_t rsvd_639_624 : 16;
-	uint64_t green_pkt_drop : 48;
+	uint64_t green_pkt_drop : 48; /* W10 */
 	uint64_t rsvd_703_688 : 16;
-	uint64_t yellow_pkt_drop : 48;
+	uint64_t yellow_pkt_drop : 48; /* W11 */
 	uint64_t rsvd_767_752 : 16;
-	uint64_t red_pkt_drop : 48;
+	uint64_t red_pkt_drop : 48; /* W12 */
 	uint64_t rsvd_831_816 : 16;
-	uint64_t green_octs_drop : 48;
+	uint64_t green_octs_drop : 48; /* W13 */
 	uint64_t rsvd_895_880 : 16;
-	uint64_t yellow_octs_drop : 48;
+	uint64_t yellow_octs_drop : 48; /* W14 */
 	uint64_t rsvd_959_944 : 16;
-	uint64_t red_octs_drop : 48;
+	uint64_t red_octs_drop : 48; /* W15 */
 	uint64_t rsvd_1023_1008 : 16;
 };
 
@@ -1005,11 +1040,55 @@ struct nix_cint_hw_s {
 struct nix_cqe_hdr_s {
 	uint64_t tag : 32;
 	uint64_t q : 20;
-	uint64_t rsvd_57_52 : 6;
+	uint64_t long_send_comp : 1;
+	uint64_t rsvd_57_53 : 5;
 	uint64_t node : 2;
 	uint64_t cqe_type : 4;
 };
 
+/* [CN20K, .) NIX Completion queue context structure */
+struct nix_cn20k_cq_ctx_s {
+	uint64_t base : 64; /* W0 */
+	uint64_t lbp_ena : 1; /* W1 */
+	uint64_t lbpid_low : 3;
+	uint64_t bp_ena : 1;
+	uint64_t lbpid_med : 3;
+	uint64_t bpid : 9;
+	uint64_t lbpid_high : 3;
+	uint64_t qint_idx : 7;
+	uint64_t cq_err : 1;
+	uint64_t cint_idx : 7;
+	uint64_t avg_con : 9;
+	uint64_t wrptr : 20;
+	uint64_t tail : 20; /* W2 */
+	uint64_t head : 20;
+	uint64_t avg_level : 8;
+	uint64_t update_time : 16;
+	uint64_t bp : 8; /* W3 */
+	uint64_t drop : 8;
+	uint64_t drop_ena : 1;
+	uint64_t ena : 1;
+	uint64_t cpt_drop_err_en  : 1;
+	uint64_t reserved_211_211 : 1;
+	uint64_t msh_dst : 11;
+	uint64_t msh_valid : 1;
+	uint64_t stash_thresh : 4;
+	uint64_t lbp_frac : 4;
+	uint64_t caching : 1;
+	uint64_t stashing : 1;
+	uint64_t reserved_234_235 : 2;
+	uint64_t qsize : 4;
+	uint64_t cq_err_int : 8;
+	uint64_t cq_err_int_ena   : 8;
+	uint64_t bpid_ext : 2; /* W4 */
+	uint64_t reserved_258_259 : 2;
+	uint64_t lbpid_ext : 2;
+	uint64_t reserved_262_319 : 58;
+	uint64_t reserved_320_383 : 64; /* W5 */
+	uint64_t reserved_384_447 : 64; /* W6 */
+	uint64_t reserved_448_511 : 64; /* W7 */
+};
+
 /* NIX completion queue context structure */
 struct nix_cq_ctx_s {
 	uint64_t base : 64; /* W0 */
@@ -1083,6 +1162,184 @@ struct nix_qint_hw_s {
 	uint32_t ena : 1;
 };
 
+/* [CN20K, .) NIX receive queue context structure */
+struct nix_cn20k_rq_ctx_hw_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t sso_ena : 1;
+	uint64_t ipsech_ena : 1;
+	uint64_t ena_wqwd : 1;
+	uint64_t cq : 20;
+	uint64_t rsvd_34_24 : 11;
+	uint64_t port_il4_dis : 1;
+	uint64_t port_ol4_dis : 1;
+	uint64_t lenerr_dis : 1;
+	uint64_t csum_il4_dis : 1;
+	uint64_t csum_ol4_dis : 1;
+	uint64_t len_il4_dis : 1;
+	uint64_t len_il3_dis : 1;
+	uint64_t len_ol4_dis : 1;
+	uint64_t len_ol3_dis : 1;
+	uint64_t wqe_aura : 20;
+	uint64_t spb_aura : 20; /* W1 */
+	uint64_t lpb_aura : 20;
+	uint64_t sso_grp : 10;
+	uint64_t sso_tt : 2;
+	uint64_t pb_caching : 2;
+	uint64_t wqe_caching : 1;
+	uint64_t xqe_drop_ena : 1;
+	uint64_t spb_drop_ena : 1;
+	uint64_t lpb_drop_ena : 1;
+	uint64_t pb_stashing : 1;
+	uint64_t ipsecd_drop_en : 1;
+	uint64_t chi_ena : 1;
+	uint64_t rsvd_127_125 : 3;
+	uint64_t band_prof_id_l : 10; /* W2 */
+	uint64_t sso_drop_ena : 1;
+	uint64_t policer_ena : 1;
+	uint64_t spb_sizem1 : 6;
+	uint64_t wqe_skip : 2;
+	uint64_t spb_high_sizem1 : 3;
+	uint64_t spb_ena : 1;
+	uint64_t lpb_sizem1 : 12;
+	uint64_t first_skip : 7;
+	uint64_t sso_bp_ena : 1;
+	uint64_t later_skip : 6;
+	uint64_t xqe_imm_size : 6;
+	uint64_t band_prof_id_h : 4;
+	uint64_t rsvd_189_188 : 2;
+	uint64_t xqe_imm_copy : 1;
+	uint64_t xqe_hdr_split : 1;
+	uint64_t xqe_drop : 8; /* W3 */
+	uint64_t xqe_pass : 8;
+	uint64_t wqe_pool_drop : 8;
+	uint64_t wqe_pool_pass : 8;
+	uint64_t spb_aura_drop : 8;
+	uint64_t spb_aura_pass : 8;
+	uint64_t spb_pool_drop : 8;
+	uint64_t spb_pool_pass : 8;
+	uint64_t lpb_aura_drop : 8; /* W4 */
+	uint64_t lpb_aura_pass : 8;
+	uint64_t lpb_pool_drop : 8;
+	uint64_t lpb_pool_pass : 8;
+	uint64_t rsvd_319_288 : 32;
+	uint64_t ltag : 24; /* W5 */
+	uint64_t good_utag : 8;
+	uint64_t bad_utag : 8;
+	uint64_t flow_tagw : 6;
+	uint64_t rsvd_366  : 1;
+	uint64_t rsvd_367  : 1;
+	uint64_t rsvd_375_368 : 8;
+	uint64_t rsvd_379_376 : 4;
+	uint64_t rsvd_381_380 : 2;
+	uint64_t rsvd_383_382 : 2;
+	uint64_t octs : 48; /* W6 */
+	uint64_t rsvd_447_432 : 16;
+	uint64_t pkts : 48; /* W7 */
+	uint64_t rsvd_511_496 : 16;
+	uint64_t drop_octs : 48; /* W8 */
+	uint64_t rsvd_575_560 : 16;
+	uint64_t drop_pkts : 48; /* W9 */
+	uint64_t rsvd_639_624 : 16;
+	uint64_t re_pkts : 48; /* W10 */
+	uint64_t rsvd_702_688 : 15;
+	uint64_t ena_copy : 1;
+	uint64_t rsvd_739_704 : 36; /* W11 */
+	uint64_t rq_int : 8;
+	uint64_t rq_int_ena : 8;
+	uint64_t qint_idx : 7;
+	uint64_t rsvd_767_763 : 5;
+	uint64_t rsvd_831_768 : 64;  /* W12 */
+	uint64_t rsvd_895_832 : 64;  /* W13 */
+	uint64_t rsvd_959_896 : 64;  /* W14 */
+	uint64_t rsvd_1023_960 : 64; /* W15 */
+};
+
+/* [CN20K, .) NIX Receive queue context structure */
+struct nix_cn20k_rq_ctx_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t sso_ena : 1;
+	uint64_t ipsech_ena : 1;
+	uint64_t ena_wqwd : 1;
+	uint64_t cq : 20;
+	uint64_t reserved_24_34 : 11;
+	uint64_t port_il4_dis : 1;
+	uint64_t port_ol4_dis : 1;
+	uint64_t lenerr_dis : 1;
+	uint64_t csum_il4_dis : 1;
+	uint64_t csum_ol4_dis : 1;
+	uint64_t len_il4_dis : 1;
+	uint64_t len_il3_dis : 1;
+	uint64_t len_ol4_dis : 1;
+	uint64_t len_ol3_dis : 1;
+	uint64_t wqe_aura : 20;
+	uint64_t spb_aura : 20; /* W1 */
+	uint64_t lpb_aura : 20;
+	uint64_t sso_grp : 10;
+	uint64_t sso_tt : 2;
+	uint64_t pb_caching : 2;
+	uint64_t wqe_caching : 1;
+	uint64_t xqe_drop_ena : 1;
+	uint64_t spb_drop_ena : 1;
+	uint64_t lpb_drop_ena : 1;
+	uint64_t pb_stashing : 1;
+	uint64_t ipsecd_drop_en : 1;
+	uint64_t chi_ena : 1;
+	uint64_t reserved_125_127 : 3;
+	uint64_t band_prof_id_l : 10; /* W2 */
+	uint64_t sso_fc_ena : 1;
+	uint64_t policer_ena : 1;
+	uint64_t spb_sizem1 : 6;
+	uint64_t wqe_skip : 2;
+	uint64_t spb_high_sizem1 : 3;
+	uint64_t spb_ena : 1;
+	uint64_t lpb_sizem1 : 12;
+	uint64_t first_skip : 7;
+	uint64_t sso_bp_ena : 1;
+	uint64_t later_skip : 6;
+	uint64_t xqe_imm_size : 6;
+	uint64_t band_prof_id_h : 4;
+	uint64_t reserved_188_189 : 2;
+	uint64_t xqe_imm_copy : 1;
+	uint64_t xqe_hdr_split : 1;
+	uint64_t xqe_drop : 8; /* W3 */
+	uint64_t xqe_pass : 8;
+	uint64_t wqe_pool_drop : 8;
+	uint64_t wqe_pool_pass : 8;
+	uint64_t spb_aura_drop : 8;
+	uint64_t spb_aura_pass : 8;
+	uint64_t spb_pool_drop : 8;
+	uint64_t spb_pool_pass : 8;
+	uint64_t lpb_aura_drop : 8; /* W4 */
+	uint64_t lpb_aura_pass : 8;
+	uint64_t lpb_pool_drop : 8;
+	uint64_t lpb_pool_pass : 8;
+	uint64_t reserved_288_291 : 4;
+	uint64_t rq_int : 8;
+	uint64_t rq_int_ena : 8;
+	uint64_t qint_idx : 7;
+	uint64_t reserved_315_319 : 5;
+	uint64_t ltag : 24; /* W5 */
+	uint64_t good_utag : 8;
+	uint64_t bad_utag : 8;
+	uint64_t flow_tagw : 6;
+	uint64_t reserved_366_383 : 18;
+	uint64_t octs : 48; /* W6 */
+	uint64_t reserved_432_447 : 16;
+	uint64_t pkts : 48; /* W7 */
+	uint64_t reserved_496_511 : 16;
+	uint64_t drop_octs : 48; /* W8 */
+	uint64_t reserved_560_575 : 16;
+	uint64_t drop_pkts : 48; /* W9 */
+	uint64_t reserved_624_639 : 16;
+	uint64_t re_pkts : 48; /* W10 */
+	uint64_t reserved_688_703 : 16;
+	uint64_t reserved_704_767 : 64; /* W11 */
+	uint64_t reserved_768_831 : 64; /* W12 */
+	uint64_t reserved_832_895 : 64; /* W13 */
+	uint64_t reserved_896_959 : 64; /* W14 */
+	uint64_t reserved_960_1023 : 64; /* W15 */
+};
+
 /* [CN10K, .) NIX receive queue context structure */
 struct nix_cn10k_rq_ctx_hw_s {
 	uint64_t ena : 1;
@@ -1493,13 +1750,13 @@ union nix_rx_parse_u {
 		uint64_t lhptr : 8;
 		uint64_t vtag0_ptr : 8;
 		uint64_t vtag1_ptr : 8;
-		uint64_t flow_key_alg : 5;
-		uint64_t rsvd_341 : 1;
+		uint64_t flow_key_alg : 6;
 		uint64_t rsvd_349_342 : 8;
 		uint64_t rsvd_353_350 : 4;
 		uint64_t rsvd_359_354 : 6;
 		uint64_t color : 2;
-		uint64_t rsvd_381_362 : 20;
+		uint64_t mcs_mdata    : 14;
+		uint64_t rsvd_381_376 : 6;
 		uint64_t rsvd_382 : 1;
 		uint64_t rsvd_383 : 1;
 		uint64_t rsvd_447_384 : 64; /* W6 */
@@ -1652,7 +1909,9 @@ union nix_send_ext_w1_u {
 		uint64_t vlan0_ins_ena : 1;
 		uint64_t vlan1_ins_ena : 1;
 		uint64_t init_color : 2;
-		uint64_t rsvd_127_116 : 12;
+		uint64_t flow_id       : 7;
+		uint64_t flow_override : 1;
+		uint64_t rsvd_127_124 : 4;
 	};
 	struct {
 		uint64_t vlan0_ins_ptr : 8;
@@ -1675,7 +1934,7 @@ union nix_send_hdr_w0_u {
 	uint64_t u;
 	struct {
 		uint64_t total : 18;
-		uint64_t rsvd_18 : 1;
+		uint64_t cpt_error : 1;
 		uint64_t df : 1;
 		uint64_t aura : 20;
 		uint64_t sizem1 : 3;
@@ -1718,7 +1977,8 @@ struct nix_send_jump_s {
 	uint64_t rsvd_13_7 : 7;
 	uint64_t ld_type : 2;
 	uint64_t aura : 20;
-	uint64_t rsvd_58_36 : 23;
+	uint64_t refcnt_en  : 1;
+	uint64_t rsvd_58_37 : 22;
 	uint64_t f : 1;
 	uint64_t subdc : 4;
 	uint64_t addr : 64; /* W1 */
@@ -1729,7 +1989,10 @@ union nix_send_mem_w0_u {
 	uint64_t u;
 	struct {
 		uint64_t offset : 16;
-		uint64_t rsvd_51_16 : 36;
+		uint64_t base_ns     : 32;
+		uint64_t step_type   : 1;
+		uint64_t rsvd_50_49  : 2;
+		uint64_t coas_en     : 1;
 		uint64_t per_lso_seg : 1;
 		uint64_t wmem : 1;
 		uint64_t dsz : 2;
@@ -1760,7 +2023,8 @@ union nix_send_sg2_s {
 		uint64_t i1 : 1;
 		uint64_t fabs : 1;
 		uint64_t foff : 8;
-		uint64_t rsvd_57_46 : 12;
+		uint64_t refcnt_en1 : 1;
+		uint64_t rsvd_57_47 : 11;
 		uint64_t ld_type : 2;
 		uint64_t subdc : 4;
 	};
@@ -1773,7 +2037,10 @@ union nix_send_sg_s {
 		uint64_t seg2_size : 16;
 		uint64_t seg3_size : 16;
 		uint64_t segs : 2;
-		uint64_t rsvd_54_50 : 5;
+		uint64_t rsvd_51_50 : 2;
+		uint64_t refcnt_en1 : 1;
+		uint64_t refcnt_en2 : 1;
+		uint64_t refcnt_en3 : 1;
 		uint64_t i1 : 1;
 		uint64_t i2 : 1;
 		uint64_t i3 : 1;
@@ -1792,6 +2059,133 @@ struct nix_send_work_s {
 	uint64_t addr : 64; /* W1 */
 };
 
+/* [CN20K, .) NIX sq context hardware structure */
+struct nix_cn20k_sq_ctx_hw_s {
+	uint64_t ena : 1;
+	uint64_t substream : 20;
+	uint64_t max_sqe_size : 2;
+	uint64_t sqe_way_mask : 16;
+	uint64_t sqb_aura : 20;
+	uint64_t gbl_rsvd1 : 5;
+	uint64_t cq_id : 20; /* W1 */
+	uint64_t cq_ena : 1;
+	uint64_t qint_idx : 6;
+	uint64_t gbl_rsvd2 : 1;
+	uint64_t sq_int : 8;
+	uint64_t sq_int_ena : 8;
+	uint64_t xoff : 1;
+	uint64_t sqe_stype : 2;
+	uint64_t gbl_rsvd : 17;
+	uint64_t head_sqb : 64; /* W2 */
+	uint64_t head_offset : 6; /* W3 */
+	uint64_t sqb_dequeue_count : 16;
+	uint64_t default_chan : 12;
+	uint64_t sdp_mcast : 1;
+	uint64_t sso_ena : 1;
+	uint64_t dse_rsvd1 : 28;
+	uint64_t sqb_enqueue_count : 16; /* W4 */
+	uint64_t tail_offset : 6;
+	uint64_t lmt_dis : 1;
+	uint64_t smq_rr_weight : 14;
+	uint64_t dnq_rsvd1 : 27;
+	uint64_t tail_sqb : 64; /* W5 */
+	uint64_t next_sqb : 64; /* W6 */
+	uint64_t smq : 11; /* W7 */
+	uint64_t smq_pend : 1;
+	uint64_t smq_next_sq : 20;
+	uint64_t smq_next_sq_vld : 1;
+	uint64_t mnq_dis : 1;
+	uint64_t scm1_rsvd2 : 30;
+	uint64_t smenq_sqb : 64; /* W8 */
+	uint64_t smenq_offset : 6; /* W9 */
+	uint64_t cq_limit : 8;
+	uint64_t smq_rr_count : 32;
+	uint64_t scm_lso_rem : 18;
+	uint64_t smq_lso_segnum : 8; /* W10 */
+	uint64_t vfi_lso_total : 18;
+	uint64_t vfi_lso_sizem1 : 3;
+	uint64_t vfi_lso_sb : 8;
+	uint64_t vfi_lso_mps : 14;
+	uint64_t vfi_lso_vlan0_ins_ena : 1;
+	uint64_t vfi_lso_vlan1_ins_ena : 1;
+	uint64_t vfi_lso_vld : 1;
+	uint64_t smenq_next_sqb_vld : 1;
+	uint64_t scm_dq_rsvd1 : 9;
+	uint64_t smenq_next_sqb : 64; /* W11 */
+	uint64_t age_drop_octs : 32; /* W12 */
+	uint64_t age_drop_pkts : 32;
+	uint64_t drop_pkts : 48; /* W13 */
+	uint64_t drop_octs_lsw : 16;
+	uint64_t drop_octs_msw : 32; /* W14 */
+	uint64_t pkts_lsw : 32;
+	uint64_t pkts_msw : 16; /* W15 */
+	uint64_t octs : 48;
+};
+
+/* [CN20K, .) NIX Send queue context structure */
+struct nix_cn20k_sq_ctx_s {
+	uint64_t ena : 1; /* W0 */
+	uint64_t qint_idx : 6;
+	uint64_t substream : 20;
+	uint64_t sdp_mcast :  1;
+	uint64_t cq : 20;
+	uint64_t sqe_way_mask : 16;
+	uint64_t smq : 11; /* W1 */
+	uint64_t cq_ena : 1;
+	uint64_t xoff : 1;
+	uint64_t sso_ena : 1;
+	uint64_t smq_rr_weight : 14;
+	uint64_t default_chan : 12;
+	uint64_t sqb_count : 16;
+	uint64_t reserved_120_120 : 1;
+	uint64_t smq_rr_count_lb : 7;
+	uint64_t smq_rr_count_ub : 25; /* W2 */
+	uint64_t sqb_aura : 20;
+	uint64_t sq_int : 8;
+	uint64_t sq_int_ena : 8;
+	uint64_t sqe_stype : 2;
+	uint64_t reserved_191_191 : 1;
+	uint64_t max_sqe_size : 2; /* W3 */
+	uint64_t cq_limit : 8;
+	uint64_t lmt_dis : 1;
+	uint64_t mnq_dis : 1;
+	uint64_t smq_next_sq : 20;
+	uint64_t smq_lso_segnum :  8;
+	uint64_t tail_offset :  6;
+	uint64_t smenq_offset :  6;
+	uint64_t head_offset :  6;
+	uint64_t smenq_next_sqb_vld :  1;
+	uint64_t smq_pend :  1;
+	uint64_t smq_next_sq_vld :  1;
+	uint64_t reserved_253_255 :  3;
+	uint64_t next_sqb : 64; /* W4 */
+	uint64_t tail_sqb : 64; /* W5 */
+	uint64_t smenq_sqb : 64; /* W6 */
+	uint64_t smenq_next_sqb : 64; /* W7 */
+	uint64_t head_sqb : 64; /* W8 */
+	uint64_t reserved_576_583 : 8; /* W9 */
+	uint64_t vfi_lso_total : 18;
+	uint64_t vfi_lso_sizem1 : 3;
+	uint64_t vfi_lso_sb : 8;
+	uint64_t vfi_lso_mps : 14;
+	uint64_t vfi_lso_vlan0_ins_ena : 1;
+	uint64_t vfi_lso_vlan1_ins_ena : 1;
+	uint64_t vfi_lso_vld : 1;
+	uint64_t reserved_630_639 : 10;
+	uint64_t scm_lso_rem : 18; /* W10 */
+	uint64_t reserved_658_703 : 46;
+	uint64_t octs : 48; /* W11 */
+	uint64_t reserved_752_767 : 16;
+	uint64_t pkts : 48; /* W12 */
+	uint64_t reserved_816_831 : 16;
+	uint64_t aged_drop_octs : 32; /* W13 */
+	uint64_t aged_drop_pkts : 32;
+	uint64_t drop_octs : 48; /* W14 */
+	uint64_t reserved_944_959 : 16;
+	uint64_t drop_pkts : 48; /* W15 */
+	uint64_t reserved_1008_1023 : 16;
+};
+
 /* [CN10K, .) NIX sq context hardware structure */
 struct nix_cn10k_sq_ctx_hw_s {
 	uint64_t ena : 1;
@@ -2234,17 +2628,24 @@ struct nix_lso_format {
 #define NIX_CN9K_TM_RR_QUANTUM_MAX (BIT_ULL(24) - 1)
 #define NIX_TM_RR_WEIGHT_MAX	   (BIT_ULL(14) - 1)
 
-/* [CN9K, CN10K) */
-#define NIX_CN9K_TXSCH_LVL_SMQ_MAX 512
-
-/* [CN10K, .) */
-#define NIX_TXSCH_LVL_SMQ_MAX 832
-
 /* [CN9K, .) */
-#define NIX_TXSCH_LVL_TL4_MAX 512
-#define NIX_TXSCH_LVL_TL3_MAX 256
-#define NIX_TXSCH_LVL_TL2_MAX 256
 #define NIX_TXSCH_LVL_TL1_MAX 28
+#define NIX_TXSCH_LVL_TL2_MAX 256
+
+/* CN9K */
+#define NIX_CN9K_TXSCH_LVL_TL3_MAX 256
+#define NIX_CN9K_TXSCH_LVL_TL4_MAX 512
+#define NIX_CN9K_TXSCH_LVL_SMQ_MAX 512
+
+/* CN10K */
+#define NIX_CN10K_TXSCH_LVL_TL3_MAX 256
+#define NIX_CN10K_TXSCH_LVL_TL4_MAX 512
+#define NIX_CN10K_TXSCH_LVL_SMQ_MAX 832
+
+/* [CN20K, .) */
+#define NIX_TXSCH_LVL_TL3_MAX 512
+#define NIX_TXSCH_LVL_TL4_MAX 1280
+#define NIX_TXSCH_LVL_SMQ_MAX 2048
 
 #define NIX_CQ_OP_STAT_OP_ERR 63
 #define NIX_CQ_OP_STAT_CQ_ERR 46
@@ -2265,4 +2666,9 @@ struct nix_lso_format {
 #define NIX_SENDSTAT_IOFFSET_MASK 0xFFF
 #define NIX_SENDSTAT_OOFFSET_MASK 0xFFF
 
+/* Mask to extract the lower 10 bits of the channel number
+ * that CPT passes to X2P.
+ */
+#define NIX_CHAN_CPT_X2P_MASK (0x3ffull)
+
 #endif /* __NIX_HW_H__ */
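
A quick illustration of how the new mask would be applied (an editorial
sketch, not part of the patch; the function name is illustrative and chan
is a hypothetical channel value):

	static inline uint64_t
	nix_chan_cpt_x2p(uint64_t chan)
	{
		/* CPT forwards only the lower 10 bits of the channel to X2P */
		return chan & NIX_CHAN_CPT_X2P_MASK;
	}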
diff --git a/drivers/common/cnxk/hw/rvu.h b/drivers/common/cnxk/hw/rvu.h
index ee6cf30c5d..ed2ba996e0 100644
--- a/drivers/common/cnxk/hw/rvu.h
+++ b/drivers/common/cnxk/hw/rvu.h
@@ -67,7 +67,9 @@
 #define RVU_PF_VFX_PFVF_MBOXX(a, b)                                            \
 	(0x0ull | (uint64_t)(a) << 12 | (uint64_t)(b) << 3)
 #define RVU_PF_VF_BAR4_ADDR		 (0x10ull)
-#define RVU_PF_BLOCK_ADDRX_DISC(a)	 (0x200ull | (uint64_t)(a) << 3)
+
+#define RVU_PF_DISC			 (0x0ull)  /* [CN20K, .) */
+#define RVU_PF_BLOCK_ADDRX_DISC(a)	 (0x200ull | (uint64_t)(a) << 3)  /* [CN9K, CN20K) */
 #define RVU_PF_VFME_STATUSX(a)		 (0x800ull | (uint64_t)(a) << 3)
 #define RVU_PF_VFTRPENDX(a)		 (0x820ull | (uint64_t)(a) << 3)
 #define RVU_PF_VFTRPEND_W1SX(a)		 (0x840ull | (uint64_t)(a) << 3)
@@ -91,7 +93,8 @@
 #define RVU_PF_MSIX_VECX_ADDR(a)	 (0x80000ull | (uint64_t)(a) << 4)
 #define RVU_PF_MSIX_VECX_CTL(a)		 (0x80008ull | (uint64_t)(a) << 4)
 #define RVU_PF_MSIX_PBAX(a)		 (0xf0000ull | (uint64_t)(a) << 3)
-#define RVU_VF_VFPF_MBOXX(a)		 (0x0ull | (uint64_t)(a) << 3)
+#define RVU_VF_DISC			 (0x0ull)  /* [CN20K, .) */
+#define RVU_VF_VFPF_MBOXX(a)		 (0x0ull | (uint64_t)(a) << 3) /* [CN9K, CN20K) */
 #define RVU_VF_INT			 (0x20ull)
 #define RVU_VF_INT_W1S			 (0x28ull)
 #define RVU_VF_INT_ENA_W1S		 (0x30ull)
diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h
index 9a9dcbdbda..dd65946e9e 100644
--- a/drivers/common/cnxk/roc_mbox.h
+++ b/drivers/common/cnxk/roc_mbox.h
@@ -309,6 +309,7 @@ struct mbox_msghdr {
 	M(NIX_MCAST_GRP_UPDATE, 0x802d, nix_mcast_grp_update, nix_mcast_grp_update_req,            \
 	  nix_mcast_grp_update_rsp)                                                                \
 	M(NIX_GET_LF_STATS,    0x802e, nix_get_lf_stats, nix_get_lf_stats_req, nix_lf_stats_rsp)   \
+	M(NIX_CN20K_AQ_ENQ, 0x802f, nix_cn20k_aq_enq, nix_cn20k_aq_enq_req, nix_cn20k_aq_enq_rsp)  \
 	/* MCS mbox IDs (range 0xa000 - 0xbFFF) */                                                 \
 	M(MCS_ALLOC_RESOURCES, 0xa000, mcs_alloc_resources, mcs_alloc_rsrc_req,                    \
 	  mcs_alloc_rsrc_rsp)                                                                      \
@@ -1442,6 +1443,57 @@ struct nix_lf_free_req {
 	uint64_t __io flags;
 };
 
+/* CN20x NIX AQ enqueue msg */
+struct nix_cn20k_aq_enq_req {
+	struct mbox_msghdr hdr;
+	uint32_t __io qidx;
+	uint8_t __io ctype;
+	uint8_t __io op;
+	union {
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_RQ */
+		__io struct nix_cn20k_rq_ctx_s rq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_SQ */
+		__io struct nix_cn20k_sq_ctx_s sq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_CQ */
+		__io struct nix_cn20k_cq_ctx_s cq;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_RSS */
+		__io struct nix_rsse_s rss;
+		/* Valid when op == WRITE/INIT and ctype == NIX_AQ_CTYPE_MCE */
+		__io struct nix_rx_mce_s mce;
+		/* Valid when op == WRITE/INIT and
+		 * ctype == NIX_AQ_CTYPE_BAND_PROF
+		 */
+		__io struct nix_band_prof_s prof;
+	};
+	/* Mask data when op == WRITE (1=write, 0=don't write) */
+	union {
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_RQ */
+		__io struct nix_cn20k_rq_ctx_s rq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_SQ */
+		__io struct nix_cn20k_sq_ctx_s sq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_CQ */
+		__io struct nix_cn20k_cq_ctx_s cq_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_RSS */
+		__io struct nix_rsse_s rss_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_MCE */
+		__io struct nix_rx_mce_s mce_mask;
+		/* Valid when op == WRITE and ctype == NIX_AQ_CTYPE_BAND_PROF */
+		__io struct nix_band_prof_s prof_mask;
+	};
+};
+
+struct nix_cn20k_aq_enq_rsp {
+	struct mbox_msghdr hdr;
+	union {
+		__io struct nix_cn20k_rq_ctx_s rq;
+		__io struct nix_cn20k_sq_ctx_s sq;
+		__io struct nix_cn20k_cq_ctx_s cq;
+		__io struct nix_rsse_s rss;
+		__io struct nix_rx_mce_s mce;
+		__io struct nix_band_prof_s prof;
+	};
+};
+
 /* CN10x NIX AQ enqueue msg */
 struct nix_cn10k_aq_enq_req {
 	struct mbox_msghdr hdr;
diff --git a/drivers/common/cnxk/roc_nix.c b/drivers/common/cnxk/roc_nix.c
index 041621dfaa..e4d7e11121 100644
--- a/drivers/common/cnxk/roc_nix.c
+++ b/drivers/common/cnxk/roc_nix.c
@@ -398,15 +398,22 @@ sdp_lbk_id_update(struct plt_pci_device *pci_dev, struct nix *nix)
 uint64_t
 nix_get_blkaddr(struct dev *dev)
 {
+	uint64_t blkaddr;
 	uint64_t reg;
 
 	/* Reading the discovery register to know which NIX is the LF
 	 * attached to.
 	 */
-	reg = plt_read64(dev->bar2 +
-			 RVU_PF_BLOCK_ADDRX_DISC(RVU_BLOCK_ADDR_NIX0));
-
-	return reg & 0x1FFULL ? RVU_BLOCK_ADDR_NIX0 : RVU_BLOCK_ADDR_NIX1;
+	if (roc_model_is_cn9k() || roc_model_is_cn10k()) {
+		reg = plt_read64(dev->bar2 + RVU_PF_BLOCK_ADDRX_DISC(RVU_BLOCK_ADDR_NIX0));
+		blkaddr = reg & 0x1FFULL ? RVU_BLOCK_ADDR_NIX0 : RVU_BLOCK_ADDR_NIX1;
+	} else {
+		reg = plt_read64(dev->bar2 + RVU_PF_DISC);
+		blkaddr = reg & BIT_ULL(RVU_BLOCK_ADDR_NIX0) ? RVU_BLOCK_ADDR_NIX0 :
+			RVU_BLOCK_ADDR_NIX1;
+		blkaddr = RVU_BLOCK_ADDR_NIX0;
+	}
+	return blkaddr;
 }
 
 int
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 06/18] common/cnxk: support NIX queue config for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (4 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 05/18] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 07/18] common/cnxk: support bandwidth profile " Nithin Dabilpuram
                     ` (12 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to set up the NIX RQ, SQ and CQ contexts for cn20k.
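
For readers skimming the hunks below: every configuration path gains the
same model dispatch, where the mbox request type is chosen by SoC
generation (cn9k keeps its legacy request, cn10k and cn20k each get their
own). A minimal sketch of that shape, using only helpers that appear in
this diff (the function name is illustrative and the cn9k branch is
omitted for brevity):

	/* Illustrative sketch, not part of the patch: the per-model AQ
	 * request selection repeated throughout this series.
	 */
	static int
	nix_rq_enable_sketch(struct mbox *mbox, uint32_t qid, bool enable)
	{
		if (roc_model_is_cn10k()) {
			struct nix_cn10k_aq_enq_req *aq;

			aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
			if (!aq)
				return -ENOSPC;
			aq->qidx = qid;
			aq->ctype = NIX_AQ_CTYPE_RQ;
			aq->op = NIX_AQ_INSTOP_WRITE;
			aq->rq.ena = enable;
			aq->rq_mask.ena = ~(aq->rq_mask.ena);
		} else {
			/* cn20k and newer use the new request and context layout */
			struct nix_cn20k_aq_enq_req *aq;

			aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
			if (!aq)
				return -ENOSPC;
			aq->qidx = qid;
			aq->ctype = NIX_AQ_CTYPE_RQ;
			aq->op = NIX_AQ_INSTOP_WRITE;
			aq->rq.ena = enable;
			aq->rq_mask.ena = ~(aq->rq_mask.ena);
		}
		return mbox_process(mbox);
	}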

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/roc_nix_fc.c     |  52 ++-
 drivers/common/cnxk/roc_nix_inl.c    |   2 +
 drivers/common/cnxk/roc_nix_priv.h   |   1 +
 drivers/common/cnxk/roc_nix_queue.c  | 532 ++++++++++++++++++++++++++-
 drivers/common/cnxk/roc_nix_stats.c  |  55 ++-
 drivers/common/cnxk/roc_nix_tm.c     |  22 +-
 drivers/common/cnxk/roc_nix_tm_ops.c |  14 +-
 7 files changed, 650 insertions(+), 28 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_fc.c b/drivers/common/cnxk/roc_nix_fc.c
index 2f72e67993..0676363c58 100644
--- a/drivers/common/cnxk/roc_nix_fc.c
+++ b/drivers/common/cnxk/roc_nix_fc.c
@@ -127,7 +127,7 @@ nix_fc_cq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 		aq->qidx = fc_cfg->cq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_READ;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -136,6 +136,18 @@ nix_fc_cq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			goto exit;
 		}
 
+		aq->qidx = fc_cfg->cq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
 		aq->qidx = fc_cfg->cq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_READ;
@@ -179,7 +191,7 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 		aq->qidx = fc_cfg->rq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -188,6 +200,18 @@ nix_fc_rq_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			goto exit;
 		}
 
+		aq->qidx = fc_cfg->rq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
 		aq->qidx = fc_cfg->rq_cfg.rq;
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
@@ -270,7 +294,7 @@ nix_fc_cq_config_set(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 
 		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
 		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -290,6 +314,28 @@ nix_fc_cq_config_set(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg)
 			aq->cq_mask.bp = ~(aq->cq_mask.bp);
 		}
 
+		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
+		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = fc_cfg->cq_cfg.rq;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		if (fc_cfg->cq_cfg.enable) {
+			aq->cq.bpid = nix->bpid[fc_cfg->cq_cfg.tc];
+			aq->cq_mask.bpid = ~(aq->cq_mask.bpid);
+			aq->cq.bp = fc_cfg->cq_cfg.cq_drop;
+			aq->cq_mask.bp = ~(aq->cq_mask.bp);
+		}
+
 		aq->cq.bp_ena = !!(fc_cfg->cq_cfg.enable);
 		aq->cq_mask.bp_ena = ~(aq->cq_mask.bp_ena);
 	}
diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c
index a984ac56d9..a759052973 100644
--- a/drivers/common/cnxk/roc_nix_inl.c
+++ b/drivers/common/cnxk/roc_nix_inl.c
@@ -1385,6 +1385,8 @@ roc_nix_inl_dev_rq_get(struct roc_nix_rq *rq, bool enable)
 	mbox = mbox_get(dev->mbox);
 	if (roc_model_is_cn9k())
 		rc = nix_rq_cn9k_cfg(dev, inl_rq, inl_dev->qints, false, enable);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, inl_rq, inl_dev->qints, false, enable);
 	else
 		rc = nix_rq_cfg(dev, inl_rq, inl_dev->qints, false, enable);
 	if (rc) {
diff --git a/drivers/common/cnxk/roc_nix_priv.h b/drivers/common/cnxk/roc_nix_priv.h
index 275ffc8ea3..ade42c1878 100644
--- a/drivers/common/cnxk/roc_nix_priv.h
+++ b/drivers/common/cnxk/roc_nix_priv.h
@@ -409,6 +409,7 @@ int nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 
 int nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 		    bool cfg, bool ena);
+int nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena);
 int nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg,
 	       bool ena);
 int nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable);
diff --git a/drivers/common/cnxk/roc_nix_queue.c b/drivers/common/cnxk/roc_nix_queue.c
index f5441e0e6b..bb1b70424f 100644
--- a/drivers/common/cnxk/roc_nix_queue.c
+++ b/drivers/common/cnxk/roc_nix_queue.c
@@ -69,7 +69,7 @@ nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable)
 
 		aq->rq.ena = enable;
 		aq->rq_mask.ena = ~(aq->rq_mask.ena);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -82,6 +82,21 @@ nix_rq_ena_dis(struct dev *dev, struct roc_nix_rq *rq, bool enable)
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
 
+		aq->rq.ena = enable;
+		aq->rq_mask.ena = ~(aq->rq_mask.ena);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
 		aq->rq.ena = enable;
 		aq->rq_mask.ena = ~(aq->rq_mask.ena);
 	}
@@ -150,7 +165,7 @@ roc_nix_rq_is_sso_enable(struct roc_nix *roc_nix, uint32_t qid)
 			goto exit;
 
 		sso_enable = rsp->rq.sso_ena;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -164,6 +179,25 @@ roc_nix_rq_is_sso_enable(struct roc_nix *roc_nix, uint32_t qid)
 		aq->ctype = NIX_AQ_CTYPE_RQ;
 		aq->op = NIX_AQ_INSTOP_READ;
 
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+
+		sso_enable = rsp->rq.sso_ena;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+
 		rc = mbox_process_msg(mbox, (void *)&rsp);
 		if (rc)
 			goto exit;
@@ -222,7 +256,7 @@ nix_rq_aura_buf_type_update(struct roc_nix_rq *rq, bool set)
 		if (rsp->rq.spb_ena)
 			spb_aura = roc_npa_aura_handle_gen(rsp->rq.spb_aura, aura_base);
 		mbox_put(mbox);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -249,6 +283,32 @@ nix_rq_aura_buf_type_update(struct roc_nix_rq *rq, bool set)
 		if (rsp->rq.vwqe_ena)
 			vwqe_aura = roc_npa_aura_handle_gen(rsp->rq.wqe_aura, aura_base);
 
+		mbox_put(mbox);
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (!aq) {
+			mbox_put(mbox);
+			return -ENOSPC;
+		}
+
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_READ;
+
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc) {
+			mbox_put(mbox);
+			return rc;
+		}
+
+		/* Get aura handle from aura */
+		lpb_aura = roc_npa_aura_handle_gen(rsp->rq.lpb_aura, aura_base);
+		if (rsp->rq.spb_ena)
+			spb_aura = roc_npa_aura_handle_gen(rsp->rq.spb_aura, aura_base);
+
 		mbox_put(mbox);
 	}
 
@@ -443,8 +503,7 @@ nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 }
 
 int
-nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg,
-	   bool ena)
+nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
 {
 	struct nix_cn10k_aq_enq_req *aq;
 	struct mbox *mbox = dev->mbox;
@@ -667,6 +726,171 @@ nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
 	return rc;
 }
 
+int
+nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
+{
+	struct nix_cn20k_aq_enq_req *aq;
+	struct mbox *mbox = dev->mbox;
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+	if (!aq)
+		return -ENOSPC;
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = cfg ? NIX_AQ_INSTOP_WRITE : NIX_AQ_INSTOP_INIT;
+
+	if (rq->sso_ena) {
+		/* SSO mode */
+		aq->rq.sso_ena = 1;
+		aq->rq.sso_tt = rq->tt;
+		aq->rq.sso_grp = rq->hwgrp;
+		aq->rq.ena_wqwd = 1;
+		aq->rq.wqe_skip = rq->wqe_skip;
+		aq->rq.wqe_caching = 1;
+
+		aq->rq.good_utag = rq->tag_mask >> 24;
+		aq->rq.bad_utag = rq->tag_mask >> 24;
+		aq->rq.ltag = rq->tag_mask & BITMASK_ULL(24, 0);
+
+		if (rq->vwqe_ena)
+			aq->rq.wqe_aura = roc_npa_aura_handle_to_aura(rq->vwqe_aura_handle);
+	} else {
+		/* CQ mode */
+		aq->rq.sso_ena = 0;
+		aq->rq.good_utag = rq->tag_mask >> 24;
+		aq->rq.bad_utag = rq->tag_mask >> 24;
+		aq->rq.ltag = rq->tag_mask & BITMASK_ULL(24, 0);
+		aq->rq.cq = rq->cqid;
+	}
+
+	if (rq->ipsech_ena) {
+		aq->rq.ipsech_ena = 1;
+		aq->rq.ipsecd_drop_en = 1;
+		aq->rq.ena_wqwd = 1;
+		aq->rq.wqe_skip = rq->wqe_skip;
+		aq->rq.wqe_caching = 1;
+	}
+
+	aq->rq.lpb_aura = roc_npa_aura_handle_to_aura(rq->aura_handle);
+
+	/* Sizes must be aligned to 8 bytes */
+	if (rq->first_skip & 0x7 || rq->later_skip & 0x7 || rq->lpb_size & 0x7)
+		return -EINVAL;
+
+	/* Expressed in number of dwords */
+	aq->rq.first_skip = rq->first_skip / 8;
+	aq->rq.later_skip = rq->later_skip / 8;
+	aq->rq.flow_tagw = rq->flow_tag_width; /* 32-bits */
+	aq->rq.lpb_sizem1 = rq->lpb_size / 8;
+	aq->rq.lpb_sizem1 -= 1; /* Expressed in size minus one */
+	aq->rq.ena = ena;
+
+	if (rq->spb_ena) {
+		uint32_t spb_sizem1;
+
+		aq->rq.spb_ena = 1;
+		aq->rq.spb_aura =
+			roc_npa_aura_handle_to_aura(rq->spb_aura_handle);
+
+		if (rq->spb_size & 0x7 ||
+		    rq->spb_size > NIX_RQ_CN10K_SPB_MAX_SIZE)
+			return -EINVAL;
+
+		spb_sizem1 = rq->spb_size / 8; /* Expressed in no. of dwords */
+		spb_sizem1 -= 1;	       /* Expressed in size minus one */
+		aq->rq.spb_sizem1 = spb_sizem1 & 0x3F;
+		aq->rq.spb_high_sizem1 = (spb_sizem1 >> 6) & 0x7;
+	} else {
+		aq->rq.spb_ena = 0;
+	}
+
+	aq->rq.pb_caching = 0x2; /* First cache aligned block to LLC */
+	aq->rq.xqe_imm_size = 0; /* No pkt data copy to CQE */
+	aq->rq.rq_int_ena = 0;
+	/* Many to one reduction */
+	aq->rq.qint_idx = rq->qid % qints;
+	aq->rq.xqe_drop_ena = 0;
+	aq->rq.lpb_drop_ena = rq->lpb_drop_ena;
+	aq->rq.spb_drop_ena = rq->spb_drop_ena;
+
+	/* If RED is enabled, fill the pass/drop levels for all cases */
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.wqe_pool_pass = rq->red_pass;
+		aq->rq.xqe_pass = rq->red_pass;
+
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq.wqe_pool_drop = rq->red_drop;
+		aq->rq.xqe_drop = rq->red_drop;
+	}
+
+	if (cfg) {
+		if (rq->sso_ena) {
+			/* SSO mode */
+			aq->rq_mask.sso_ena = ~aq->rq_mask.sso_ena;
+			aq->rq_mask.sso_tt = ~aq->rq_mask.sso_tt;
+			aq->rq_mask.sso_grp = ~aq->rq_mask.sso_grp;
+			aq->rq_mask.ena_wqwd = ~aq->rq_mask.ena_wqwd;
+			aq->rq_mask.wqe_skip = ~aq->rq_mask.wqe_skip;
+			aq->rq_mask.wqe_caching = ~aq->rq_mask.wqe_caching;
+			aq->rq_mask.good_utag = ~aq->rq_mask.good_utag;
+			aq->rq_mask.bad_utag = ~aq->rq_mask.bad_utag;
+			aq->rq_mask.ltag = ~aq->rq_mask.ltag;
+			if (rq->vwqe_ena)
+				aq->rq_mask.wqe_aura = ~aq->rq_mask.wqe_aura;
+		} else {
+			/* CQ mode */
+			aq->rq_mask.sso_ena = ~aq->rq_mask.sso_ena;
+			aq->rq_mask.good_utag = ~aq->rq_mask.good_utag;
+			aq->rq_mask.bad_utag = ~aq->rq_mask.bad_utag;
+			aq->rq_mask.ltag = ~aq->rq_mask.ltag;
+			aq->rq_mask.cq = ~aq->rq_mask.cq;
+		}
+
+		if (rq->ipsech_ena)
+			aq->rq_mask.ipsech_ena = ~aq->rq_mask.ipsech_ena;
+
+		if (rq->spb_ena) {
+			aq->rq_mask.spb_aura = ~aq->rq_mask.spb_aura;
+			aq->rq_mask.spb_sizem1 = ~aq->rq_mask.spb_sizem1;
+			aq->rq_mask.spb_high_sizem1 =
+				~aq->rq_mask.spb_high_sizem1;
+		}
+
+		aq->rq_mask.spb_ena = ~aq->rq_mask.spb_ena;
+		aq->rq_mask.lpb_aura = ~aq->rq_mask.lpb_aura;
+		aq->rq_mask.first_skip = ~aq->rq_mask.first_skip;
+		aq->rq_mask.later_skip = ~aq->rq_mask.later_skip;
+		aq->rq_mask.flow_tagw = ~aq->rq_mask.flow_tagw;
+		aq->rq_mask.lpb_sizem1 = ~aq->rq_mask.lpb_sizem1;
+		aq->rq_mask.ena = ~aq->rq_mask.ena;
+		aq->rq_mask.pb_caching = ~aq->rq_mask.pb_caching;
+		aq->rq_mask.xqe_imm_size = ~aq->rq_mask.xqe_imm_size;
+		aq->rq_mask.rq_int_ena = ~aq->rq_mask.rq_int_ena;
+		aq->rq_mask.qint_idx = ~aq->rq_mask.qint_idx;
+		aq->rq_mask.xqe_drop_ena = ~aq->rq_mask.xqe_drop_ena;
+		aq->rq_mask.lpb_drop_ena = ~aq->rq_mask.lpb_drop_ena;
+		aq->rq_mask.spb_drop_ena = ~aq->rq_mask.spb_drop_ena;
+
+		if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+			aq->rq_mask.spb_pool_pass = ~aq->rq_mask.spb_pool_pass;
+			aq->rq_mask.lpb_pool_pass = ~aq->rq_mask.lpb_pool_pass;
+			aq->rq_mask.wqe_pool_pass = ~aq->rq_mask.wqe_pool_pass;
+			aq->rq_mask.xqe_pass = ~aq->rq_mask.xqe_pass;
+
+			aq->rq_mask.spb_pool_drop = ~aq->rq_mask.spb_pool_drop;
+			aq->rq_mask.lpb_pool_drop = ~aq->rq_mask.lpb_pool_drop;
+			aq->rq_mask.wqe_pool_drop = ~aq->rq_mask.wqe_pool_drop;
+			aq->rq_mask.xqe_drop = ~aq->rq_mask.xqe_drop;
+		}
+	}
+
+	return 0;
+}
+
 int
 roc_nix_rq_init(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 {
@@ -691,6 +915,8 @@ roc_nix_rq_init(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cfg(dev, rq, nix->qints, false, ena);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, rq, nix->qints, false, ena);
 	else
 		rc = nix_rq_cfg(dev, rq, nix->qints, false, ena);
 
@@ -745,6 +971,8 @@ roc_nix_rq_modify(struct roc_nix *roc_nix, struct roc_nix_rq *rq, bool ena)
 	mbox = mbox_get(m_box);
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cfg(dev, rq, nix->qints, true, ena);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cfg(dev, rq, nix->qints, true, ena);
 	else
 		rc = nix_rq_cfg(dev, rq, nix->qints, true, ena);
 
@@ -817,12 +1045,121 @@ roc_nix_rq_fini(struct roc_nix_rq *rq)
 	return 0;
 }
 
+static inline int
+roc_nix_cn20k_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
+{
+	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct mbox *mbox = (&nix->dev)->mbox;
+	volatile struct nix_cn20k_cq_ctx_s *cq_ctx;
+	uint16_t drop_thresh = NIX_CQ_THRESH_LEVEL;
+	uint16_t cpt_lbpid = nix->cpt_lbpid;
+	struct nix_cn20k_aq_enq_req *aq;
+	enum nix_q_size qsize;
+	size_t desc_sz;
+	int rc;
+
+	if (cq == NULL)
+		return NIX_ERR_PARAM;
+
+	qsize = nix_qsize_clampup(cq->nb_desc);
+	cq->nb_desc = nix_qsize_to_val(qsize);
+	cq->qmask = cq->nb_desc - 1;
+	cq->door = nix->base + NIX_LF_CQ_OP_DOOR;
+	cq->status = (int64_t *)(nix->base + NIX_LF_CQ_OP_STATUS);
+	cq->wdata = (uint64_t)cq->qid << 32;
+	cq->roc_nix = roc_nix;
+
+	/* CQE of W16 */
+	desc_sz = cq->nb_desc * NIX_CQ_ENTRY_SZ;
+	cq->desc_base = plt_zmalloc(desc_sz, NIX_CQ_ALIGN);
+	if (cq->desc_base == NULL) {
+		rc = NIX_ERR_NO_MEM;
+		goto fail;
+	}
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = cq->qid;
+	aq->ctype = NIX_AQ_CTYPE_CQ;
+	aq->op = NIX_AQ_INSTOP_INIT;
+	cq_ctx = &aq->cq;
+
+	cq_ctx->ena = 1;
+	cq_ctx->caching = 1;
+	cq_ctx->qsize = qsize;
+	cq_ctx->base = (uint64_t)cq->desc_base;
+	cq_ctx->avg_level = 0xff;
+	cq_ctx->cq_err_int_ena = BIT(NIX_CQERRINT_CQE_FAULT);
+	cq_ctx->cq_err_int_ena |= BIT(NIX_CQERRINT_DOOR_ERR);
+	if (roc_feature_nix_has_late_bp() && roc_nix_inl_inb_is_enabled(roc_nix)) {
+		cq_ctx->cq_err_int_ena |= BIT(NIX_CQERRINT_CPT_DROP);
+		cq_ctx->cpt_drop_err_en = 1;
+		/* Enable Late BP only when CPT BPID is non-zero */
+		if (cpt_lbpid) {
+			cq_ctx->lbp_ena = 1;
+			cq_ctx->lbpid_low = cpt_lbpid & 0x7;
+			cq_ctx->lbpid_med = (cpt_lbpid >> 3) & 0x7;
+			cq_ctx->lbpid_high = (cpt_lbpid >> 6) & 0x7;
+			cq_ctx->lbp_frac = NIX_CQ_LPB_THRESH_FRAC;
+		}
+		drop_thresh = NIX_CQ_SEC_THRESH_LEVEL;
+	}
+
+	/* Many to one reduction */
+	cq_ctx->qint_idx = cq->qid % nix->qints;
+	/* Map CQ0 [RQ0] to CINT0 and so on till max 64 irqs */
+	cq_ctx->cint_idx = cq->qid;
+
+	if (roc_errata_nix_has_cq_min_size_4k()) {
+		const float rx_cq_skid = NIX_CQ_FULL_ERRATA_SKID;
+		uint16_t min_rx_drop;
+
+		min_rx_drop = ceil(rx_cq_skid / (float)cq->nb_desc);
+		cq_ctx->drop = min_rx_drop;
+		cq_ctx->drop_ena = 1;
+		cq->drop_thresh = min_rx_drop;
+	} else {
+		cq->drop_thresh = drop_thresh;
+		/* Drop processing or red drop cannot be enabled due to
+		 * packets coming for second pass from CPT.
+		 */
+		if (!roc_nix_inl_inb_is_enabled(roc_nix)) {
+			cq_ctx->drop = cq->drop_thresh;
+			cq_ctx->drop_ena = 1;
+		}
+	}
+	cq_ctx->bp = cq->drop_thresh;
+
+	if (roc_feature_nix_has_cqe_stash()) {
+		if (cq_ctx->caching) {
+			cq_ctx->stashing = 1;
+			cq_ctx->stash_thresh = cq->stash_thresh;
+		}
+	}
+
+	rc = mbox_process(mbox);
+	mbox_put(mbox);
+	if (rc)
+		goto free_mem;
+
+	return nix_tel_node_add_cq(cq);
+
+free_mem:
+	plt_free(cq->desc_base);
+fail:
+	return rc;
+}
+
 int
 roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct mbox *mbox = (&nix->dev)->mbox;
-	volatile struct nix_cq_ctx_s *cq_ctx;
+	volatile struct nix_cq_ctx_s *cq_ctx = NULL;
 	uint16_t drop_thresh = NIX_CQ_THRESH_LEVEL;
 	uint16_t cpt_lbpid = nix->cpt_lbpid;
 	enum nix_q_size qsize;
@@ -832,6 +1169,9 @@ roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 	if (cq == NULL)
 		return NIX_ERR_PARAM;
 
+	if (roc_model_is_cn20k())
+		return roc_nix_cn20k_cq_init(roc_nix, cq);
+
 	qsize = nix_qsize_clampup(cq->nb_desc);
 	cq->nb_desc = nix_qsize_to_val(qsize);
 	cq->qmask = cq->nb_desc - 1;
@@ -861,7 +1201,7 @@ roc_nix_cq_init(struct roc_nix *roc_nix, struct roc_nix_cq *cq)
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_INIT;
 		cq_ctx = &aq->cq;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
@@ -972,7 +1312,7 @@ roc_nix_cq_fini(struct roc_nix_cq *cq)
 		aq->cq.bp_ena = 0;
 		aq->cq_mask.ena = ~aq->cq_mask.ena;
 		aq->cq_mask.bp_ena = ~aq->cq_mask.bp_ena;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -981,6 +1321,26 @@ roc_nix_cq_fini(struct roc_nix_cq *cq)
 			return -ENOSPC;
 		}
 
+		aq->qidx = cq->qid;
+		aq->ctype = NIX_AQ_CTYPE_CQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->cq.ena = 0;
+		aq->cq.bp_ena = 0;
+		aq->cq_mask.ena = ~aq->cq_mask.ena;
+		aq->cq_mask.bp_ena = ~aq->cq_mask.bp_ena;
+		if (roc_feature_nix_has_late_bp() && roc_nix_inl_inb_is_enabled(cq->roc_nix)) {
+			aq->cq.lbp_ena = 0;
+			aq->cq_mask.lbp_ena = ~aq->cq_mask.lbp_ena;
+		}
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			mbox_put(mbox);
+			return -ENOSPC;
+		}
+
 		aq->qidx = cq->qid;
 		aq->ctype = NIX_AQ_CTYPE_CQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
@@ -1227,14 +1587,152 @@ sq_cn9k_fini(struct nix *nix, struct roc_nix_sq *sq)
 	return 0;
 }
 
+static int
+sq_cn10k_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t rr_quantum, uint16_t smq)
+{
+	struct roc_nix *roc_nix = nix_priv_to_roc_nix(nix);
+	struct mbox *mbox = (&nix->dev)->mbox;
+	struct nix_cn10k_aq_enq_req *aq;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq)
+		return -ENOSPC;
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_INIT;
+
+	aq->sq.max_sqe_size = sq->max_sqe_sz;
+	aq->sq.smq = smq;
+	aq->sq.smq_rr_weight = rr_quantum;
+	if (roc_nix_is_sdp(roc_nix))
+		aq->sq.default_chan = nix->tx_chan_base + (sq->qid % nix->tx_chan_cnt);
+	else
+		aq->sq.default_chan = nix->tx_chan_base;
+	aq->sq.sqe_stype = NIX_STYPE_STF;
+	aq->sq.ena = 1;
+	aq->sq.sso_ena = !!sq->sso_ena;
+	aq->sq.cq_ena = !!sq->cq_ena;
+	aq->sq.cq = sq->cqid;
+	aq->sq.cq_limit = sq->cq_drop_thresh;
+	if (aq->sq.max_sqe_size == NIX_MAXSQESZ_W8)
+		aq->sq.sqe_stype = NIX_STYPE_STP;
+	aq->sq.sqb_aura = roc_npa_aura_handle_to_aura(sq->aura_handle);
+	aq->sq.sq_int_ena = BIT(NIX_SQINT_LMT_ERR);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_SQB_ALLOC_FAIL);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_SEND_ERR);
+	aq->sq.sq_int_ena |= BIT(NIX_SQINT_MNQ_ERR);
+
+	/* Many to one reduction */
+	aq->sq.qint_idx = sq->qid % nix->qints;
+	if (roc_errata_nix_assign_incorrect_qint()) {
+		/* Assign QINT 0 to all the SQs: an errata exists where NIXTX can
+		 * send an incorrect QINT_IDX when reporting a queue interrupt
+		 * (QINT), which might result in software missing the interrupt.
+		 */
+		aq->sq.qint_idx = 0;
+	}
+	return 0;
+}
+
+static int
+sq_cn10k_fini(struct nix *nix, struct roc_nix_sq *sq)
+{
+	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
+	struct nix_cn10k_aq_enq_rsp *rsp;
+	struct nix_cn10k_aq_enq_req *aq;
+	uint16_t sqes_per_sqb;
+	void *sqb_buf;
+	int rc, count;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_READ;
+	rc = mbox_process_msg(mbox, (void *)&rsp);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	/* Check if sq is already cleaned up */
+	if (!rsp->sq.ena) {
+		mbox_put(mbox);
+		return 0;
+	}
+
+	/* Disable sq */
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+	aq->sq_mask.ena = ~aq->sq_mask.ena;
+	aq->sq.ena = 0;
+	rc = mbox_process(mbox);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	/* Read SQ and free sqb's */
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		mbox_put(mbox);
+		return -ENOSPC;
+	}
+
+	aq->qidx = sq->qid;
+	aq->ctype = NIX_AQ_CTYPE_SQ;
+	aq->op = NIX_AQ_INSTOP_READ;
+	rc = mbox_process_msg(mbox, (void *)&rsp);
+	if (rc) {
+		mbox_put(mbox);
+		return rc;
+	}
+
+	if (aq->sq.smq_pend)
+		plt_err("SQ has pending SQEs");
+
+	count = aq->sq.sqb_count;
+	sqes_per_sqb = 1 << sq->sqes_per_sqb_log2;
+	/* Free SQBs that are in use */
+	sqb_buf = (void *)rsp->sq.head_sqb;
+	while (count) {
+		void *next_sqb;
+
+		next_sqb = *(void **)((uint64_t *)sqb_buf +
+				      (uint32_t)((sqes_per_sqb - 1) * (0x2 >> sq->max_sqe_sz) * 8));
+		roc_npa_aura_op_free(sq->aura_handle, 1, (uint64_t)sqb_buf);
+		sqb_buf = next_sqb;
+		count--;
+	}
+
+	/* Free the next-to-use SQB */
+	if (rsp->sq.next_sqb)
+		roc_npa_aura_op_free(sq->aura_handle, 1, rsp->sq.next_sqb);
+	mbox_put(mbox);
+	return 0;
+}
+
 static int
 sq_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t rr_quantum, uint16_t smq)
 {
 	struct roc_nix *roc_nix = nix_priv_to_roc_nix(nix);
 	struct mbox *mbox = (&nix->dev)->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
+	struct nix_cn20k_aq_enq_req *aq;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq)
 		return -ENOSPC;
 
@@ -1280,13 +1778,13 @@ static int
 sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 {
 	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	struct nix_cn20k_aq_enq_rsp *rsp;
+	struct nix_cn20k_aq_enq_req *aq;
 	uint16_t sqes_per_sqb;
 	void *sqb_buf;
 	int rc, count;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1308,7 +1806,7 @@ sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 	}
 
 	/* Disable sq */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1326,7 +1824,7 @@ sq_fini(struct nix *nix, struct roc_nix_sq *sq)
 	}
 
 	/* Read SQ and free sqb's */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
 	if (!aq) {
 		mbox_put(mbox);
 		return -ENOSPC;
@@ -1408,6 +1906,8 @@ roc_nix_sq_init(struct roc_nix *roc_nix, struct roc_nix_sq *sq)
 	/* Init SQ context */
 	if (roc_model_is_cn9k())
 		rc = sq_cn9k_init(nix, sq, rr_quantum, smq);
+	else if (roc_model_is_cn10k())
+		rc = sq_cn10k_init(nix, sq, rr_quantum, smq);
 	else
 		rc = sq_init(nix, sq, rr_quantum, smq);
 
@@ -1464,6 +1964,8 @@ roc_nix_sq_fini(struct roc_nix_sq *sq)
 	/* Release SQ context */
 	if (roc_model_is_cn9k())
 		rc |= sq_cn9k_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
+	else if (roc_model_is_cn10k())
+		rc |= sq_cn10k_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
 	else
 		rc |= sq_fini(roc_nix_to_nix_priv(sq->roc_nix), sq);
 
diff --git a/drivers/common/cnxk/roc_nix_stats.c b/drivers/common/cnxk/roc_nix_stats.c
index 7a9619b39d..6f241c72de 100644
--- a/drivers/common/cnxk/roc_nix_stats.c
+++ b/drivers/common/cnxk/roc_nix_stats.c
@@ -173,7 +173,7 @@ nix_stat_rx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
 		aq->rq_mask.drop_pkts = ~(aq->rq_mask.drop_pkts);
 		aq->rq_mask.re_pkts = ~(aq->rq_mask.re_pkts);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -192,6 +192,30 @@ nix_stat_rx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->rq.drop_pkts = 0;
 		aq->rq.re_pkts = 0;
 
+		aq->rq_mask.octs = ~(aq->rq_mask.octs);
+		aq->rq_mask.pkts = ~(aq->rq_mask.pkts);
+		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
+		aq->rq_mask.drop_pkts = ~(aq->rq_mask.drop_pkts);
+		aq->rq_mask.re_pkts = ~(aq->rq_mask.re_pkts);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.octs = 0;
+		aq->rq.pkts = 0;
+		aq->rq.drop_octs = 0;
+		aq->rq.drop_pkts = 0;
+		aq->rq.re_pkts = 0;
+
 		aq->rq_mask.octs = ~(aq->rq_mask.octs);
 		aq->rq_mask.pkts = ~(aq->rq_mask.pkts);
 		aq->rq_mask.drop_octs = ~(aq->rq_mask.drop_octs);
@@ -233,7 +257,7 @@ nix_stat_tx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
 		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
 		aq->sq_mask.drop_pkts = ~(aq->sq_mask.drop_pkts);
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -250,6 +274,29 @@ nix_stat_tx_queue_reset(struct nix *nix, uint16_t qid)
 		aq->sq.drop_octs = 0;
 		aq->sq.drop_pkts = 0;
 
+		aq->sq_mask.octs = ~(aq->sq_mask.octs);
+		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
+		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
+		aq->sq_mask.drop_pkts = ~(aq->sq_mask.drop_pkts);
+		aq->sq_mask.aged_drop_octs = ~(aq->sq_mask.aged_drop_octs);
+		aq->sq_mask.aged_drop_pkts = ~(aq->sq_mask.aged_drop_pkts);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_SQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->sq.octs = 0;
+		aq->sq.pkts = 0;
+		aq->sq.drop_octs = 0;
+		aq->sq.drop_pkts = 0;
+
 		aq->sq_mask.octs = ~(aq->sq_mask.octs);
 		aq->sq_mask.pkts = ~(aq->sq_mask.pkts);
 		aq->sq_mask.drop_octs = ~(aq->sq_mask.drop_octs);
@@ -375,7 +422,7 @@ roc_nix_xstats_get(struct roc_nix *roc_nix, struct roc_nix_xstat *xstats,
 	xstats[count].id = count;
 	count++;
 
-	if (roc_model_is_cn10k()) {
+	if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		for (i = 0; i < CNXK_NIX_NUM_CN10K_RX_XSTATS; i++) {
 			xstats[count].value =
 				NIX_RX_STATS(nix_cn10k_rx_xstats[i].offset);
@@ -492,7 +539,7 @@ roc_nix_xstats_names_get(struct roc_nix *roc_nix,
 		count++;
 	}
 
-	if (roc_model_is_cn10k()) {
+	if (roc_model_is_cn10k() || roc_model_is_cn20k()) {
 		for (i = 0; i < CNXK_NIX_NUM_CN10K_RX_XSTATS; i++) {
 			NIX_XSTATS_NAME_PRINT(xstats_names, count,
 					      nix_cn10k_rx_xstats, i);
diff --git a/drivers/common/cnxk/roc_nix_tm.c b/drivers/common/cnxk/roc_nix_tm.c
index ac522f8235..5725ef568a 100644
--- a/drivers/common/cnxk/roc_nix_tm.c
+++ b/drivers/common/cnxk/roc_nix_tm.c
@@ -1058,7 +1058,7 @@ nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 		}
 		aq->sq.smq_rr_quantum = rr_quantum;
 		aq->sq_mask.smq_rr_quantum = ~aq->sq_mask.smq_rr_quantum;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_req *aq;
 
 		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
@@ -1071,6 +1071,26 @@ nix_tm_sq_sched_conf(struct nix *nix, struct nix_tm_node *node,
 		aq->ctype = NIX_AQ_CTYPE_SQ;
 		aq->op = NIX_AQ_INSTOP_WRITE;
 
+		/* smq update only when needed */
+		if (!rr_quantum_only) {
+			aq->sq.smq = smq;
+			aq->sq_mask.smq = ~aq->sq_mask.smq;
+		}
+		aq->sq.smq_rr_weight = rr_quantum;
+		aq->sq_mask.smq_rr_weight = ~aq->sq_mask.smq_rr_weight;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = NIX_AQ_CTYPE_SQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
 		/* smq update only when needed */
 		if (!rr_quantum_only) {
 			aq->sq.smq = smq;
diff --git a/drivers/common/cnxk/roc_nix_tm_ops.c b/drivers/common/cnxk/roc_nix_tm_ops.c
index 8144675f89..a9cfadd1b0 100644
--- a/drivers/common/cnxk/roc_nix_tm_ops.c
+++ b/drivers/common/cnxk/roc_nix_tm_ops.c
@@ -1294,15 +1294,19 @@ roc_nix_tm_rsrc_max(bool pf, uint16_t schq[ROC_TM_LVL_MAX])
 
 		switch (hw_lvl) {
 		case NIX_TXSCH_LVL_SMQ:
-			max = (roc_model_is_cn9k() ?
-					     NIX_CN9K_TXSCH_LVL_SMQ_MAX :
-					     NIX_TXSCH_LVL_SMQ_MAX);
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_SMQ_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_SMQ_MAX :
+				 NIX_TXSCH_LVL_SMQ_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL4:
-			max = NIX_TXSCH_LVL_TL4_MAX;
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_TL4_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_TL4_MAX :
+							NIX_TXSCH_LVL_TL4_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL3:
-			max = NIX_TXSCH_LVL_TL3_MAX;
+			max = (roc_model_is_cn9k() ? NIX_CN9K_TXSCH_LVL_TL3_MAX :
+				(roc_model_is_cn10k() ? NIX_CN10K_TXSCH_LVL_TL3_MAX :
+							NIX_TXSCH_LVL_TL3_MAX));
 			break;
 		case NIX_TXSCH_LVL_TL2:
 			max = pf ? NIX_TXSCH_LVL_TL2_MAX : 1;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 07/18] common/cnxk: support bandwidth profile for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (5 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 06/18] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 08/18] common/cnxk: support NIX debug " Nithin Dabilpuram
                     ` (11 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to set up the bandwidth profile configuration for the
cn20k Rx policer.
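
For reviewers: cn20k no longer carries the RQ band profile id in a single
field; the context splits it in two, as one hunk below programs it. A
hedged sketch of just that split (the helper name is illustrative):

	/* Illustrative sketch, not part of the patch: cn20k splits the
	 * 14-bit profile id across two RQ context fields.
	 */
	static inline void
	nix_cn20k_rq_set_bpf_id(struct nix_cn20k_rq_ctx_s *rq, uint16_t id)
	{
		rq->band_prof_id_l = id & 0x3FF;       /* lower 10 bits */
		rq->band_prof_id_h = (id >> 10) & 0xF; /* upper 4 bits */
	}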

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/roc_nix_bpf.c   | 528 ++++++++++++++++++----------
 drivers/common/cnxk/roc_nix_queue.c | 136 ++++---
 2 files changed, 425 insertions(+), 239 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_bpf.c b/drivers/common/cnxk/roc_nix_bpf.c
index d60396289b..98c9855a5b 100644
--- a/drivers/common/cnxk/roc_nix_bpf.c
+++ b/drivers/common/cnxk/roc_nix_bpf.c
@@ -547,9 +547,9 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 {
 	uint64_t exponent_p = 0, mantissa_p = 0, div_exp_p = 0;
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = dev->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
 	uint32_t policer_timeunit;
 	uint8_t level_idx;
 	int rc;
@@ -568,103 +568,122 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 	if (level_idx == ROC_NIX_BPF_LEVEL_IDX_INVALID)
 		return NIX_ERR_PARAM;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
-	aq->prof.adjust_exponent = NIX_BPF_DEFAULT_ADJUST_EXPONENT;
-	aq->prof.adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA;
+	prof->adjust_exponent = NIX_BPF_DEFAULT_ADJUST_EXPONENT;
+	prof->adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA;
 	if (cfg->lmode == ROC_NIX_BPF_LMODE_BYTE)
-		aq->prof.adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA / 2;
+		prof->adjust_mantissa = NIX_BPF_DEFAULT_ADJUST_MANTISSA / 2;
 
-	aq->prof_mask.adjust_exponent = ~(aq->prof_mask.adjust_exponent);
-	aq->prof_mask.adjust_mantissa = ~(aq->prof_mask.adjust_mantissa);
+	prof_mask->adjust_exponent = ~(prof_mask->adjust_exponent);
+	prof_mask->adjust_mantissa = ~(prof_mask->adjust_mantissa);
 
 	switch (cfg->alg) {
 	case ROC_NIX_BPF_ALGO_2697:
 		meter_rate_to_nix(cfg->algo2697.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2697.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2697.ebs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	case ROC_NIX_BPF_ALGO_2698:
 		meter_rate_to_nix(cfg->algo2698.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_rate_to_nix(cfg->algo2698.pir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.peir_mantissa = mantissa_p;
-		aq->prof.peir_exponent = exponent_p;
+		prof->peir_mantissa = mantissa_p;
+		prof->peir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2698.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo2698.pbs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.peir_mantissa = ~(aq->prof_mask.peir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.peir_exponent = ~(aq->prof_mask.peir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->peir_mantissa = ~(prof_mask->peir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->peir_exponent = ~(prof_mask->peir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	case ROC_NIX_BPF_ALGO_4115:
 		meter_rate_to_nix(cfg->algo4115.cir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.cir_mantissa = mantissa_p;
-		aq->prof.cir_exponent = exponent_p;
+		prof->cir_mantissa = mantissa_p;
+		prof->cir_exponent = exponent_p;
 
 		meter_rate_to_nix(cfg->algo4115.eir, &exponent_p, &mantissa_p,
 				  &div_exp_p, policer_timeunit);
-		aq->prof.peir_mantissa = mantissa_p;
-		aq->prof.peir_exponent = exponent_p;
+		prof->peir_mantissa = mantissa_p;
+		prof->peir_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo4115.cbs, &exponent_p, &mantissa_p);
-		aq->prof.cbs_mantissa = mantissa_p;
-		aq->prof.cbs_exponent = exponent_p;
+		prof->cbs_mantissa = mantissa_p;
+		prof->cbs_exponent = exponent_p;
 
 		meter_burst_to_nix(cfg->algo4115.ebs, &exponent_p, &mantissa_p);
-		aq->prof.pebs_mantissa = mantissa_p;
-		aq->prof.pebs_exponent = exponent_p;
+		prof->pebs_mantissa = mantissa_p;
+		prof->pebs_exponent = exponent_p;
 
-		aq->prof_mask.cir_mantissa = ~(aq->prof_mask.cir_mantissa);
-		aq->prof_mask.peir_mantissa = ~(aq->prof_mask.peir_mantissa);
-		aq->prof_mask.cbs_mantissa = ~(aq->prof_mask.cbs_mantissa);
-		aq->prof_mask.pebs_mantissa = ~(aq->prof_mask.pebs_mantissa);
+		prof_mask->cir_mantissa = ~(prof_mask->cir_mantissa);
+		prof_mask->peir_mantissa = ~(prof_mask->peir_mantissa);
+		prof_mask->cbs_mantissa = ~(prof_mask->cbs_mantissa);
+		prof_mask->pebs_mantissa = ~(prof_mask->pebs_mantissa);
 
-		aq->prof_mask.cir_exponent = ~(aq->prof_mask.cir_exponent);
-		aq->prof_mask.peir_exponent = ~(aq->prof_mask.peir_exponent);
-		aq->prof_mask.cbs_exponent = ~(aq->prof_mask.cbs_exponent);
-		aq->prof_mask.pebs_exponent = ~(aq->prof_mask.pebs_exponent);
+		prof_mask->cir_exponent = ~(prof_mask->cir_exponent);
+		prof_mask->peir_exponent = ~(prof_mask->peir_exponent);
+		prof_mask->cbs_exponent = ~(prof_mask->cbs_exponent);
+		prof_mask->pebs_exponent = ~(prof_mask->pebs_exponent);
 		break;
 
 	default:
@@ -672,23 +691,23 @@ roc_nix_bpf_config(struct roc_nix *roc_nix, uint16_t id,
 		goto exit;
 	}
 
-	aq->prof.lmode = cfg->lmode;
-	aq->prof.icolor = cfg->icolor;
-	aq->prof.meter_algo = cfg->alg;
-	aq->prof.pc_mode = cfg->pc_mode;
-	aq->prof.tnl_ena = cfg->tnl_ena;
-	aq->prof.gc_action = cfg->action[ROC_NIX_BPF_COLOR_GREEN];
-	aq->prof.yc_action = cfg->action[ROC_NIX_BPF_COLOR_YELLOW];
-	aq->prof.rc_action = cfg->action[ROC_NIX_BPF_COLOR_RED];
+	prof->lmode = cfg->lmode;
+	prof->icolor = cfg->icolor;
+	prof->meter_algo = cfg->alg;
+	prof->pc_mode = cfg->pc_mode;
+	prof->tnl_ena = cfg->tnl_ena;
+	prof->gc_action = cfg->action[ROC_NIX_BPF_COLOR_GREEN];
+	prof->yc_action = cfg->action[ROC_NIX_BPF_COLOR_YELLOW];
+	prof->rc_action = cfg->action[ROC_NIX_BPF_COLOR_RED];
 
-	aq->prof_mask.lmode = ~(aq->prof_mask.lmode);
-	aq->prof_mask.icolor = ~(aq->prof_mask.icolor);
-	aq->prof_mask.meter_algo = ~(aq->prof_mask.meter_algo);
-	aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
-	aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
-	aq->prof_mask.gc_action = ~(aq->prof_mask.gc_action);
-	aq->prof_mask.yc_action = ~(aq->prof_mask.yc_action);
-	aq->prof_mask.rc_action = ~(aq->prof_mask.rc_action);
+	prof_mask->lmode = ~(prof_mask->lmode);
+	prof_mask->icolor = ~(prof_mask->icolor);
+	prof_mask->meter_algo = ~(prof_mask->meter_algo);
+	prof_mask->pc_mode = ~(prof_mask->pc_mode);
+	prof_mask->tnl_ena = ~(prof_mask->tnl_ena);
+	prof_mask->gc_action = ~(prof_mask->gc_action);
+	prof_mask->yc_action = ~(prof_mask->yc_action);
+	prof_mask->rc_action = ~(prof_mask->rc_action);
 
 	rc = mbox_process(mbox);
 exit:
@@ -703,7 +722,6 @@ roc_nix_bpf_ena_dis(struct roc_nix *roc_nix, uint16_t id, struct roc_nix_rq *rq,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	int rc;
 
 	if (roc_model_is_cn9k()) {
@@ -716,25 +734,53 @@ roc_nix_bpf_ena_dis(struct roc_nix *roc_nix, uint16_t id, struct roc_nix_rq *rq,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
-	}
-	aq->qidx = rq->qid;
-	aq->ctype = NIX_AQ_CTYPE_RQ;
-	aq->op = NIX_AQ_INSTOP_WRITE;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
 
-	aq->rq.policer_ena = enable;
-	aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
-	if (enable) {
-		aq->rq.band_prof_id = id;
-		aq->rq_mask.band_prof_id = ~(aq->rq_mask.band_prof_id);
-	}
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.policer_ena = enable;
+		aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
+		if (enable) {
+			aq->rq.band_prof_id = id;
+			aq->rq_mask.band_prof_id = ~(aq->rq_mask.band_prof_id);
+		}
+
+		rc = mbox_process(mbox);
+		if (rc)
+			goto exit;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
 
-	rc = mbox_process(mbox);
-	if (rc)
-		goto exit;
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = rq->qid;
+		aq->ctype = NIX_AQ_CTYPE_RQ;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+
+		aq->rq.policer_ena = enable;
+		aq->rq_mask.policer_ena = ~(aq->rq_mask.policer_ena);
+		if (enable) {
+			aq->rq.band_prof_id_l = id & 0x3FF;
+			aq->rq.band_prof_id_h = (id >> 10) & 0xF;
+			aq->rq_mask.band_prof_id_l = ~(aq->rq_mask.band_prof_id_l);
+			aq->rq_mask.band_prof_id_h = ~(aq->rq_mask.band_prof_id_h);
+		}
+
+		rc = mbox_process(mbox);
+		if (rc)
+			goto exit;
+	}
 
 	rq->bpf_id = id;
 
@@ -750,8 +796,7 @@ roc_nix_bpf_dump(struct roc_nix *roc_nix, uint16_t id,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	volatile struct nix_band_prof_s *prof;
 	uint8_t level_idx;
 	int rc;
 
@@ -765,19 +810,42 @@ roc_nix_bpf_dump(struct roc_nix *roc_nix, uint16_t id,
 		rc = NIX_ERR_PARAM;
 		goto exit;
 	}
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_rsp *rsp;
+		struct nix_cn10k_aq_enq_req *aq;
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_READ;
-	rc = mbox_process_msg(mbox, (void *)&rsp);
 	if (!rc) {
 		plt_dump("============= band prof id =%d ===============", id);
-		nix_lf_bpf_dump(&rsp->prof);
+		nix_lf_bpf_dump(prof);
 	}
 exit:
 	mbox_put(mbox);
@@ -792,7 +860,6 @@ roc_nix_bpf_pre_color_tbl_setup(struct roc_nix *roc_nix, uint16_t id,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = dev->mbox;
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t pc_mode, tn_ena;
 	uint8_t level_idx;
 	int rc;
@@ -856,21 +923,43 @@ roc_nix_bpf_pre_color_tbl_setup(struct roc_nix *roc_nix, uint16_t id,
 		goto exit;
 	}
 
-	/* Update corresponding bandwidth profile too */
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		/* Update corresponding bandwidth profile too */
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->prof.pc_mode = pc_mode;
+		aq->prof.tnl_ena = tn_ena;
+		aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
+		aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
+
+		rc = mbox_process(mbox);
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		/* Update corresponding bandwidth profile too */
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox_get(mbox));
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		aq->prof.pc_mode = pc_mode;
+		aq->prof.tnl_ena = tn_ena;
+		aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
+		aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
+
+		rc = mbox_process(mbox);
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
-	aq->prof.pc_mode = pc_mode;
-	aq->prof.tnl_ena = tn_ena;
-	aq->prof_mask.pc_mode = ~(aq->prof_mask.pc_mode);
-	aq->prof_mask.tnl_ena = ~(aq->prof_mask.tnl_ena);
-
-	rc = mbox_process(mbox);
 
 exit:
 	mbox_put(mbox);
@@ -883,9 +972,9 @@ roc_nix_bpf_connect(struct roc_nix *roc_nix,
 		    uint16_t dst_id)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t level_idx;
 	int rc;
 
@@ -900,23 +989,42 @@ roc_nix_bpf_connect(struct roc_nix *roc_nix,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14) | src_id;
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
 	if (dst_id == ROC_NIX_BPF_ID_INVALID) {
-		aq->prof.hl_en = false;
-		aq->prof_mask.hl_en = ~(aq->prof_mask.hl_en);
+		prof->hl_en = false;
+		prof_mask->hl_en = ~(prof_mask->hl_en);
 	} else {
-		aq->prof.hl_en = true;
-		aq->prof.band_prof_id = dst_id;
-		aq->prof_mask.hl_en = ~(aq->prof_mask.hl_en);
-		aq->prof_mask.band_prof_id = ~(aq->prof_mask.band_prof_id);
+		prof->hl_en = true;
+		prof->band_prof_id = dst_id;
+		prof_mask->hl_en = ~(prof_mask->hl_en);
+		prof_mask->band_prof_id = ~(prof_mask->band_prof_id);
 	}
 
 	rc = mbox_process(mbox);
@@ -937,8 +1045,7 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_rsp *rsp;
-	struct nix_cn10k_aq_enq_req *aq;
+	volatile struct nix_band_prof_s *prof;
 	uint8_t level_idx;
 	int rc;
 
@@ -953,17 +1060,39 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_rsp *rsp;
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_READ;
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+		prof = &rsp->prof;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_READ;
-	rc = mbox_process_msg(mbox, (void *)&rsp);
-	if (rc)
-		goto exit;
 
 	green_pkt_pass =
 		roc_nix_bpf_stats_to_idx(mask & ROC_NIX_BPF_GREEN_PKT_F_PASS);
@@ -991,40 +1120,40 @@ roc_nix_bpf_stats_read(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		roc_nix_bpf_stats_to_idx(mask & ROC_NIX_BPF_RED_OCTS_F_DROP);
 
 	if (green_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[green_pkt_pass] = rsp->prof.green_pkt_pass;
+		stats[green_pkt_pass] = prof->green_pkt_pass;
 
 	if (green_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[green_octs_pass] = rsp->prof.green_octs_pass;
+		stats[green_octs_pass] = prof->green_octs_pass;
 
 	if (green_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[green_pkt_drop] = rsp->prof.green_pkt_drop;
+		stats[green_pkt_drop] = prof->green_pkt_drop;
 
 	if (green_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[green_octs_drop] = rsp->prof.green_octs_pass;
+		stats[green_octs_drop] = prof->green_octs_pass;
 
 	if (yellow_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_pkt_pass] = rsp->prof.yellow_pkt_pass;
+		stats[yellow_pkt_pass] = prof->yellow_pkt_pass;
 
 	if (yellow_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_octs_pass] = rsp->prof.yellow_octs_pass;
+		stats[yellow_octs_pass] = prof->yellow_octs_pass;
 
 	if (yellow_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_pkt_drop] = rsp->prof.yellow_pkt_drop;
+		stats[yellow_pkt_drop] = prof->yellow_pkt_drop;
 
 	if (yellow_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[yellow_octs_drop] = rsp->prof.yellow_octs_drop;
+		stats[yellow_octs_drop] = prof->yellow_octs_drop;
 
 	if (red_pkt_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[red_pkt_pass] = rsp->prof.red_pkt_pass;
+		stats[red_pkt_pass] = prof->red_pkt_pass;
 
 	if (red_octs_pass != ROC_NIX_BPF_STATS_MAX)
-		stats[red_octs_pass] = rsp->prof.red_octs_pass;
+		stats[red_octs_pass] = prof->red_octs_pass;
 
 	if (red_pkt_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[red_pkt_drop] = rsp->prof.red_pkt_drop;
+		stats[red_pkt_drop] = prof->red_pkt_drop;
 
 	if (red_octs_drop != ROC_NIX_BPF_STATS_MAX)
-		stats[red_octs_drop] = rsp->prof.red_octs_drop;
+		stats[red_octs_drop] = prof->red_octs_drop;
 
 	rc = 0;
 exit:
@@ -1037,9 +1166,9 @@ roc_nix_bpf_stats_reset(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 			enum roc_nix_bpf_level_flag lvl_flag)
 {
 	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	volatile struct nix_band_prof_s *prof, *prof_mask;
 	struct dev *dev = &nix->dev;
 	struct mbox *mbox = mbox_get(dev->mbox);
-	struct nix_cn10k_aq_enq_req *aq;
 	uint8_t level_idx;
 	int rc;
 
@@ -1054,68 +1183,81 @@ roc_nix_bpf_stats_reset(struct roc_nix *roc_nix, uint16_t id, uint64_t mask,
 		goto exit;
 	}
 
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (aq == NULL) {
-		rc = -ENOSPC;
-		goto exit;
+	if (roc_model_is_cn10k()) {
+		struct nix_cn10k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
+	} else {
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (aq == NULL) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+		aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
+		aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
+		aq->op = NIX_AQ_INSTOP_WRITE;
+		prof = &aq->prof;
+		prof_mask = &aq->prof_mask;
 	}
-	aq->qidx = (sw_to_hw_lvl_map[level_idx] << 14 | id);
-	aq->ctype = NIX_AQ_CTYPE_BAND_PROF;
-	aq->op = NIX_AQ_INSTOP_WRITE;
 
 	if (mask & ROC_NIX_BPF_GREEN_PKT_F_PASS) {
-		aq->prof.green_pkt_pass = 0;
-		aq->prof_mask.green_pkt_pass = ~(aq->prof_mask.green_pkt_pass);
+		prof->green_pkt_pass = 0;
+		prof_mask->green_pkt_pass = ~(prof_mask->green_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_OCTS_F_PASS) {
-		aq->prof.green_octs_pass = 0;
-		aq->prof_mask.green_octs_pass =
-			~(aq->prof_mask.green_octs_pass);
+		prof->green_octs_pass = 0;
+		prof_mask->green_octs_pass = ~(prof_mask->green_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_PKT_F_DROP) {
-		aq->prof.green_pkt_drop = 0;
-		aq->prof_mask.green_pkt_drop = ~(aq->prof_mask.green_pkt_drop);
+		prof->green_pkt_drop = 0;
+		prof_mask->green_pkt_drop = ~(prof_mask->green_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_GREEN_OCTS_F_DROP) {
-		aq->prof.green_octs_drop = 0;
-		aq->prof_mask.green_octs_drop =
-			~(aq->prof_mask.green_octs_drop);
+		prof->green_octs_drop = 0;
+		prof_mask->green_octs_drop = ~(prof_mask->green_octs_drop);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_PKT_F_PASS) {
-		aq->prof.yellow_pkt_pass = 0;
-		aq->prof_mask.yellow_pkt_pass =
-			~(aq->prof_mask.yellow_pkt_pass);
+		prof->yellow_pkt_pass = 0;
+		prof_mask->yellow_pkt_pass = ~(prof_mask->yellow_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_OCTS_F_PASS) {
-		aq->prof.yellow_octs_pass = 0;
-		aq->prof_mask.yellow_octs_pass =
-			~(aq->prof_mask.yellow_octs_pass);
+		prof->yellow_octs_pass = 0;
+		prof_mask->yellow_octs_pass = ~(prof_mask->yellow_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_PKT_F_DROP) {
-		aq->prof.yellow_pkt_drop = 0;
-		aq->prof_mask.yellow_pkt_drop =
-			~(aq->prof_mask.yellow_pkt_drop);
+		prof->yellow_pkt_drop = 0;
+		prof_mask->yellow_pkt_drop = ~(prof_mask->yellow_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_YELLOW_OCTS_F_DROP) {
-		aq->prof.yellow_octs_drop = 0;
-		aq->prof_mask.yellow_octs_drop =
-			~(aq->prof_mask.yellow_octs_drop);
+		prof->yellow_octs_drop = 0;
+		prof_mask->yellow_octs_drop = ~(prof_mask->yellow_octs_drop);
 	}
 	if (mask & ROC_NIX_BPF_RED_PKT_F_PASS) {
-		aq->prof.red_pkt_pass = 0;
-		aq->prof_mask.red_pkt_pass = ~(aq->prof_mask.red_pkt_pass);
+		prof->red_pkt_pass = 0;
+		prof_mask->red_pkt_pass = ~(prof_mask->red_pkt_pass);
 	}
 	if (mask & ROC_NIX_BPF_RED_OCTS_F_PASS) {
-		aq->prof.red_octs_pass = 0;
-		aq->prof_mask.red_octs_pass = ~(aq->prof_mask.red_octs_pass);
+		prof->red_octs_pass = 0;
+		prof_mask->red_octs_pass = ~(prof_mask->red_octs_pass);
 	}
 	if (mask & ROC_NIX_BPF_RED_PKT_F_DROP) {
-		aq->prof.red_pkt_drop = 0;
-		aq->prof_mask.red_pkt_drop = ~(aq->prof_mask.red_pkt_drop);
+		prof->red_pkt_drop = 0;
+		prof_mask->red_pkt_drop = ~(prof_mask->red_pkt_drop);
 	}
 	if (mask & ROC_NIX_BPF_RED_OCTS_F_DROP) {
-		aq->prof.red_octs_drop = 0;
-		aq->prof_mask.red_octs_drop = ~(aq->prof_mask.red_octs_drop);
+		prof->red_octs_drop = 0;
+		prof_mask->red_octs_drop = ~(prof_mask->red_octs_drop);
 	}
 
 	rc = mbox_process(mbox);
diff --git a/drivers/common/cnxk/roc_nix_queue.c b/drivers/common/cnxk/roc_nix_queue.c
index bb1b70424f..06029275af 100644
--- a/drivers/common/cnxk/roc_nix_queue.c
+++ b/drivers/common/cnxk/roc_nix_queue.c
@@ -384,6 +384,94 @@ nix_rq_cn9k_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
 	return rc;
 }
 
+static int
+nix_rq_cn10k_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
+{
+	struct nix_cn10k_aq_enq_req *aq;
+	struct mbox *mbox = mbox_get(dev->mbox);
+	int rc;
+
+	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
+	if (!aq) {
+		rc = -ENOSPC;
+		goto exit;
+	}
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
+		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
+	}
+
+	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
+		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
+	}
+
+	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
+		aq->rq.xqe_pass = rq->xqe_red_pass;
+		aq->rq.xqe_drop = rq->xqe_red_drop;
+		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
+		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
+	}
+
+	rc = mbox_process(mbox);
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
+static int
+nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
+{
+	struct nix_cn20k_aq_enq_req *aq;
+	struct mbox *mbox = mbox_get(dev->mbox);
+	int rc;
+
+	aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+	if (!aq) {
+		rc = -ENOSPC;
+		goto exit;
+	}
+
+	aq->qidx = rq->qid;
+	aq->ctype = NIX_AQ_CTYPE_RQ;
+	aq->op = NIX_AQ_INSTOP_WRITE;
+
+	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
+		aq->rq.lpb_pool_pass = rq->red_pass;
+		aq->rq.lpb_pool_drop = rq->red_drop;
+		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
+		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
+	}
+
+	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
+		aq->rq.spb_pool_pass = rq->spb_red_pass;
+		aq->rq.spb_pool_drop = rq->spb_red_drop;
+		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
+		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
+	}
+
+	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
+		aq->rq.xqe_pass = rq->xqe_red_pass;
+		aq->rq.xqe_drop = rq->xqe_red_drop;
+		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
+		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
+	}
+
+	rc = mbox_process(mbox);
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
 int
 nix_rq_cn9k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints,
 		bool cfg, bool ena)
@@ -680,52 +768,6 @@ nix_rq_cn10k_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cf
 	return 0;
 }
 
-static int
-nix_rq_cman_cfg(struct dev *dev, struct roc_nix_rq *rq)
-{
-	struct nix_cn10k_aq_enq_req *aq;
-	struct mbox *mbox = mbox_get(dev->mbox);
-	int rc;
-
-	aq = mbox_alloc_msg_nix_cn10k_aq_enq(mbox);
-	if (!aq) {
-		rc = -ENOSPC;
-		goto exit;
-	}
-
-	aq->qidx = rq->qid;
-	aq->ctype = NIX_AQ_CTYPE_RQ;
-	aq->op = NIX_AQ_INSTOP_WRITE;
-
-	if (rq->red_pass && (rq->red_pass >= rq->red_drop)) {
-		aq->rq.lpb_pool_pass = rq->red_pass;
-		aq->rq.lpb_pool_drop = rq->red_drop;
-		aq->rq_mask.lpb_pool_pass = ~(aq->rq_mask.lpb_pool_pass);
-		aq->rq_mask.lpb_pool_drop = ~(aq->rq_mask.lpb_pool_drop);
-
-	}
-
-	if (rq->spb_red_pass && (rq->spb_red_pass >= rq->spb_red_drop)) {
-		aq->rq.spb_pool_pass = rq->spb_red_pass;
-		aq->rq.spb_pool_drop = rq->spb_red_drop;
-		aq->rq_mask.spb_pool_pass = ~(aq->rq_mask.spb_pool_pass);
-		aq->rq_mask.spb_pool_drop = ~(aq->rq_mask.spb_pool_drop);
-
-	}
-
-	if (rq->xqe_red_pass && (rq->xqe_red_pass >= rq->xqe_red_drop)) {
-		aq->rq.xqe_pass = rq->xqe_red_pass;
-		aq->rq.xqe_drop = rq->xqe_red_drop;
-		aq->rq_mask.xqe_drop = ~(aq->rq_mask.xqe_drop);
-		aq->rq_mask.xqe_pass = ~(aq->rq_mask.xqe_pass);
-	}
-
-	rc = mbox_process(mbox);
-exit:
-	mbox_put(mbox);
-	return rc;
-}
-
 int
 nix_rq_cfg(struct dev *dev, struct roc_nix_rq *rq, uint16_t qints, bool cfg, bool ena)
 {
@@ -1021,6 +1063,8 @@ roc_nix_rq_cman_config(struct roc_nix *roc_nix, struct roc_nix_rq *rq)
 
 	if (is_cn9k)
 		rc = nix_rq_cn9k_cman_cfg(dev, rq);
+	else if (roc_model_is_cn10k())
+		rc = nix_rq_cn10k_cman_cfg(dev, rq);
 	else
 		rc = nix_rq_cman_cfg(dev, rq);
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
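
A note on the RQ context write in the hunk above: cn10k carries the
bandwidth profile id as a single band_prof_id field, while cn20k splits it
into a 10-bit band_prof_id_l and a 4-bit band_prof_id_h. A minimal sketch
of the split and the matching reassembly, assuming the field widths implied
by the 0x3FF/0xF masks (the helper names are illustrative, not part of the
driver):

    /* Illustrative helpers; the driver open-codes this split/join. */
    #include <stdint.h>

    static inline void
    bpf_id_split(uint16_t id, uint16_t *id_l, uint16_t *id_h)
    {
            *id_l = id & 0x3FF;       /* band_prof_id_l: bits [9:0]   */
            *id_h = (id >> 10) & 0xF; /* band_prof_id_h: bits [13:10] */
    }

    static inline uint16_t
    bpf_id_join(uint16_t id_l, uint16_t id_h)
    {
            /* Matches the debug dump: (band_prof_id_h << 10) | band_prof_id_l */
            return (uint16_t)((id_h << 10) | id_l);
    }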

* [PATCH v3 08/18] common/cnxk: support NIX debug for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (6 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 07/18] common/cnxk: support bandwidth profile " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 09/18] common/cnxk: add RSS support " Nithin Dabilpuram
                     ` (10 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add support to dump cn20k queue structs and also expose the same
information via telemetry.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/cnxk_telemetry_nix.c | 260 ++++++++++++++++++++++-
 drivers/common/cnxk/roc_nix_debug.c      | 234 +++++++++++++++++++-
 drivers/common/cnxk/roc_nix_priv.h       |   3 +-
 3 files changed, 488 insertions(+), 9 deletions(-)

diff --git a/drivers/common/cnxk/cnxk_telemetry_nix.c b/drivers/common/cnxk/cnxk_telemetry_nix.c
index ccae5d7853..abeefafe1e 100644
--- a/drivers/common/cnxk/cnxk_telemetry_nix.c
+++ b/drivers/common/cnxk/cnxk_telemetry_nix.c
@@ -346,7 +346,7 @@ nix_rq_ctx_cn9k(volatile void *qctx, struct plt_tel_data *d)
 }
 
 static void
-nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
+nix_rq_ctx_cn10k(volatile void *qctx, struct plt_tel_data *d)
 {
 	volatile struct nix_cn10k_rq_ctx_s *ctx;
 
@@ -438,6 +438,100 @@ nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_U64(d, ctx, re_pkts, w10_);
 }
 
+static void
+nix_rq_ctx(volatile void *qctx, struct plt_tel_data *d)
+{
+	volatile struct nix_cn20k_rq_ctx_s *ctx;
+
+	ctx = (volatile struct nix_cn20k_rq_ctx_s *)qctx;
+
+	/* W0 */
+	CNXK_TEL_DICT_INT(d, ctx, wqe_aura, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_ol3_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_il3_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, len_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, csum_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, csum_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, lenerr_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, port_ol4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, port_il4_dis, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, ena_wqwd, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, ipsech_ena, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_ena, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w0_);
+
+	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, chi_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, ipsecd_drop_en, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, pb_stashing, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_drop_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_caching, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, pb_caching, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_tt, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_grp, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, xqe_hdr_split, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_imm_copy, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, band_prof_id_h, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_imm_size, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, later_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_bp_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, first_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_high_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_skip, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_sizem1, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, policer_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, band_prof_id_l, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, spb_pool_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_pool_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, spb_aura_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_pool_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, wqe_pool_drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_pass, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, xqe_drop, w3_);
+
+	/* W4 */
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, rq_int_ena, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, rq_int, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_pool_pass, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_pool_drop, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura_pass, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, lpb_aura_drop, w4_);
+
+	/* W5 */
+	CNXK_TEL_DICT_INT(d, ctx, flow_tagw, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, bad_utag, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, good_utag, w5_);
+	CNXK_TEL_DICT_INT(d, ctx, ltag, w5_);
+
+	/* W6 */
+	CNXK_TEL_DICT_U64(d, ctx, octs, w6_);
+
+	/* W7 */
+	CNXK_TEL_DICT_U64(d, ctx, pkts, w7_);
+
+	/* W8 */
+	CNXK_TEL_DICT_U64(d, ctx, drop_octs, w8_);
+
+	/* W9 */
+	CNXK_TEL_DICT_U64(d, ctx, drop_pkts, w9_);
+
+	/* W10 */
+	CNXK_TEL_DICT_U64(d, ctx, re_pkts, w10_);
+}
+
 static int
 cnxk_tel_nix_rq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 {
@@ -459,12 +553,77 @@ cnxk_tel_nix_rq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 
 	if (roc_model_is_cn9k())
 		nix_rq_ctx_cn9k(qctx, d);
+	else if (roc_model_is_cn10k())
+		nix_rq_ctx_cn10k(qctx, d);
 	else
 		nix_rq_ctx(qctx, d);
 
 	return 0;
 }
 
+static int
+cnxk_tel_nix_cq_ctx_cn20k(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
+{
+	struct nix *nix = roc_nix_to_nix_priv(roc_nix);
+	struct dev *dev = &nix->dev;
+	struct npa_lf *npa_lf;
+	volatile struct nix_cn20k_cq_ctx_s *ctx;
+	int rc = -1;
+
+	npa_lf = idev_npa_obj_get();
+	if (npa_lf == NULL)
+		return NPA_ERR_DEVICE_NOT_BOUNDED;
+
+	rc = nix_q_ctx_get(dev, NIX_AQ_CTYPE_CQ, n, (void *)&ctx);
+	if (rc) {
+		plt_err("Failed to get cq context");
+		return rc;
+	}
+
+	/* W0 */
+	CNXK_TEL_DICT_PTR(d, ctx, base, w0_);
+
+	/* W1 */
+	CNXK_TEL_DICT_U64(d, ctx, wrptr, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, avg_con, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cint_idx, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_err, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_high, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, bpid, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_med, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, bp_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_low, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, lbp_ena, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, update_time, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, avg_level, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, head, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, tail, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, cq_err_int_ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_err_int, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, qsize, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, stashing, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, caching, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, lbp_frac, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, stash_thresh, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, msh_valid, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, msh_dst, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cpt_drop_err_en, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, drop_ena, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, drop, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, bp, w3_);
+
+	CNXK_TEL_DICT_INT(d, ctx, lbpid_ext, w4_);
+	CNXK_TEL_DICT_INT(d, ctx, bpid_ext, w4_);
+
+	return 0;
+}
+
 static int
 cnxk_tel_nix_cq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 {
@@ -474,6 +633,9 @@ cnxk_tel_nix_cq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 	volatile struct nix_cq_ctx_s *ctx;
 	int rc = -1;
 
+	if (roc_model_is_cn20k())
+		return cnxk_tel_nix_cq_ctx_cn20k(roc_nix, n, d);
+
 	npa_lf = idev_npa_obj_get();
 	if (npa_lf == NULL)
 		return NPA_ERR_DEVICE_NOT_BOUNDED;
@@ -602,7 +764,7 @@ nix_sq_ctx_cn9k(volatile void *qctx, struct plt_tel_data *d)
 }
 
 static void
-nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
+nix_sq_ctx_cn10k(volatile void *qctx, struct plt_tel_data *d)
 {
 	volatile struct nix_cn10k_sq_ctx_s *ctx;
 
@@ -617,6 +779,97 @@ nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_INT(d, ctx, ena, w0_);
 
 	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sqb_count, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, default_chan, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_weight, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, sso_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, xoff, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_ena, w1_);
+	CNXK_TEL_DICT_INT(d, ctx, smq, w1_);
+
+	/* W2 */
+	CNXK_TEL_DICT_INT(d, ctx, sqe_stype, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sq_int_ena, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sq_int, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, sqb_aura, w2_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_ub, w2_);
+
+	/* W3 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq_vld, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_pend, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smenq_next_sqb_vld, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, head_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smenq_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, tail_offset, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_lso_segnum, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, mnq_dis, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, lmt_dis, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, cq_limit, w3_);
+	CNXK_TEL_DICT_INT(d, ctx, max_sqe_size, w3_);
+
+	/* W4 */
+	CNXK_TEL_DICT_PTR(d, ctx, next_sqb, w4_);
+
+	/* W5 */
+	CNXK_TEL_DICT_PTR(d, ctx, tail_sqb, w5_);
+
+	/* W6 */
+	CNXK_TEL_DICT_PTR(d, ctx, smenq_sqb, w6_);
+
+	/* W7 */
+	CNXK_TEL_DICT_PTR(d, ctx, smenq_next_sqb, w7_);
+
+	/* W8 */
+	CNXK_TEL_DICT_PTR(d, ctx, head_sqb, w8_);
+
+	/* W9 */
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vld, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vlan1_ins_ena, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_vlan0_ins_ena, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_mps, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_sb, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_sizem1, w9_);
+	CNXK_TEL_DICT_INT(d, ctx, vfi_lso_total, w9_);
+
+	/* W10 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, scm_lso_rem, w10_);
+
+	/* W11 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, octs, w11_);
+
+	/* W12 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, pkts, w12_);
+
+	/* W13 */
+	CNXK_TEL_DICT_INT(d, ctx, aged_drop_octs, w13_);
+	CNXK_TEL_DICT_INT(d, ctx, aged_drop_pkts, w13_);
+
+	/* W14 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, drop_octs, w14_);
+
+	/* W15 */
+	CNXK_TEL_DICT_BF_PTR(d, ctx, drop_pkts, w15_);
+}
+
+static void
+nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
+{
+	volatile struct nix_cn20k_sq_ctx_s *ctx;
+
+	ctx = (volatile struct nix_cn20k_sq_ctx_s *)qctx;
+
+	/* W0 */
+	CNXK_TEL_DICT_INT(d, ctx, sqe_way_mask, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, cq, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, sdp_mcast, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, substream, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, qint_idx, w0_);
+	CNXK_TEL_DICT_INT(d, ctx, ena, w0_);
+
+	/* W1 */
+	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, sqb_count, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, default_chan, w1_);
 	CNXK_TEL_DICT_INT(d, ctx, smq_rr_weight, w1_);
@@ -631,7 +884,6 @@ nix_sq_ctx(volatile void *qctx, struct plt_tel_data *d)
 	CNXK_TEL_DICT_INT(d, ctx, sq_int, w2_);
 	CNXK_TEL_DICT_INT(d, ctx, sqb_aura, w2_);
 	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_ub, w2_);
-	CNXK_TEL_DICT_INT(d, ctx, smq_rr_count_lb, w2_);
 
 	/* W3 */
 	CNXK_TEL_DICT_INT(d, ctx, smq_next_sq_vld, w3_);
@@ -712,6 +964,8 @@ cnxk_tel_nix_sq_ctx(struct roc_nix *roc_nix, uint8_t n, struct plt_tel_data *d)
 
 	if (roc_model_is_cn9k())
 		nix_sq_ctx_cn9k(qctx, d);
+	else if (roc_model_is_cn10k())
+		nix_sq_ctx_cn10k(qctx, d);
 	else
 		nix_sq_ctx(qctx, d);
 
diff --git a/drivers/common/cnxk/roc_nix_debug.c b/drivers/common/cnxk/roc_nix_debug.c
index 2e91470c09..0cc8d7cc1e 100644
--- a/drivers/common/cnxk/roc_nix_debug.c
+++ b/drivers/common/cnxk/roc_nix_debug.c
@@ -358,7 +358,7 @@ nix_q_ctx_get(struct dev *dev, uint8_t ctype, uint16_t qid, __io void **ctx_p)
 			*ctx_p = &rsp->sq;
 		else
 			*ctx_p = &rsp->cq;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		struct nix_cn10k_aq_enq_rsp *rsp;
 		struct nix_cn10k_aq_enq_req *aq;
 
@@ -372,6 +372,30 @@ nix_q_ctx_get(struct dev *dev, uint8_t ctype, uint16_t qid, __io void **ctx_p)
 		aq->ctype = ctype;
 		aq->op = NIX_AQ_INSTOP_READ;
 
+		rc = mbox_process_msg(mbox, (void *)&rsp);
+		if (rc)
+			goto exit;
+
+		if (ctype == NIX_AQ_CTYPE_RQ)
+			*ctx_p = &rsp->rq;
+		else if (ctype == NIX_AQ_CTYPE_SQ)
+			*ctx_p = &rsp->sq;
+		else
+			*ctx_p = &rsp->cq;
+	} else {
+		struct nix_cn20k_aq_enq_rsp *rsp;
+		struct nix_cn20k_aq_enq_req *aq;
+
+		aq = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!aq) {
+			rc = -ENOSPC;
+			goto exit;
+		}
+
+		aq->qidx = qid;
+		aq->ctype = ctype;
+		aq->op = NIX_AQ_INSTOP_READ;
+
 		rc = mbox_process_msg(mbox, (void *)&rsp);
 		if (rc)
 			goto exit;
@@ -452,7 +476,69 @@ nix_cn9k_lf_sq_dump(__io struct nix_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *f
 }
 
 static inline void
-nix_lf_sq_dump(__io struct nix_cn10k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
+nix_cn10k_lf_sq_dump(__io struct nix_cn10k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
+{
+	nix_dump(file, "W0: sqe_way_mask \t\t%d\nW0: cq \t\t\t\t%d",
+		 ctx->sqe_way_mask, ctx->cq);
+	nix_dump(file, "W0: sdp_mcast \t\t\t%d\nW0: substream \t\t\t0x%03x",
+		 ctx->sdp_mcast, ctx->substream);
+	nix_dump(file, "W0: qint_idx \t\t\t%d\nW0: ena \t\t\t%d\n", ctx->qint_idx,
+		 ctx->ena);
+
+	nix_dump(file, "W1: sqb_count \t\t\t%d\nW1: default_chan \t\t%d",
+		 ctx->sqb_count, ctx->default_chan);
+	nix_dump(file, "W1: smq_rr_weight \t\t%d\nW1: sso_ena \t\t\t%d",
+		 ctx->smq_rr_weight, ctx->sso_ena);
+	nix_dump(file, "W1: xoff \t\t\t%d\nW1: cq_ena \t\t\t%d\nW1: smq\t\t\t\t%d\n",
+		 ctx->xoff, ctx->cq_ena, ctx->smq);
+
+	nix_dump(file, "W2: sqe_stype \t\t\t%d\nW2: sq_int_ena \t\t\t%d",
+		 ctx->sqe_stype, ctx->sq_int_ena);
+	nix_dump(file, "W2: sq_int  \t\t\t%d\nW2: sqb_aura \t\t\t%d", ctx->sq_int,
+		 ctx->sqb_aura);
+	nix_dump(file, "W2: smq_rr_count[ub:lb] \t\t%x:%x\n", ctx->smq_rr_count_ub,
+		 ctx->smq_rr_count_lb);
+
+	nix_dump(file, "W3: smq_next_sq_vld\t\t%d\nW3: smq_pend\t\t\t%d",
+		 ctx->smq_next_sq_vld, ctx->smq_pend);
+	nix_dump(file, "W3: smenq_next_sqb_vld  \t%d\nW3: head_offset\t\t\t%d",
+		 ctx->smenq_next_sqb_vld, ctx->head_offset);
+	nix_dump(file, "W3: smenq_offset\t\t%d\nW3: tail_offset \t\t%d",
+		 ctx->smenq_offset, ctx->tail_offset);
+	nix_dump(file, "W3: smq_lso_segnum \t\t%d\nW3: smq_next_sq \t\t%d",
+		 ctx->smq_lso_segnum, ctx->smq_next_sq);
+	nix_dump(file, "W3: mnq_dis \t\t\t%d\nW3: lmt_dis \t\t\t%d", ctx->mnq_dis,
+		 ctx->lmt_dis);
+	nix_dump(file, "W3: cq_limit\t\t\t%d\nW3: max_sqe_size\t\t%d\n",
+		 ctx->cq_limit, ctx->max_sqe_size);
+
+	nix_dump(file, "W4: next_sqb \t\t\t0x%" PRIx64 "", ctx->next_sqb);
+	nix_dump(file, "W5: tail_sqb \t\t\t0x%" PRIx64 "", ctx->tail_sqb);
+	nix_dump(file, "W6: smenq_sqb \t\t\t0x%" PRIx64 "", ctx->smenq_sqb);
+	nix_dump(file, "W7: smenq_next_sqb \t\t0x%" PRIx64 "", ctx->smenq_next_sqb);
+	nix_dump(file, "W8: head_sqb \t\t\t0x%" PRIx64 "", ctx->head_sqb);
+
+	nix_dump(file, "W9: vfi_lso_vld \t\t%d\nW9: vfi_lso_vlan1_ins_ena\t%d", ctx->vfi_lso_vld,
+		 ctx->vfi_lso_vlan1_ins_ena);
+	nix_dump(file, "W9: vfi_lso_vlan0_ins_ena\t%d\nW9: vfi_lso_mps\t\t\t%d",
+		 ctx->vfi_lso_vlan0_ins_ena, ctx->vfi_lso_mps);
+	nix_dump(file, "W9: vfi_lso_sb \t\t\t%d\nW9: vfi_lso_sizem1\t\t%d", ctx->vfi_lso_sb,
+		 ctx->vfi_lso_sizem1);
+	nix_dump(file, "W9: vfi_lso_total\t\t%d", ctx->vfi_lso_total);
+
+	nix_dump(file, "W10: scm_lso_rem \t\t0x%" PRIx64 "", (uint64_t)ctx->scm_lso_rem);
+	nix_dump(file, "W11: octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->octs);
+	nix_dump(file, "W12: pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->pkts);
+	nix_dump(file, "W13: aged_drop_pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->aged_drop_pkts);
+	nix_dump(file, "W13: aged_drop_octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->aged_drop_octs);
+	nix_dump(file, "W14: dropped_octs \t\t0x%" PRIx64 "", (uint64_t)ctx->drop_octs);
+	nix_dump(file, "W15: dropped_pkts \t\t0x%" PRIx64 "", (uint64_t)ctx->drop_pkts);
+
+	*sqb_aura_p = ctx->sqb_aura;
+}
+
+static inline void
+nix_lf_sq_dump(__io struct nix_cn20k_sq_ctx_s *ctx, uint32_t *sqb_aura_p, FILE *file)
 {
 	nix_dump(file, "W0: sqe_way_mask \t\t%d\nW0: cq \t\t\t\t%d",
 		 ctx->sqe_way_mask, ctx->cq);
@@ -574,7 +660,7 @@ nix_cn9k_lf_rq_dump(__io struct nix_rq_ctx_s *ctx, FILE *file)
 }
 
 void
-nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
+nix_cn10k_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
 {
 	nix_dump(file, "W0: wqe_aura \t\t\t%d\nW0: len_ol3_dis \t\t\t%d",
 		 ctx->wqe_aura, ctx->len_ol3_dis);
@@ -649,6 +735,124 @@ nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file)
 	nix_dump(file, "W10: re_pkts \t\t\t0x%" PRIx64 "\n", (uint64_t)ctx->re_pkts);
 }
 
+void
+nix_lf_rq_dump(__io struct nix_cn20k_rq_ctx_s *ctx, FILE *file)
+{
+	nix_dump(file, "W0: wqe_aura \t\t\t%d\nW0: len_ol3_dis \t\t\t%d",
+		 ctx->wqe_aura, ctx->len_ol3_dis);
+	nix_dump(file, "W0: len_ol4_dis \t\t\t%d\nW0: len_il3_dis \t\t\t%d",
+		 ctx->len_ol4_dis, ctx->len_il3_dis);
+	nix_dump(file, "W0: len_il4_dis \t\t\t%d\nW0: csum_ol4_dis \t\t\t%d",
+		 ctx->len_il4_dis, ctx->csum_ol4_dis);
+	nix_dump(file, "W0: csum_il4_dis \t\t\t%d\nW0: lenerr_dis \t\t\t%d",
+		 ctx->csum_il4_dis, ctx->lenerr_dis);
+	nix_dump(file, "W0: port_ol4_dis \t\t\t%d\nW0: port_il4_dis\t\t\t%d",
+		 ctx->port_ol4_dis, ctx->port_il4_dis);
+	nix_dump(file, "W0: cq \t\t\t\t%d\nW0: ena_wqwd \t\t\t%d", ctx->cq,
+		 ctx->ena_wqwd);
+	nix_dump(file, "W0: ipsech_ena \t\t\t%d\nW0: sso_ena \t\t\t%d",
+		 ctx->ipsech_ena, ctx->sso_ena);
+	nix_dump(file, "W0: ena \t\t\t%d\n", ctx->ena);
+
+	nix_dump(file, "W1: chi_ena \t\t%d\nW1: ipsecd_drop_en \t\t%d", ctx->chi_ena,
+		 ctx->ipsecd_drop_en);
+	nix_dump(file, "W1: pb_stashing \t\t\t%d", ctx->pb_stashing);
+	nix_dump(file, "W1: lpb_drop_ena \t\t%d\nW1: spb_drop_ena \t\t%d",
+		 ctx->lpb_drop_ena, ctx->spb_drop_ena);
+	nix_dump(file, "W1: xqe_drop_ena \t\t%d\nW1: wqe_caching \t\t%d",
+		 ctx->xqe_drop_ena, ctx->wqe_caching);
+	nix_dump(file, "W1: pb_caching \t\t\t%d\nW1: sso_tt \t\t\t%d",
+		 ctx->pb_caching, ctx->sso_tt);
+	nix_dump(file, "W1: sso_grp \t\t\t%d\nW1: lpb_aura \t\t\t%d", ctx->sso_grp,
+		 ctx->lpb_aura);
+	nix_dump(file, "W1: spb_aura \t\t\t%d\n", ctx->spb_aura);
+
+	nix_dump(file, "W2: xqe_hdr_split \t\t%d\nW2: xqe_imm_copy \t\t%d",
+		 ctx->xqe_hdr_split, ctx->xqe_imm_copy);
+	nix_dump(file, "W2: band_prof_id\t\t%d\n",
+		 ((ctx->band_prof_id_h << 10) | ctx->band_prof_id_l));
+	nix_dump(file, "W2: xqe_imm_size \t\t%d\nW2: later_skip \t\t\t%d",
+		 ctx->xqe_imm_size, ctx->later_skip);
+	nix_dump(file, "W2: sso_bp_ena\t\t%d\n", ctx->sso_bp_ena);
+	nix_dump(file, "W2: first_skip \t\t\t%d\nW2: lpb_sizem1 \t\t\t%d",
+		 ctx->first_skip, ctx->lpb_sizem1);
+	nix_dump(file, "W2: spb_ena \t\t\t%d\nW2: spb_high_sizem1 \t\t\t%d", ctx->spb_ena,
+		 ctx->spb_high_sizem1);
+	nix_dump(file, "W2: wqe_skip \t\t\t%d", ctx->wqe_skip);
+	nix_dump(file, "W2: spb_sizem1 \t\t\t%d\nW2: policer_ena \t\t\t%d",
+		 ctx->spb_sizem1, ctx->policer_ena);
+	nix_dump(file, "W2: sso_fc_ena \t\t\t%d\n", ctx->sso_fc_ena);
+
+	nix_dump(file, "W3: spb_pool_pass \t\t%d\nW3: spb_pool_drop \t\t%d",
+		 ctx->spb_pool_pass, ctx->spb_pool_drop);
+	nix_dump(file, "W3: spb_aura_pass \t\t%d\nW3: spb_aura_drop \t\t%d",
+		 ctx->spb_aura_pass, ctx->spb_aura_drop);
+	nix_dump(file, "W3: wqe_pool_pass \t\t%d\nW3: wqe_pool_drop \t\t%d",
+		 ctx->wqe_pool_pass, ctx->wqe_pool_drop);
+	nix_dump(file, "W3: xqe_pass \t\t\t%d\nW3: xqe_drop \t\t\t%d\n",
+		 ctx->xqe_pass, ctx->xqe_drop);
+
+	nix_dump(file, "W4: qint_idx \t\t\t%d\nW4: rq_int_ena \t\t\t%d",
+		 ctx->qint_idx, ctx->rq_int_ena);
+	nix_dump(file, "W4: rq_int \t\t\t%d\nW4: lpb_pool_pass \t\t%d", ctx->rq_int,
+		 ctx->lpb_pool_pass);
+	nix_dump(file, "W4: lpb_pool_drop \t\t%d\nW4: lpb_aura_pass \t\t%d",
+		 ctx->lpb_pool_drop, ctx->lpb_aura_pass);
+	nix_dump(file, "W4: lpb_aura_drop \t\t%d\n", ctx->lpb_aura_drop);
+
+	nix_dump(file, "W5: flow_tagw \t\t\t%d\nW5: bad_utag \t\t\t%d",
+		 ctx->flow_tagw, ctx->bad_utag);
+	nix_dump(file, "W5: good_utag \t\t\t%d\nW5: ltag \t\t\t%d\n", ctx->good_utag,
+		 ctx->ltag);
+
+	nix_dump(file, "W6: octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->octs);
+	nix_dump(file, "W7: pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->pkts);
+	nix_dump(file, "W8: drop_octs \t\t\t0x%" PRIx64 "", (uint64_t)ctx->drop_octs);
+	nix_dump(file, "W9: drop_pkts \t\t\t0x%" PRIx64 "", (uint64_t)ctx->drop_pkts);
+	nix_dump(file, "W10: re_pkts \t\t\t0x%" PRIx64 "\n", (uint64_t)ctx->re_pkts);
+}
+
+static inline void
+nix_cn20k_lf_cq_dump(__io struct nix_cn20k_cq_ctx_s *ctx, FILE *file)
+{
+	nix_dump(file, "W0: base \t\t\t0x%" PRIx64 "\n", ctx->base);
+
+	nix_dump(file, "W1: wrptr \t\t\t%" PRIx64 "", (uint64_t)ctx->wrptr);
+	nix_dump(file, "W1: avg_con \t\t\t%d\nW1: cint_idx \t\t\t%d", ctx->avg_con,
+		 ctx->cint_idx);
+	nix_dump(file, "W1: cq_err \t\t\t%d\nW1: qint_idx \t\t\t%d", ctx->cq_err,
+		 ctx->qint_idx);
+	nix_dump(file, "W1: bpid  \t\t\t%d\nW1: bp_ena \t\t\t%d\n", ctx->bpid,
+		 ctx->bp_ena);
+	nix_dump(file,
+		 "W1: lbpid_high \t\t\t0x%03x\nW1: lbpid_med \t\t\t0x%03x\n"
+		 "W1: lbpid_low \t\t\t0x%03x\n(W1: lbpid) \t\t\t0x%03x\n",
+		 ctx->lbpid_high, ctx->lbpid_med, ctx->lbpid_low, (unsigned int)
+		 (ctx->lbpid_high << 6 | ctx->lbpid_med << 3 | ctx->lbpid_low));
+	nix_dump(file, "W1: lbp_ena \t\t\t\t%d\n", ctx->lbp_ena);
+
+	nix_dump(file, "W2: update_time \t\t%d\nW2: avg_level \t\t\t%d",
+		 ctx->update_time, ctx->avg_level);
+	nix_dump(file, "W2: head \t\t\t%d\nW2: tail \t\t\t%d\n", ctx->head,
+		 ctx->tail);
+
+	nix_dump(file, "W3: cq_err_int_ena \t\t%d\nW3: cq_err_int \t\t\t%d",
+		 ctx->cq_err_int_ena, ctx->cq_err_int);
+	nix_dump(file, "W3: qsize \t\t\t%d\nW3: stashing \t\t\t%d", ctx->qsize,
+		 ctx->stashing);
+	nix_dump(file, "W3: caching \t\t\t%d\nW3: lbp_frac \t\t\t%d", ctx->caching, ctx->lbp_frac);
+	nix_dump(file, "W3: stash_thresh \t\t\t%d\nW3: msh_valid\t\t\t%d", ctx->stash_thresh,
+		 ctx->msh_valid);
+	nix_dump(file, "W3: msh_dst \t\t\t0x%03x\nW3: cpt_drop_err_en \t\t\t%d\n",
+		 ctx->msh_dst, ctx->cpt_drop_err_en);
+	nix_dump(file, "W3: ena \t\t\t%d\n", ctx->ena);
+	nix_dump(file, "W3: drop_ena \t\t\t%d\nW3: drop \t\t\t%d", ctx->drop_ena,
+		 ctx->drop);
+	nix_dump(file, "W3: bp \t\t\t\t%d\n", ctx->bp);
+	nix_dump(file, "W4: lbpid_ext \t\t\t%d\nW3: bpid_ext \t\t\t%d", ctx->lbpid_ext,
+		 ctx->bpid_ext);
+}
+
 static inline void
 nix_lf_cq_dump(__io struct nix_cq_ctx_s *ctx, FILE *file)
 {
@@ -713,7 +917,10 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 		}
 		nix_dump(file, "============== port=%d cq=%d ===============",
 			 roc_nix->port_id, q);
-		nix_lf_cq_dump(ctx, file);
+		if (roc_model_is_cn20k())
+			nix_cn20k_lf_cq_dump(ctx, file);
+		else
+			nix_lf_cq_dump(ctx, file);
 	}
 
 	for (q = 0; q < rq; q++) {
@@ -726,6 +933,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 roc_nix->port_id, q);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_rq_dump(ctx, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_rq_dump(ctx, file);
 		else
 			nix_lf_rq_dump(ctx, file);
 	}
@@ -751,6 +960,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 inl_rq->qid);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_rq_dump(ctx, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_rq_dump(ctx, file);
 		else
 			nix_lf_rq_dump(ctx, file);
 	}
@@ -765,6 +976,8 @@ roc_nix_queues_ctx_dump(struct roc_nix *roc_nix, FILE *file)
 			 roc_nix->port_id, q);
 		if (roc_model_is_cn9k())
 			nix_cn9k_lf_sq_dump(ctx, &sqb_aura, file);
+		else if (roc_model_is_cn10k())
+			nix_cn10k_lf_sq_dump(ctx, &sqb_aura, file);
 		else
 			nix_lf_sq_dump(ctx, &sqb_aura, file);
 
@@ -1480,9 +1693,20 @@ roc_nix_sq_desc_dump(struct roc_nix *roc_nix, uint16_t q, uint16_t offset, uint1
 		tail_sqb = (void *)ctx->tail_sqb;
 		head_off = ctx->head_offset;
 		tail_off = ctx->tail_offset;
-	} else {
+	} else if (roc_model_is_cn10k()) {
 		volatile struct nix_cn10k_sq_ctx_s *ctx = (struct nix_cn10k_sq_ctx_s *)dat;
 
+		if (ctx->mnq_dis || ctx->lmt_dis)
+			full = 1;
+
+		count = ctx->sqb_count;
+		sqb_buf = (void *)ctx->head_sqb;
+		tail_sqb = (void *)ctx->tail_sqb;
+		head_off = ctx->head_offset;
+		tail_off = ctx->tail_offset;
+	} else {
+		volatile struct nix_cn20k_sq_ctx_s *ctx = (struct nix_cn20k_sq_ctx_s *)dat;
+
 		if (ctx->mnq_dis || ctx->lmt_dis)
 			full = 1;
 
diff --git a/drivers/common/cnxk/roc_nix_priv.h b/drivers/common/cnxk/roc_nix_priv.h
index ade42c1878..3fd6fcbe9f 100644
--- a/drivers/common/cnxk/roc_nix_priv.h
+++ b/drivers/common/cnxk/roc_nix_priv.h
@@ -469,7 +469,8 @@ struct nix_tm_shaper_profile *nix_tm_shaper_profile_alloc(void);
 void nix_tm_shaper_profile_free(struct nix_tm_shaper_profile *profile);
 
 uint64_t nix_get_blkaddr(struct dev *dev);
-void nix_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file);
+void nix_cn10k_lf_rq_dump(__io struct nix_cn10k_rq_ctx_s *ctx, FILE *file);
+void nix_lf_rq_dump(__io struct nix_cn20k_rq_ctx_s *ctx, FILE *file);
 int nix_lf_gen_reg_dump(uintptr_t nix_lf_base, uint64_t *data);
 int nix_lf_stat_reg_dump(uintptr_t nix_lf_base, uint64_t *data, uint8_t lf_tx_stats,
 			 uint8_t lf_rx_stats);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
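
A note on nix_cn20k_lf_cq_dump() above: the CQ local backpressure id is
carried in three separate context fields (lbpid_high, lbpid_med,
lbpid_low) that the dump stitches back together. A minimal sketch of that
reassembly, assuming the 3-bit-per-field layout implied by the shifts (the
helper name is illustrative):

    /* Mirrors the shift/OR used by nix_cn20k_lf_cq_dump(). */
    static inline unsigned int
    cq_lbpid_join(unsigned int high, unsigned int med, unsigned int low)
    {
            /* Three 3-bit fields -> 9-bit lbpid, printed as 0x%03x. */
            return (high << 6) | (med << 3) | low;
    }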

* [PATCH v3 09/18] common/cnxk: add RSS support for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (7 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 08/18] common/cnxk: support NIX debug " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 10/18] net/cnxk: add cn20k base control path support Nithin Dabilpuram
                     ` (9 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

From: Satha Rao <skoteshwar@marvell.com>

Add RSS configuration support for cn20k.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
---
 drivers/common/cnxk/roc_nix_rss.c | 74 +++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 4 deletions(-)

diff --git a/drivers/common/cnxk/roc_nix_rss.c b/drivers/common/cnxk/roc_nix_rss.c
index 2b88e1360d..fd1472e9b9 100644
--- a/drivers/common/cnxk/roc_nix_rss.c
+++ b/drivers/common/cnxk/roc_nix_rss.c
@@ -70,7 +70,7 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 				goto exit;
 			req = mbox_alloc_msg_nix_aq_enq(mbox);
 			if (!req) {
-				rc =  NIX_ERR_NO_MEM;
+				rc = NIX_ERR_NO_MEM;
 				goto exit;
 			}
 		}
@@ -93,7 +93,7 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 				goto exit;
 			req = mbox_alloc_msg_nix_aq_enq(mbox);
 			if (!req) {
-				rc =  NIX_ERR_NO_MEM;
+				rc = NIX_ERR_NO_MEM;
 				goto exit;
 			}
 		}
@@ -115,8 +115,8 @@ nix_cn9k_rss_reta_set(struct nix *nix, uint8_t group,
 }
 
 static int
-nix_rss_reta_set(struct nix *nix, uint8_t group,
-		 uint16_t reta[ROC_NIX_RSS_RETA_MAX], uint8_t lock_rx_ctx)
+nix_cn10k_rss_reta_set(struct nix *nix, uint8_t group, uint16_t reta[ROC_NIX_RSS_RETA_MAX],
+		       uint8_t lock_rx_ctx)
 {
 	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
 	struct nix_cn10k_aq_enq_req *req;
@@ -178,6 +178,70 @@ nix_rss_reta_set(struct nix *nix, uint8_t group,
 	return rc;
 }
 
+static int
+nix_rss_reta_set(struct nix *nix, uint8_t group, uint16_t reta[ROC_NIX_RSS_RETA_MAX],
+		 uint8_t lock_rx_ctx)
+{
+	struct mbox *mbox = mbox_get((&nix->dev)->mbox);
+	struct nix_cn20k_aq_enq_req *req;
+	uint16_t idx;
+	int rc;
+
+	for (idx = 0; idx < nix->reta_sz; idx++) {
+		req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!req) {
+			/* The shared memory buffer can be full.
+			 * Flush it and retry
+			 */
+			rc = mbox_process(mbox);
+			if (rc < 0)
+				goto exit;
+			req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+			if (!req) {
+				rc = NIX_ERR_NO_MEM;
+				goto exit;
+			}
+		}
+		req->rss.rq = reta[idx];
+		/* Fill AQ info */
+		req->qidx = (group * nix->reta_sz) + idx;
+		req->ctype = NIX_AQ_CTYPE_RSS;
+		req->op = NIX_AQ_INSTOP_INIT;
+
+		if (!lock_rx_ctx)
+			continue;
+
+		req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+		if (!req) {
+			/* The shared memory buffer can be full.
+			 * Flush it and retry
+			 */
+			rc = mbox_process(mbox);
+			if (rc < 0)
+				goto exit;
+			req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
+			if (!req) {
+				rc = NIX_ERR_NO_MEM;
+				goto exit;
+			}
+		}
+		req->rss.rq = reta[idx];
+		/* Fill AQ info */
+		req->qidx = (group * nix->reta_sz) + idx;
+		req->ctype = NIX_AQ_CTYPE_RSS;
+		req->op = NIX_AQ_INSTOP_LOCK;
+	}
+
+	rc = mbox_process(mbox);
+	if (rc < 0)
+		goto exit;
+
+	rc = 0;
+exit:
+	mbox_put(mbox);
+	return rc;
+}
+
 int
 roc_nix_rss_reta_set(struct roc_nix *roc_nix, uint8_t group,
 		     uint16_t reta[ROC_NIX_RSS_RETA_MAX])
@@ -191,6 +255,8 @@ roc_nix_rss_reta_set(struct roc_nix *roc_nix, uint8_t group,
 	if (roc_model_is_cn9k())
 		rc = nix_cn9k_rss_reta_set(nix, group, reta,
 					   roc_nix->lock_rx_ctx);
+	else if (roc_model_is_cn10k())
+		rc = nix_cn10k_rss_reta_set(nix, group, reta, roc_nix->lock_rx_ctx);
 	else
 		rc = nix_rss_reta_set(nix, group, reta, roc_nix->lock_rx_ctx);
 	if (rc)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
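
A note on the mailbox usage in nix_rss_reta_set() above: the cnxk drivers
batch AQ messages in a shared mailbox ring, so an allocation can fail
simply because the ring is full. The idiom is to flush the queued messages
with mbox_process() and retry the allocation once. A condensed sketch of
that pattern (error paths trimmed; req/mbox types as in the patch):

    struct nix_cn20k_aq_enq_req *req;
    int rc;

    req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
    if (!req) {
            /* Ring full: submit what is queued, then retry once. */
            rc = mbox_process(mbox);
            if (rc < 0)
                    return rc;
            req = mbox_alloc_msg_nix_cn20k_aq_enq(mbox);
            if (!req)
                    return NIX_ERR_NO_MEM;
    }
    /* ... fill req; one final mbox_process() submits the whole batch. */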

* [PATCH v3 10/18] net/cnxk: add cn20k base control path support
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (8 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 09/18] common/cnxk: add RSS support " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 11/18] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
                     ` (8 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, Anatoly Burakov
  Cc: dev, Rakesh Kudurumalla, Rahul Bhansali

Add cn20k base control path support for ethdev.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
 doc/guides/rel_notes/release_24_11.rst |   4 +
 drivers/net/cnxk/cn20k_ethdev.c        | 553 +++++++++++++++++++++++++
 drivers/net/cnxk/cn20k_ethdev.h        |  11 +
 drivers/net/cnxk/cn20k_rx.h            |  33 ++
 drivers/net/cnxk/cn20k_rxtx.h          |  89 ++++
 drivers/net/cnxk/cn20k_tx.h            |  35 ++
 drivers/net/cnxk/cnxk_ethdev_dp.h      |   3 +
 drivers/net/cnxk/meson.build           |  11 +-
 8 files changed, 738 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.c
 create mode 100644 drivers/net/cnxk/cn20k_ethdev.h
 create mode 100644 drivers/net/cnxk/cn20k_rx.h
 create mode 100644 drivers/net/cnxk/cn20k_rxtx.h
 create mode 100644 drivers/net/cnxk/cn20k_tx.h

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index edcfcaa25a..023f357d94 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -59,6 +59,10 @@ New Features
 
   * Added mempool driver support for CN20K SoC.
 
+* **Updated Marvell cnxk net driver.**
+
+  * Added ethdev driver support for CN20K SoC.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
new file mode 100644
index 0000000000..5bd7a43353
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -0,0 +1,553 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#include "cn20k_ethdev.h"
+#include "cn20k_rx.h"
+#include "cn20k_tx.h"
+
+static int
+cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (ptype_mask) {
+		dev->rx_offload_flags |= NIX_RX_OFFLOAD_PTYPE_F;
+		dev->ptype_disable = 0;
+	} else {
+		dev->rx_offload_flags &= ~NIX_RX_OFFLOAD_PTYPE_F;
+		dev->ptype_disable = 1;
+	}
+
+	return 0;
+}
+
+static void
+nix_form_default_desc(struct cnxk_eth_dev *dev, struct cn20k_eth_txq *txq, uint16_t qid)
+{
+	union nix_send_hdr_w0_u send_hdr_w0;
+
+	/* Initialize the fields based on basic single segment packet */
+	send_hdr_w0.u = 0;
+	if (dev->tx_offload_flags & NIX_TX_NEED_EXT_HDR) {
+		/* 2(HDR) + 2(EXT_HDR) + 1(SG) + 1(IOVA) = 6/2 - 1 = 2 */
+		send_hdr_w0.sizem1 = 2;
+		if (dev->tx_offload_flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Default: one seg packet would have:
+			 * 2(HDR) + 2(EXT) + 1(SG) + 1(IOVA) + 2(MEM)
+			 * => 8/2 - 1 = 3
+			 */
+			send_hdr_w0.sizem1 = 3;
+
+			/* To calculate the offset for send_mem,
+			 * send_hdr->w0.sizem1 * 2
+			 */
+			txq->ts_mem = dev->tstamp.tx_tstamp_iova;
+		}
+	} else {
+		/* 2(HDR) + 1(SG) + 1(IOVA) = 4/2 - 1 = 1 */
+		send_hdr_w0.sizem1 = 1;
+	}
+	send_hdr_w0.sq = qid;
+	txq->send_hdr_w0 = send_hdr_w0.u;
+	rte_wmb();
+}
+
+static int
+cn20k_nix_tx_compl_setup(struct cnxk_eth_dev *dev, struct cn20k_eth_txq *txq, struct roc_nix_sq *sq,
+			 uint16_t nb_desc)
+{
+	struct roc_nix_cq *cq;
+
+	cq = &dev->cqs[sq->cqid];
+	txq->tx_compl.desc_base = (uintptr_t)cq->desc_base;
+	txq->tx_compl.cq_door = cq->door;
+	txq->tx_compl.cq_status = cq->status;
+	txq->tx_compl.wdata = cq->wdata;
+	txq->tx_compl.head = cq->head;
+	txq->tx_compl.qmask = cq->qmask;
+	/* Total array size holding buffers is equal to
+	 * number of entries in cq and sq
+	 * max buffer in array = desc in cq + desc in sq
+	 */
+	txq->tx_compl.nb_desc_mask = (2 * rte_align32pow2(nb_desc)) - 1;
+	txq->tx_compl.ena = true;
+
+	txq->tx_compl.ptr = (struct rte_mbuf **)plt_zmalloc(txq->tx_compl.nb_desc_mask *
+							    sizeof(struct rte_mbuf *), 0);
+	if (!txq->tx_compl.ptr)
+		return -1;
+
+	return 0;
+}
+
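
A note on the completion-ring sizing in cn20k_nix_tx_compl_setup() above:
buffers can be outstanding in both the SQ and its completion CQ, so the
pointer array is sized for both, rounded up to a power of two so that
indexing reduces to an AND with the mask. A small sketch of the intent,
with illustrative names:

    uint32_t slots = 2 * rte_align32pow2(nb_desc); /* SQ + CQ entries */
    uint32_t mask = slots - 1;
    struct rte_mbuf **ring = plt_zmalloc(slots * sizeof(*ring), 0);
    /* store/lookup completions via ring[tag & mask] */
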
+static void
+cn20k_nix_tx_queue_release(struct rte_eth_dev *eth_dev, uint16_t qid)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	struct cn20k_eth_txq *txq;
+
+	cnxk_nix_tx_queue_release(eth_dev, qid);
+	txq = eth_dev->data->tx_queues[qid];
+
+	if (nix->tx_compl_ena)
+		plt_free(txq->tx_compl.ptr);
+}
+
+static int
+cn20k_nix_tx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_desc,
+			 unsigned int socket, const struct rte_eth_txconf *tx_conf)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	struct roc_cpt_lf *inl_lf;
+	struct cn20k_eth_txq *txq;
+	struct roc_nix_sq *sq;
+	uint16_t crypto_qid;
+	int rc;
+
+	RTE_SET_USED(socket);
+
+	/* Common Tx queue setup */
+	rc = cnxk_nix_tx_queue_setup(eth_dev, qid, nb_desc, sizeof(struct cn20k_eth_txq), tx_conf);
+	if (rc)
+		return rc;
+
+	sq = &dev->sqs[qid];
+	/* Update fast path queue */
+	txq = eth_dev->data->tx_queues[qid];
+	txq->fc_mem = sq->fc;
+	if (nix->tx_compl_ena) {
+		rc = cn20k_nix_tx_compl_setup(dev, txq, sq, nb_desc);
+		if (rc)
+			return rc;
+	}
+
+	/* Set Txq flag for MT_LOCKFREE */
+	txq->flag = !!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MT_LOCKFREE);
+
+	/* Store lmt base in tx queue for easy access */
+	txq->lmt_base = nix->lmt_base;
+	txq->io_addr = sq->io_addr;
+	txq->nb_sqb_bufs_adj = sq->nb_sqb_bufs_adj;
+	txq->sqes_per_sqb_log2 = sq->sqes_per_sqb_log2;
+
+	/* Fetch CPT LF info for outbound if present */
+	if (dev->outb.lf_base) {
+		crypto_qid = qid % dev->outb.nb_crypto_qs;
+		inl_lf = dev->outb.lf_base + crypto_qid;
+
+		txq->cpt_io_addr = inl_lf->io_addr;
+		txq->cpt_fc = inl_lf->fc_addr;
+		txq->cpt_fc_sw = (int32_t *)((uintptr_t)dev->outb.fc_sw_mem +
+					     crypto_qid * RTE_CACHE_LINE_SIZE);
+
+		txq->cpt_desc = inl_lf->nb_desc * 0.7;
+		txq->sa_base = (uint64_t)dev->outb.sa_base;
+		txq->sa_base |= (uint64_t)eth_dev->data->port_id;
+		PLT_STATIC_ASSERT(BIT_ULL(16) == ROC_NIX_INL_SA_BASE_ALIGN);
+	}
+
+	/* Restore marking flag from roc */
+	mark_fmt = roc_nix_tm_mark_format_get(nix, &mark_flag);
+	txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+	txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+
+	nix_form_default_desc(dev, txq, qid);
+	txq->lso_tun_fmt = dev->lso_tun_fmt;
+	return 0;
+}
+
+static int
+cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_desc,
+			 unsigned int socket, const struct rte_eth_rxconf *rx_conf,
+			 struct rte_mempool *mp)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cn20k_eth_rxq *rxq;
+	struct roc_nix_rq *rq;
+	struct roc_nix_cq *cq;
+	int rc;
+
+	RTE_SET_USED(socket);
+
+	/* CQ Errata needs min 4K ring */
+	if (dev->cq_min_4k && nb_desc < 4096)
+		nb_desc = 4096;
+
+	/* Common Rx queue setup */
+	rc = cnxk_nix_rx_queue_setup(eth_dev, qid, nb_desc, sizeof(struct cn20k_eth_rxq), rx_conf,
+				     mp);
+	if (rc)
+		return rc;
+
+	/* Do initial mtu setup for RQ0 before device start */
+	if (!qid) {
+		rc = nix_recalc_mtu(eth_dev);
+		if (rc)
+			return rc;
+	}
+
+	rq = &dev->rqs[qid];
+	cq = &dev->cqs[qid];
+
+	/* Update fast path queue */
+	rxq = eth_dev->data->rx_queues[qid];
+	rxq->rq = qid;
+	rxq->desc = (uintptr_t)cq->desc_base;
+	rxq->cq_door = cq->door;
+	rxq->cq_status = cq->status;
+	rxq->wdata = cq->wdata;
+	rxq->head = cq->head;
+	rxq->qmask = cq->qmask;
+	rxq->tstamp = &dev->tstamp;
+
+	/* Data offset from data to start of mbuf is first_skip */
+	rxq->data_off = rq->first_skip;
+	rxq->mbuf_initializer = cnxk_nix_rxq_mbuf_setup(dev);
+
+	/* Setup security related info */
+	if (dev->rx_offload_flags & NIX_RX_OFFLOAD_SECURITY_F) {
+		rxq->lmt_base = dev->nix.lmt_base;
+		rxq->sa_base = roc_nix_inl_inb_sa_base_get(&dev->nix, dev->inb.inl_dev);
+	}
+
+	/* Lookup mem */
+	rxq->lookup_mem = cnxk_nix_fastpath_lookup_mem_get();
+	return 0;
+}
+
+static int
+cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
+{
+	struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[qidx];
+	int rc;
+
+	rc = cnxk_nix_tx_queue_stop(eth_dev, qidx);
+	if (rc)
+		return rc;
+
+	/* Clear fc cache pkts to trigger worker stop */
+	txq->fc_cache_pkts = 0;
+
+	return 0;
+}
+
+static int
+cn20k_nix_configure(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc;
+
+	/* Common nix configure */
+	rc = cnxk_nix_configure(eth_dev);
+	if (rc)
+		return rc;
+
+	/* reset reassembly dynfield/flag offset */
+	dev->reass_dynfield_off = -1;
+	dev->reass_dynflag_bit = -1;
+
+	plt_nix_dbg("Configured port%d platform specific rx_offload_flags=%x"
+		    " tx_offload_flags=0x%x",
+		    eth_dev->data->port_id, dev->rx_offload_flags, dev->tx_offload_flags);
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int i, rc;
+
+	rc = cnxk_nix_timesync_enable(eth_dev);
+	if (rc)
+		return rc;
+
+	dev->rx_offload_flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+	dev->tx_offload_flags |= NIX_TX_OFFLOAD_TSTAMP_F;
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
+
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int i, rc;
+
+	rc = cnxk_nix_timesync_disable(eth_dev);
+	if (rc)
+		return rc;
+
+	dev->rx_offload_flags &= ~NIX_RX_OFFLOAD_TSTAMP_F;
+	dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_TSTAMP_F;
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
+
+	return 0;
+}
+
+static int
+cn20k_nix_timesync_read_tx_timestamp(struct rte_eth_dev *eth_dev, struct timespec *timestamp)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cnxk_timesync_info *tstamp = &dev->tstamp;
+	uint64_t ns;
+
+	if (*tstamp->tx_tstamp == 0)
+		return -EINVAL;
+
+	*tstamp->tx_tstamp =
+		((*tstamp->tx_tstamp >> 32) * NSEC_PER_SEC) + (*tstamp->tx_tstamp & 0xFFFFFFFFUL);
+	ns = rte_timecounter_update(&dev->tx_tstamp_tc, *tstamp->tx_tstamp);
+	*timestamp = rte_ns_to_timespec(ns);
+	*tstamp->tx_tstamp = 0;
+	rte_wmb();
+
+	return 0;
+}
+
+static int
+cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *nix = &dev->nix;
+	int rc;
+
+	/* Common eth dev start */
+	rc = cnxk_nix_dev_start(eth_dev);
+	if (rc)
+		return rc;
+
+	/* Set flags for Rx Inject feature */
+	if (roc_idev_nix_rx_inject_get(nix->port_id))
+		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
+
+	return 0;
+}
+
+static int
+cn20k_nix_reassembly_capability_get(struct rte_eth_dev *eth_dev,
+				    struct rte_eth_ip_reassembly_params *reassembly_capa)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc = -ENOTSUP;
+
+	if (!roc_feature_nix_has_reass())
+		return -ENOTSUP;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+		reassembly_capa->timeout_ms = 60 * 1000;
+		reassembly_capa->max_frags = 4;
+		reassembly_capa->flags =
+			RTE_ETH_DEV_REASSEMBLY_F_IPV4 | RTE_ETH_DEV_REASSEMBLY_F_IPV6;
+		rc = 0;
+	}
+
+	return rc;
+}
+
+static int
+cn20k_nix_reassembly_conf_get(struct rte_eth_dev *eth_dev,
+			      struct rte_eth_ip_reassembly_params *conf)
+{
+	RTE_SET_USED(eth_dev);
+	RTE_SET_USED(conf);
+	return -ENOTSUP;
+}
+
+static int
+cn20k_nix_reassembly_conf_set(struct rte_eth_dev *eth_dev,
+			      const struct rte_eth_ip_reassembly_params *conf)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	int rc = 0;
+
+	if (!roc_feature_nix_has_reass())
+		return -ENOTSUP;
+
+	if (!conf->flags) {
+		/* Clear offload flags on disable */
+		if (!dev->inb.nb_oop)
+			dev->rx_offload_flags &= ~NIX_RX_REAS_F;
+		dev->inb.reass_en = false;
+		return 0;
+	}
+
+	rc = roc_nix_reassembly_configure(conf->timeout_ms, conf->max_frags);
+	if (!rc && dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+		dev->rx_offload_flags |= NIX_RX_REAS_F;
+		dev->inb.reass_en = true;
+	}
+
+	return rc;
+}
+
+static int
+cn20k_nix_rx_avail_get(struct cn20k_eth_rxq *rxq)
+{
+	uint32_t qmask = rxq->qmask;
+	uint64_t reg, head, tail;
+	int available;
+
+	/* Use LDADDA version to avoid reorder */
+	reg = roc_atomic64_add_sync(rxq->wdata, rxq->cq_status);
+	/* CQ_OP_STATUS operation error */
+	if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+		return 0;
+	tail = reg & 0xFFFFF;
+	head = (reg >> 20) & 0xFFFFF;
+	if (tail < head)
+		available = tail - head + qmask + 1;
+	else
+		available = tail - head;
+
+	return available;
+}
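Per the decode above, CQ_OP_STATUS returns the tail pointer in bits [19:0] and the head pointer in bits [39:20]; when the CQ has wrapped, tail is numerically below head and the ring size (qmask + 1) is added back. A worked sketch with hypothetical values for a 1024-entry CQ:

	/* Hypothetical: qmask = 1023, head = 1000, tail = 10 (queue wrapped) */
	uint32_t qmask = 1023;
	uint64_t head = 1000, tail = 10;
	int available;

	if (tail < head)
		available = tail - head + qmask + 1; /* 10 - 1000 + 1024 = 34 */
	else
		available = tail - head;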
+
+static int
+cn20k_rx_descriptor_dump(const struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t offset,
+			 uint16_t num, FILE *file)
+{
+	struct cn20k_eth_rxq *rxq = eth_dev->data->rx_queues[qid];
+	const uint64_t data_off = rxq->data_off;
+	const uint32_t qmask = rxq->qmask;
+	const uintptr_t desc = rxq->desc;
+	struct cpt_parse_hdr_s *cpth;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	uint16_t count = 0;
+	int available_pkts;
+	uint64_t cq_w1;
+
+	available_pkts = cn20k_nix_rx_avail_get(rxq);
+
+	if ((offset + num - 1) >= available_pkts) {
+		plt_err("Invalid BD num=%u\n", num);
+		return -EINVAL;
+	}
+
+	while (count < num) {
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ((head + offset + count) & qmask));
+		cq_w1 = *((const uint64_t *)cq + 1);
+		if (cq_w1 & BIT(11)) {
+			rte_iova_t buff = *((rte_iova_t *)((uint64_t *)cq + 9));
+			struct rte_mbuf *mbuf = (struct rte_mbuf *)(buff - data_off);
+			cpth = (struct cpt_parse_hdr_s *)((uintptr_t)mbuf + (uint16_t)data_off);
+			roc_cpt_parse_hdr_dump(file, cpth);
+		} else {
+			roc_nix_cqe_dump(file, cq);
+		}
+
+		count++;
+	}
+	return 0;
+}
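Once the op is installed via nix_eth_dev_ops_override() below, applications reach this routine through the generic ethdev debug API. A usage sketch, assuming rte_eth_rx_descriptor_dump() as the entry point and port 0, Rx queue 0:

	#include <stdio.h>
	#include <rte_ethdev.h>

	static void
	dump_rxq_head(uint16_t port_id)
	{
		/* Dump 4 descriptors starting at offset 0 of Rx queue 0 */
		int rc = rte_eth_rx_descriptor_dump(port_id, 0, 0, 4, stdout);

		if (rc != 0)
			fprintf(stderr, "rx descriptor dump failed: %d\n", rc);
	}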
+
+/* Update platform specific eth dev ops */
+static void
+nix_eth_dev_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+
+	/* Update platform specific ops */
+	cnxk_eth_dev_ops.dev_configure = cn20k_nix_configure;
+	cnxk_eth_dev_ops.tx_queue_setup = cn20k_nix_tx_queue_setup;
+	cnxk_eth_dev_ops.rx_queue_setup = cn20k_nix_rx_queue_setup;
+	cnxk_eth_dev_ops.tx_queue_release = cn20k_nix_tx_queue_release;
+	cnxk_eth_dev_ops.tx_queue_stop = cn20k_nix_tx_queue_stop;
+	cnxk_eth_dev_ops.dev_start = cn20k_nix_dev_start;
+	cnxk_eth_dev_ops.dev_ptypes_set = cn20k_nix_ptypes_set;
+	cnxk_eth_dev_ops.timesync_enable = cn20k_nix_timesync_enable;
+	cnxk_eth_dev_ops.timesync_disable = cn20k_nix_timesync_disable;
+	cnxk_eth_dev_ops.timesync_read_tx_timestamp = cn20k_nix_timesync_read_tx_timestamp;
+	cnxk_eth_dev_ops.ip_reassembly_capability_get = cn20k_nix_reassembly_capability_get;
+	cnxk_eth_dev_ops.ip_reassembly_conf_get = cn20k_nix_reassembly_conf_get;
+	cnxk_eth_dev_ops.ip_reassembly_conf_set = cn20k_nix_reassembly_conf_set;
+	cnxk_eth_dev_ops.eth_rx_descriptor_dump = cn20k_rx_descriptor_dump;
+}
+
+/* Update platform specific tm ops */
+static void
+nix_tm_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+}
+
+static int
+cn20k_nix_remove(struct rte_pci_device *pci_dev)
+{
+	return cnxk_nix_remove(pci_dev);
+}
+
+static int
+cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_eth_dev *eth_dev;
+	int rc;
+
+	rc = roc_plt_init();
+	if (rc) {
+		plt_err("Failed to initialize platform model, rc=%d", rc);
+		return rc;
+	}
+
+	nix_eth_dev_ops_override();
+	nix_tm_ops_override();
+
+	/* Common probe */
+	rc = cnxk_nix_probe(pci_drv, pci_dev);
+	if (rc)
+		return rc;
+
+	/* Find eth dev allocated */
+	eth_dev = rte_eth_dev_allocated(pci_dev->device.name);
+	if (!eth_dev) {
+		/* Ignore if ethdev is in the middle of detach in secondary process */
+		if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+			return 0;
+		return -ENOENT;
+	}
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	return 0;
+}
+
+static const struct rte_pci_id cn20k_pci_nix_map[] = {
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_PF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_VF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_AF_VF),
+	CNXK_PCI_ID(PCI_SUBSYSTEM_DEVID_CN20KA, PCI_DEVID_CNXK_RVU_SDP_VF),
+	{
+		.vendor_id = 0,
+	},
+};
+
+static struct rte_pci_driver cn20k_pci_nix = {
+	.id_table = cn20k_pci_nix_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA | RTE_PCI_DRV_INTR_LSC,
+	.probe = cn20k_nix_probe,
+	.remove = cn20k_nix_remove,
+};
+
+RTE_PMD_REGISTER_PCI(net_cn20k, cn20k_pci_nix);
+RTE_PMD_REGISTER_PCI_TABLE(net_cn20k, cn20k_pci_nix_map);
+RTE_PMD_REGISTER_KMOD_DEP(net_cn20k, "vfio-pci");
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
new file mode 100644
index 0000000000..1af490befc
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_ETHDEV_H__
+#define __CN20K_ETHDEV_H__
+
+#include <cn20k_rxtx.h>
+#include <cnxk_ethdev.h>
+#include <cnxk_security.h>
+
+#endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
new file mode 100644
index 0000000000..58a2920a54
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_RX_H__
+#define __CN20K_RX_H__
+
+#include "cn20k_rxtx.h"
+#include <rte_ethdev.h>
+#include <rte_security_driver.h>
+#include <rte_vect.h>
+
+#define NSEC_PER_SEC 1000000000L
+
+#define NIX_RX_OFFLOAD_NONE	     (0)
+#define NIX_RX_OFFLOAD_RSS_F	     BIT(0)
+#define NIX_RX_OFFLOAD_PTYPE_F	     BIT(1)
+#define NIX_RX_OFFLOAD_CHECKSUM_F    BIT(2)
+#define NIX_RX_OFFLOAD_MARK_UPDATE_F BIT(3)
+#define NIX_RX_OFFLOAD_TSTAMP_F	     BIT(4)
+#define NIX_RX_OFFLOAD_VLAN_STRIP_F  BIT(5)
+#define NIX_RX_OFFLOAD_SECURITY_F    BIT(6)
+#define NIX_RX_OFFLOAD_MAX	     (NIX_RX_OFFLOAD_SECURITY_F << 1)
+
+/* Flags to control the cqe_to_mbuf conversion function.
+ * Defined from the top bits downwards to denote that they are
+ * not used as offload flags to pick the Rx function.
+ */
+#define NIX_RX_REAS_F	   BIT(12)
+#define NIX_RX_VWQE_F	   BIT(13)
+#define NIX_RX_MULTI_SEG_F BIT(14)
+
+#define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
+#endif /* __CN20K_RX_H__ */
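A short sketch of how these bits compose (values as defined above; the table-variant remark anticipates the Rx select patch later in this series): only bits below NIX_RX_OFFLOAD_MAX index a fast-path function, while NIX_RX_REAS_F and friends select which function table is consulted:

	uint16_t flags = NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_CHECKSUM_F; /* 0x5 */

	/* NIX_RX_OFFLOAD_MAX is BIT(7), so the index mask is 0x7F */
	uint16_t idx = flags & (NIX_RX_OFFLOAD_MAX - 1); /* table slot 5 */

	/* NIX_RX_SEC_REASSEMBLY_F == BIT(12) | BIT(6): the BIT(12) half never
	 * reaches the index; it only steers the choice of table.
	 */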
diff --git a/drivers/net/cnxk/cn20k_rxtx.h b/drivers/net/cnxk/cn20k_rxtx.h
new file mode 100644
index 0000000000..5cc445d4b1
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rxtx.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#ifndef __CN20K_RXTX_H__
+#define __CN20K_RXTX_H__
+
+#include <rte_security.h>
+
+/* ROC Constants */
+#include "roc_constants.h"
+
+/* Platform definition */
+#include "roc_platform.h"
+
+/* IO */
+#if defined(__aarch64__)
+#include "roc_io.h"
+#else
+#include "roc_io_generic.h"
+#endif
+
+/* HW structure definition */
+#include "hw/cpt.h"
+#include "hw/nix.h"
+#include "hw/npa.h"
+#include "hw/npc.h"
+#include "hw/ssow.h"
+
+#include "roc_ie_ot.h"
+
+/* NPA */
+#include "roc_npa_dp.h"
+
+/* SSO */
+#include "roc_sso_dp.h"
+
+/* CPT */
+#include "roc_cpt.h"
+
+/* NIX Inline dev */
+#include "roc_nix_inl_dp.h"
+
+#include "cnxk_ethdev_dp.h"
+
+struct cn20k_eth_txq {
+	uint64_t send_hdr_w0;
+	int64_t fc_cache_pkts;
+	uint64_t *fc_mem;
+	uintptr_t lmt_base;
+	rte_iova_t io_addr;
+	uint16_t sqes_per_sqb_log2;
+	int16_t nb_sqb_bufs_adj;
+	uint8_t flag;
+	rte_iova_t cpt_io_addr;
+	uint64_t sa_base;
+	uint64_t *cpt_fc;
+	uint16_t cpt_desc;
+	int32_t *cpt_fc_sw;
+	uint64_t lso_tun_fmt;
+	uint64_t ts_mem;
+	uint64_t mark_flag : 8;
+	uint64_t mark_fmt : 48;
+	struct cnxk_eth_txq_comp tx_compl;
+} __plt_cache_aligned;
+
+struct cn20k_eth_rxq {
+	uint64_t mbuf_initializer;
+	uintptr_t desc;
+	void *lookup_mem;
+	uintptr_t cq_door;
+	uint64_t wdata;
+	int64_t *cq_status;
+	uint32_t head;
+	uint32_t qmask;
+	uint32_t available;
+	uint16_t data_off;
+	uint64_t sa_base;
+	uint64_t lmt_base;
+	uint64_t meta_aura;
+	uintptr_t meta_pool;
+	uint16_t rq;
+	struct cnxk_timesync_info *tstamp;
+} __plt_cache_aligned;
+
+#define LMT_OFF(lmt_addr, lmt_num, offset)                                                         \
+	(void *)((uintptr_t)(lmt_addr) + ((uint64_t)(lmt_num) << ROC_LMT_LINE_SIZE_LOG2) + (offset))
+
+#endif /* __CN20K_RXTX_H__ */
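A minimal sketch of the LMT_OFF() arithmetic, assuming 128-byte LMT lines (ROC_LMT_LINE_SIZE_LOG2 == 7) and a txq pointer of the struct type above:

	/* Byte 16 within LMT line 2: base + (2 << 7) + 16 = base + 272 */
	uintptr_t base = txq->lmt_base;
	uint64_t *slot = (uint64_t *)LMT_OFF(base, 2, 16);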
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
new file mode 100644
index 0000000000..a00c9d5776
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+#ifndef __CN20K_TX_H__
+#define __CN20K_TX_H__
+
+#include "cn20k_rxtx.h"
+#include <rte_eventdev.h>
+#include <rte_vect.h>
+
+#define NIX_TX_OFFLOAD_NONE	      (0)
+#define NIX_TX_OFFLOAD_L3_L4_CSUM_F   BIT(0)
+#define NIX_TX_OFFLOAD_OL3_OL4_CSUM_F BIT(1)
+#define NIX_TX_OFFLOAD_VLAN_QINQ_F    BIT(2)
+#define NIX_TX_OFFLOAD_MBUF_NOFF_F    BIT(3)
+#define NIX_TX_OFFLOAD_TSO_F	      BIT(4)
+#define NIX_TX_OFFLOAD_TSTAMP_F	      BIT(5)
+#define NIX_TX_OFFLOAD_SECURITY_F     BIT(6)
+#define NIX_TX_OFFLOAD_MAX	      (NIX_TX_OFFLOAD_SECURITY_F << 1)
+
+/* Flags to control the xmit_prepare function.
+ * Defined from the top bits downwards to denote that they are
+ * not used as offload flags to pick the Tx function.
+ */
+#define NIX_TX_VWQE_F	   BIT(14)
+#define NIX_TX_MULTI_SEG_F BIT(15)
+
+#define NIX_TX_NEED_SEND_HDR_W1                                                                    \
+	(NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |                             \
+	 NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)
+
+#define NIX_TX_NEED_EXT_HDR                                                                        \
+	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
+
+#endif /* __CN20K_TX_H__ */
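A snippet-level sketch of how a Tx path can consult the two masks above (the flags value is hypothetical):

	uint16_t flags = NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_VLAN_QINQ_F;

	/* VLAN/TSTAMP/TSO require the extended send header sub-descriptor */
	bool need_ext_hdr = !!(flags & NIX_TX_NEED_EXT_HDR);    /* true here */
	/* checksum/VLAN/TSO state is encoded into send header word 1 */
	bool need_hdr_w1 = !!(flags & NIX_TX_NEED_SEND_HDR_W1); /* true here */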
diff --git a/drivers/net/cnxk/cnxk_ethdev_dp.h b/drivers/net/cnxk/cnxk_ethdev_dp.h
index 119bb1836a..100d22e759 100644
--- a/drivers/net/cnxk/cnxk_ethdev_dp.h
+++ b/drivers/net/cnxk/cnxk_ethdev_dp.h
@@ -59,6 +59,9 @@
 
 #define CNXK_TX_MARK_FMT_MASK (0xFFFFFFFFFFFFull)
 
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define CQE_SZ(x)            ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
 struct cnxk_eth_txq_comp {
 	uintptr_t desc_base;
 	uintptr_t cq_door;
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index 7bce80098a..cf2ce09f77 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -14,7 +14,7 @@ else
         soc_type = platform
 endif
 
-if soc_type != 'cn9k' and soc_type != 'cn10k'
+if soc_type != 'cn9k' and soc_type != 'cn10k' and soc_type != 'cn20k'
         soc_type = 'all'
 endif
 
@@ -231,6 +231,15 @@ sources += files(
 endif
 endif
 
+
+if soc_type == 'cn20k' or soc_type == 'all'
+# CN20K
+sources += files(
+        'cn20k_ethdev.c',
+)
+endif
+
+
 deps += ['bus_pci', 'cryptodev', 'eventdev', 'security']
 deps += ['common_cnxk', 'mempool_cnxk']
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 11/18] net/cnxk: support Rx function select for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (9 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 10/18] net/cnxk: add cn20k base control path support Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 12/18] net/cnxk: support Tx " Nithin Dabilpuram
                     ` (7 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, Anatoly Burakov
  Cc: dev

Add support to select Rx function based on offload flags
for cn20k.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c               |  59 ++++-
 drivers/net/cnxk/cn20k_ethdev.h               |   3 +
 drivers/net/cnxk/cn20k_rx.h                   | 226 ++++++++++++++++++
 drivers/net/cnxk/cn20k_rx_select.c            | 162 +++++++++++++
 drivers/net/cnxk/meson.build                  |  44 ++++
 drivers/net/cnxk/rx/cn20k/rx_0_15.c           |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c       |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c  |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127.c        |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c   |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c    |  20 ++
 .../net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c   |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95.c          |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c     |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c      |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111.c         |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c    |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c     |  20 ++
 .../net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c    |  20 ++
 drivers/net/cnxk/rx/cn20k/rx_all_offload.c    |  55 +++++
 38 files changed, 1188 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/cnxk/cn20k_rx_select.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/rx/cn20k/rx_all_offload.c

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index 5bd7a43353..545634a70e 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -5,6 +5,41 @@
 #include "cn20k_rx.h"
 #include "cn20k_tx.h"
 
+static uint16_t
+nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct rte_eth_dev_data *data = eth_dev->data;
+	struct rte_eth_conf *conf = &data->dev_conf;
+	struct rte_eth_rxmode *rxmode = &conf->rxmode;
+	uint16_t flags = 0;
+
+	if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS &&
+	    (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH))
+		flags |= NIX_RX_OFFLOAD_RSS_F;
+
+	if (dev->rx_offloads & (RTE_ETH_RX_OFFLOAD_TCP_CKSUM | RTE_ETH_RX_OFFLOAD_UDP_CKSUM))
+		flags |= NIX_RX_OFFLOAD_CHECKSUM_F;
+
+	if (dev->rx_offloads &
+	    (RTE_ETH_RX_OFFLOAD_IPV4_CKSUM | RTE_ETH_RX_OFFLOAD_OUTER_IPV4_CKSUM))
+		flags |= NIX_RX_OFFLOAD_CHECKSUM_F;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
+		flags |= NIX_RX_MULTI_SEG_F;
+
+	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP))
+		flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+
+	if (!dev->ptype_disable)
+		flags |= NIX_RX_OFFLOAD_PTYPE_F;
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY)
+		flags |= NIX_RX_OFFLOAD_SECURITY_F;
+
+	return flags;
+}
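For example, a hypothetical configuration with mq_mode == RTE_ETH_MQ_RX_RSS, ptypes enabled, and rx_offloads = RTE_ETH_RX_OFFLOAD_RSS_HASH | RTE_ETH_RX_OFFLOAD_IPV4_CKSUM | RTE_ETH_RX_OFFLOAD_SCATTER would make the helper return:

	NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_CHECKSUM_F |
	NIX_RX_MULTI_SEG_F | NIX_RX_OFFLOAD_PTYPE_F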
+
 static int
 cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 {
@@ -18,6 +53,7 @@ cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 		dev->ptype_disable = 1;
 	}
 
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -187,6 +223,9 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 		rc = nix_recalc_mtu(eth_dev);
 		if (rc)
 			return rc;
+
+		/* Update offload flags */
+		dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
 	}
 
 	rq = &dev->rqs[qid];
@@ -245,6 +284,8 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Update offload flags */
+	dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
 	/* reset reassembly dynfield/flag offset */
 	dev->reass_dynfield_off = -1;
 	dev->reass_dynflag_bit = -1;
@@ -271,6 +312,10 @@ cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
 		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
 
+	/* Reselect the Rx function since rx_offloads changed
+	 * with timesync enable.
+	 */
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -290,6 +335,10 @@ cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
 	for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
 		nix_form_default_desc(dev, eth_dev->data->tx_queues[i], i);
 
+	/* Reselect the Rx function since rx_offloads changed
+	 * with timesync disable.
+	 */
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -325,10 +374,15 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Recompute rx_offload_flags since rx_offloads may have
+	 * changed since configure.
+	 */
+	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
 
+	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
 
@@ -525,8 +579,11 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		return -ENOENT;
 	}
 
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		/* Setup callbacks for secondary process */
+		cn20k_eth_set_rx_function(eth_dev);
 		return 0;
+	}
 
 	return 0;
 }
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
index 1af490befc..2049ee7fa4 100644
--- a/drivers/net/cnxk/cn20k_ethdev.h
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -8,4 +8,7 @@
 #include <cnxk_ethdev.h>
 #include <cnxk_security.h>
 
+/* Rx and Tx routines */
+void cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+
 #endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 58a2920a54..2cb77c0b46 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -30,4 +30,230 @@
 #define NIX_RX_MULTI_SEG_F BIT(14)
 
 #define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
+
+#define RSS_F	  NIX_RX_OFFLOAD_RSS_F
+#define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
+#define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
+#define MARK_F	  NIX_RX_OFFLOAD_MARK_UPDATE_F
+#define TS_F	  NIX_RX_OFFLOAD_TSTAMP_F
+#define RX_VLAN_F NIX_RX_OFFLOAD_VLAN_STRIP_F
+#define R_SEC_F	  NIX_RX_OFFLOAD_SECURITY_F
+
+/* [R_SEC_F] [RX_VLAN_F] [TS] [MARK] [CKSUM] [PTYPE] [RSS] */
+#define NIX_RX_FASTPATH_MODES_0_15                                                                 \
+	R(no_offload, NIX_RX_OFFLOAD_NONE)                                                         \
+	R(rss, RSS_F)                                                                              \
+	R(ptype, PTYPE_F)                                                                          \
+	R(ptype_rss, PTYPE_F | RSS_F)                                                              \
+	R(cksum, CKSUM_F)                                                                          \
+	R(cksum_rss, CKSUM_F | RSS_F)                                                              \
+	R(cksum_ptype, CKSUM_F | PTYPE_F)                                                          \
+	R(cksum_ptype_rss, CKSUM_F | PTYPE_F | RSS_F)                                              \
+	R(mark, MARK_F)                                                                            \
+	R(mark_rss, MARK_F | RSS_F)                                                                \
+	R(mark_ptype, MARK_F | PTYPE_F)                                                            \
+	R(mark_ptype_rss, MARK_F | PTYPE_F | RSS_F)                                                \
+	R(mark_cksum, MARK_F | CKSUM_F)                                                            \
+	R(mark_cksum_rss, MARK_F | CKSUM_F | RSS_F)                                                \
+	R(mark_cksum_ptype, MARK_F | CKSUM_F | PTYPE_F)                                            \
+	R(mark_cksum_ptype_rss, MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_16_31                                                                \
+	R(ts, TS_F)                                                                                \
+	R(ts_rss, TS_F | RSS_F)                                                                    \
+	R(ts_ptype, TS_F | PTYPE_F)                                                                \
+	R(ts_ptype_rss, TS_F | PTYPE_F | RSS_F)                                                    \
+	R(ts_cksum, TS_F | CKSUM_F)                                                                \
+	R(ts_cksum_rss, TS_F | CKSUM_F | RSS_F)                                                    \
+	R(ts_cksum_ptype, TS_F | CKSUM_F | PTYPE_F)                                                \
+	R(ts_cksum_ptype_rss, TS_F | CKSUM_F | PTYPE_F | RSS_F)                                    \
+	R(ts_mark, TS_F | MARK_F)                                                                  \
+	R(ts_mark_rss, TS_F | MARK_F | RSS_F)                                                      \
+	R(ts_mark_ptype, TS_F | MARK_F | PTYPE_F)                                                  \
+	R(ts_mark_ptype_rss, TS_F | MARK_F | PTYPE_F | RSS_F)                                      \
+	R(ts_mark_cksum, TS_F | MARK_F | CKSUM_F)                                                  \
+	R(ts_mark_cksum_rss, TS_F | MARK_F | CKSUM_F | RSS_F)                                      \
+	R(ts_mark_cksum_ptype, TS_F | MARK_F | CKSUM_F | PTYPE_F)                                  \
+	R(ts_mark_cksum_ptype_rss, TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_32_47                                                                \
+	R(vlan, RX_VLAN_F)                                                                         \
+	R(vlan_rss, RX_VLAN_F | RSS_F)                                                             \
+	R(vlan_ptype, RX_VLAN_F | PTYPE_F)                                                         \
+	R(vlan_ptype_rss, RX_VLAN_F | PTYPE_F | RSS_F)                                             \
+	R(vlan_cksum, RX_VLAN_F | CKSUM_F)                                                         \
+	R(vlan_cksum_rss, RX_VLAN_F | CKSUM_F | RSS_F)                                             \
+	R(vlan_cksum_ptype, RX_VLAN_F | CKSUM_F | PTYPE_F)                                         \
+	R(vlan_cksum_ptype_rss, RX_VLAN_F | CKSUM_F | PTYPE_F | RSS_F)                             \
+	R(vlan_mark, RX_VLAN_F | MARK_F)                                                           \
+	R(vlan_mark_rss, RX_VLAN_F | MARK_F | RSS_F)                                               \
+	R(vlan_mark_ptype, RX_VLAN_F | MARK_F | PTYPE_F)                                           \
+	R(vlan_mark_ptype_rss, RX_VLAN_F | MARK_F | PTYPE_F | RSS_F)                               \
+	R(vlan_mark_cksum, RX_VLAN_F | MARK_F | CKSUM_F)                                           \
+	R(vlan_mark_cksum_rss, RX_VLAN_F | MARK_F | CKSUM_F | RSS_F)                               \
+	R(vlan_mark_cksum_ptype, RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F)                           \
+	R(vlan_mark_cksum_ptype_rss, RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_48_63                                                                \
+	R(vlan_ts, RX_VLAN_F | TS_F)                                                               \
+	R(vlan_ts_rss, RX_VLAN_F | TS_F | RSS_F)                                                   \
+	R(vlan_ts_ptype, RX_VLAN_F | TS_F | PTYPE_F)                                               \
+	R(vlan_ts_ptype_rss, RX_VLAN_F | TS_F | PTYPE_F | RSS_F)                                   \
+	R(vlan_ts_cksum, RX_VLAN_F | TS_F | CKSUM_F)                                               \
+	R(vlan_ts_cksum_rss, RX_VLAN_F | TS_F | CKSUM_F | RSS_F)                                   \
+	R(vlan_ts_cksum_ptype, RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F)                               \
+	R(vlan_ts_cksum_ptype_rss, RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)                   \
+	R(vlan_ts_mark, RX_VLAN_F | TS_F | MARK_F)                                                 \
+	R(vlan_ts_mark_rss, RX_VLAN_F | TS_F | MARK_F | RSS_F)                                     \
+	R(vlan_ts_mark_ptype, RX_VLAN_F | TS_F | MARK_F | PTYPE_F)                                 \
+	R(vlan_ts_mark_ptype_rss, RX_VLAN_F | TS_F | MARK_F | PTYPE_F | RSS_F)                     \
+	R(vlan_ts_mark_cksum, RX_VLAN_F | TS_F | MARK_F | CKSUM_F)                                 \
+	R(vlan_ts_mark_cksum_rss, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | RSS_F)                     \
+	R(vlan_ts_mark_cksum_ptype, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)                 \
+	R(vlan_ts_mark_cksum_ptype_rss, RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_64_79                                                                \
+	R(sec, R_SEC_F)                                                                            \
+	R(sec_rss, R_SEC_F | RSS_F)                                                                \
+	R(sec_ptype, R_SEC_F | PTYPE_F)                                                            \
+	R(sec_ptype_rss, R_SEC_F | PTYPE_F | RSS_F)                                                \
+	R(sec_cksum, R_SEC_F | CKSUM_F)                                                            \
+	R(sec_cksum_rss, R_SEC_F | CKSUM_F | RSS_F)                                                \
+	R(sec_cksum_ptype, R_SEC_F | CKSUM_F | PTYPE_F)                                            \
+	R(sec_cksum_ptype_rss, R_SEC_F | CKSUM_F | PTYPE_F | RSS_F)                                \
+	R(sec_mark, R_SEC_F | MARK_F)                                                              \
+	R(sec_mark_rss, R_SEC_F | MARK_F | RSS_F)                                                  \
+	R(sec_mark_ptype, R_SEC_F | MARK_F | PTYPE_F)                                              \
+	R(sec_mark_ptype_rss, R_SEC_F | MARK_F | PTYPE_F | RSS_F)                                  \
+	R(sec_mark_cksum, R_SEC_F | MARK_F | CKSUM_F)                                              \
+	R(sec_mark_cksum_rss, R_SEC_F | MARK_F | CKSUM_F | RSS_F)                                  \
+	R(sec_mark_cksum_ptype, R_SEC_F | MARK_F | CKSUM_F | PTYPE_F)                              \
+	R(sec_mark_cksum_ptype_rss, R_SEC_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_80_95                                                                \
+	R(sec_ts, R_SEC_F | TS_F)                                                                  \
+	R(sec_ts_rss, R_SEC_F | TS_F | RSS_F)                                                      \
+	R(sec_ts_ptype, R_SEC_F | TS_F | PTYPE_F)                                                  \
+	R(sec_ts_ptype_rss, R_SEC_F | TS_F | PTYPE_F | RSS_F)                                      \
+	R(sec_ts_cksum, R_SEC_F | TS_F | CKSUM_F)                                                  \
+	R(sec_ts_cksum_rss, R_SEC_F | TS_F | CKSUM_F | RSS_F)                                      \
+	R(sec_ts_cksum_ptype, R_SEC_F | TS_F | CKSUM_F | PTYPE_F)                                  \
+	R(sec_ts_cksum_ptype_rss, R_SEC_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)                      \
+	R(sec_ts_mark, R_SEC_F | TS_F | MARK_F)                                                    \
+	R(sec_ts_mark_rss, R_SEC_F | TS_F | MARK_F | RSS_F)                                        \
+	R(sec_ts_mark_ptype, R_SEC_F | TS_F | MARK_F | PTYPE_F)                                    \
+	R(sec_ts_mark_ptype_rss, R_SEC_F | TS_F | MARK_F | PTYPE_F | RSS_F)                        \
+	R(sec_ts_mark_cksum, R_SEC_F | TS_F | MARK_F | CKSUM_F)                                    \
+	R(sec_ts_mark_cksum_rss, R_SEC_F | TS_F | MARK_F | CKSUM_F | RSS_F)                        \
+	R(sec_ts_mark_cksum_ptype, R_SEC_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)                    \
+	R(sec_ts_mark_cksum_ptype_rss, R_SEC_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_96_111                                                               \
+	R(sec_vlan, R_SEC_F | RX_VLAN_F)                                                           \
+	R(sec_vlan_rss, R_SEC_F | RX_VLAN_F | RSS_F)                                               \
+	R(sec_vlan_ptype, R_SEC_F | RX_VLAN_F | PTYPE_F)                                           \
+	R(sec_vlan_ptype_rss, R_SEC_F | RX_VLAN_F | PTYPE_F | RSS_F)                               \
+	R(sec_vlan_cksum, R_SEC_F | RX_VLAN_F | CKSUM_F)                                           \
+	R(sec_vlan_cksum_rss, R_SEC_F | RX_VLAN_F | CKSUM_F | RSS_F)                               \
+	R(sec_vlan_cksum_ptype, R_SEC_F | RX_VLAN_F | CKSUM_F | PTYPE_F)                           \
+	R(sec_vlan_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | CKSUM_F | PTYPE_F | RSS_F)               \
+	R(sec_vlan_mark, R_SEC_F | RX_VLAN_F | MARK_F)                                             \
+	R(sec_vlan_mark_rss, R_SEC_F | RX_VLAN_F | MARK_F | RSS_F)                                 \
+	R(sec_vlan_mark_ptype, R_SEC_F | RX_VLAN_F | MARK_F | PTYPE_F)                             \
+	R(sec_vlan_mark_ptype_rss, R_SEC_F | RX_VLAN_F | MARK_F | PTYPE_F | RSS_F)                 \
+	R(sec_vlan_mark_cksum, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F)                             \
+	R(sec_vlan_mark_cksum_rss, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | RSS_F)                 \
+	R(sec_vlan_mark_cksum_ptype, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F)             \
+	R(sec_vlan_mark_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES_112_127                                                              \
+	R(sec_vlan_ts, R_SEC_F | RX_VLAN_F | TS_F)                                                 \
+	R(sec_vlan_ts_rss, R_SEC_F | RX_VLAN_F | TS_F | RSS_F)                                     \
+	R(sec_vlan_ts_ptype, R_SEC_F | RX_VLAN_F | TS_F | PTYPE_F)                                 \
+	R(sec_vlan_ts_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | PTYPE_F | RSS_F)                     \
+	R(sec_vlan_ts_cksum, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F)                                 \
+	R(sec_vlan_ts_cksum_rss, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | RSS_F)                     \
+	R(sec_vlan_ts_cksum_ptype, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F)                 \
+	R(sec_vlan_ts_cksum_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | CKSUM_F | PTYPE_F | RSS_F)     \
+	R(sec_vlan_ts_mark, R_SEC_F | RX_VLAN_F | TS_F | MARK_F)                                   \
+	R(sec_vlan_ts_mark_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | RSS_F)                       \
+	R(sec_vlan_ts_mark_ptype, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | PTYPE_F)                   \
+	R(sec_vlan_ts_mark_ptype_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | PTYPE_F | RSS_F)       \
+	R(sec_vlan_ts_mark_cksum, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F)                   \
+	R(sec_vlan_ts_mark_cksum_rss, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | RSS_F)       \
+	R(sec_vlan_ts_mark_cksum_ptype, R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F)   \
+	R(sec_vlan_ts_mark_cksum_ptype_rss,                                                        \
+	  R_SEC_F | RX_VLAN_F | TS_F | MARK_F | CKSUM_F | PTYPE_F | RSS_F)
+
+#define NIX_RX_FASTPATH_MODES                                                                      \
+	NIX_RX_FASTPATH_MODES_0_15                                                                 \
+	NIX_RX_FASTPATH_MODES_16_31                                                                \
+	NIX_RX_FASTPATH_MODES_32_47                                                                \
+	NIX_RX_FASTPATH_MODES_48_63                                                                \
+	NIX_RX_FASTPATH_MODES_64_79                                                                \
+	NIX_RX_FASTPATH_MODES_80_95                                                                \
+	NIX_RX_FASTPATH_MODES_96_111                                                               \
+	NIX_RX_FASTPATH_MODES_112_127
+
+#define R(name, flags)                                                                             \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_##name(                              \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_mseg_##name(                         \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_##name(                          \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_mseg_##name(                     \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_##name(                         \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_mseg_##name(                    \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_vec_##name(                     \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_reas_vec_mseg_##name(                \
+		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);
+
+NIX_RX_FASTPATH_MODES
+#undef R
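As a concrete illustration, each R() entry above expands into eight prototypes (scalar, mseg, vector, vector-mseg, and their reassembly variants); for R(cksum_ptype_rss, CKSUM_F | PTYPE_F | RSS_F) the first one reads:

	uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_cksum_ptype_rss(
		void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts);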
+
+#define NIX_RX_RECV(fn, flags)                                                                     \
+	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(rx_queue);                                                            \
+		RTE_SET_USED(rx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_RX_RECV_MSEG(fn, flags) NIX_RX_RECV(fn, flags | NIX_RX_MULTI_SEG_F)
+
+#define NIX_RX_RECV_VEC(fn, flags)                                                                 \
+	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(rx_queue);                                                            \
+		RTE_SET_USED(rx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_RX_RECV_VEC_MSEG(fn, flags) NIX_RX_RECV_VEC(fn, flags | NIX_RX_MULTI_SEG_F)
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_all_offload(void *rx_queue,
+								  struct rte_mbuf **rx_pkts,
+								  uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_all_offload(void *rx_queue,
+								      struct rte_mbuf **rx_pkts,
+								      uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_all_offload_tst(void *rx_queue,
+								      struct rte_mbuf **rx_pkts,
+								      uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_recv_pkts_vec_all_offload_tst(void *rx_queue,
+									  struct rte_mbuf **rx_pkts,
+									  uint16_t pkts);
+
 #endif /* __CN20K_RX_H__ */
diff --git a/drivers/net/cnxk/cn20k_rx_select.c b/drivers/net/cnxk/cn20k_rx_select.c
new file mode 100644
index 0000000000..82e06a62ef
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_rx_select.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_ethdev.h"
+#include "cn20k_rx.h"
+
+static __rte_used void
+pick_rx_func(struct rte_eth_dev *eth_dev, const eth_rx_burst_t rx_burst[NIX_RX_OFFLOAD_MAX])
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* [R_SEC_F] [RX_VLAN_F] [TS] [MARK] [CKSUM] [PTYPE] [RSS] */
+	eth_dev->rx_pkt_burst = rx_burst[dev->rx_offload_flags & (NIX_RX_OFFLOAD_MAX - 1)];
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+}
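A worked example of the lookup above: flags R_SEC_F | RSS_F is BIT(6) | BIT(0) = 65, so slot 65 of the chosen table is installed; the template machinery fills that slot with cn20k_nix_recv_pkts_sec_rss, generated from NIX_RX_FASTPATH_MODES_64_79 and compiled in rx/cn20k/rx_64_79.c:

	uint16_t idx = (R_SEC_F | RSS_F) & (NIX_RX_OFFLOAD_MAX - 1); /* == 65 */
	eth_dev->rx_pkt_burst = rx_burst[idx]; /* cn20k_nix_recv_pkts_sec_rss */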
+
+static uint16_t __rte_noinline __rte_hot __rte_unused
+cn20k_nix_flush_rx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	RTE_SET_USED(rx_queue);
+	RTE_SET_USED(rx_pkts);
+	RTE_SET_USED(pkts);
+	return 0;
+}
+
+#if defined(RTE_ARCH_ARM64)
+static void
+cn20k_eth_set_rx_tmplt_func(struct rte_eth_dev *eth_dev)
+{
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	const eth_rx_burst_t nix_eth_rx_burst[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_mseg[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_mseg_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_burst_mseg_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_mseg_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_vec_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_mseg[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_vec_mseg_##name,
+
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_vec_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	const eth_rx_burst_t nix_eth_rx_vec_burst_mseg_reas[NIX_RX_OFFLOAD_MAX] = {
+#define R(name, flags) [flags] = cn20k_nix_recv_pkts_reas_vec_mseg_##name,
+		NIX_RX_FASTPATH_MODES
+#undef R
+	};
+
+	/* Install the no-op flush handler used in the teardown sequence */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		dev->rx_pkt_burst_no_offload = cn20k_nix_flush_rx;
+
+	if (dev->scalar_ena) {
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) {
+			if (dev->rx_offload_flags & NIX_RX_REAS_F)
+				return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg_reas);
+			else
+				return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg);
+		}
+		if (dev->rx_offload_flags & NIX_RX_REAS_F)
+			return pick_rx_func(eth_dev, nix_eth_rx_burst_reas);
+		else
+			return pick_rx_func(eth_dev, nix_eth_rx_burst);
+	}
+
+	if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) {
+		if (dev->rx_offload_flags & NIX_RX_REAS_F)
+			return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_mseg_reas);
+		else
+			return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_mseg);
+	}
+
+	if (dev->rx_offload_flags & NIX_RX_REAS_F)
+		return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_reas);
+	else
+		return pick_rx_func(eth_dev, nix_eth_rx_vec_burst);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+
+static void
+cn20k_eth_set_rx_blk_func(struct rte_eth_dev *eth_dev)
+{
+#if defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* Install the no-op flush handler used in the teardown sequence */
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		dev->rx_pkt_burst_no_offload = cn20k_nix_flush_rx;
+
+	if (dev->scalar_ena) {
+		eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_all_offload;
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+			eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_all_offload_tst;
+	} else {
+		eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_vec_all_offload;
+		if (dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP)
+			eth_dev->rx_pkt_burst = cn20k_nix_recv_pkts_vec_all_offload_tst;
+	}
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].rx_pkt_burst = eth_dev->rx_pkt_burst;
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+#endif
+
+void
+cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev)
+{
+#if defined(RTE_ARCH_ARM64)
+	cn20k_eth_set_rx_blk_func(eth_dev);
+	cn20k_eth_set_rx_tmplt_func(eth_dev);
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index cf2ce09f77..f41238be9c 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -236,7 +236,51 @@ if soc_type == 'cn20k' or soc_type == 'all'
 # CN20K
 sources += files(
         'cn20k_ethdev.c',
+        'cn20k_rx_select.c',
 )
+
+if host_machine.cpu_family().startswith('aarch') and not disable_template
+sources += files(
+        'rx/cn20k/rx_0_15.c',
+        'rx/cn20k/rx_16_31.c',
+        'rx/cn20k/rx_32_47.c',
+        'rx/cn20k/rx_48_63.c',
+        'rx/cn20k/rx_64_79.c',
+        'rx/cn20k/rx_80_95.c',
+        'rx/cn20k/rx_96_111.c',
+        'rx/cn20k/rx_112_127.c',
+        'rx/cn20k/rx_0_15_mseg.c',
+        'rx/cn20k/rx_16_31_mseg.c',
+        'rx/cn20k/rx_32_47_mseg.c',
+        'rx/cn20k/rx_48_63_mseg.c',
+        'rx/cn20k/rx_64_79_mseg.c',
+        'rx/cn20k/rx_80_95_mseg.c',
+        'rx/cn20k/rx_96_111_mseg.c',
+        'rx/cn20k/rx_112_127_mseg.c',
+        'rx/cn20k/rx_0_15_vec.c',
+        'rx/cn20k/rx_16_31_vec.c',
+        'rx/cn20k/rx_32_47_vec.c',
+        'rx/cn20k/rx_48_63_vec.c',
+        'rx/cn20k/rx_64_79_vec.c',
+        'rx/cn20k/rx_80_95_vec.c',
+        'rx/cn20k/rx_96_111_vec.c',
+        'rx/cn20k/rx_112_127_vec.c',
+        'rx/cn20k/rx_0_15_vec_mseg.c',
+        'rx/cn20k/rx_16_31_vec_mseg.c',
+        'rx/cn20k/rx_32_47_vec_mseg.c',
+        'rx/cn20k/rx_48_63_vec_mseg.c',
+        'rx/cn20k/rx_64_79_vec_mseg.c',
+        'rx/cn20k/rx_80_95_vec_mseg.c',
+        'rx/cn20k/rx_96_111_vec_mseg.c',
+        'rx/cn20k/rx_112_127_vec_mseg.c',
+        'rx/cn20k/rx_all_offload.c',
+)
+
+else
+sources += files(
+        'rx/cn20k/rx_all_offload.c',
+)
+endif
 endif
 
 
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15.c b/drivers/net/cnxk/rx/cn20k/rx_0_15.c
new file mode 100644
index 0000000000..d248eb8c7e
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
new file mode 100644
index 0000000000..b159632921
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
new file mode 100644
index 0000000000..76846bfea8
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
new file mode 100644
index 0000000000..73533631ad
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_0_15_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_0_15
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127.c b/drivers/net/cnxk/rx/cn20k/rx_112_127.c
new file mode 100644
index 0000000000..b7c53def26
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
new file mode 100644
index 0000000000..ed3a95479c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
new file mode 100644
index 0000000000..4bbba8bdbe
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
new file mode 100644
index 0000000000..3a2b67436f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_112_127_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_112_127
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31.c b/drivers/net/cnxk/rx/cn20k/rx_16_31.c
new file mode 100644
index 0000000000..cd60faaefd
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
new file mode 100644
index 0000000000..2f2d527def
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
new file mode 100644
index 0000000000..595ec8689e
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
new file mode 100644
index 0000000000..7cf1c65f4a
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_16_31_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_16_31
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47.c b/drivers/net/cnxk/rx/cn20k/rx_32_47.c
new file mode 100644
index 0000000000..e3778448ca
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
new file mode 100644
index 0000000000..2203247aa4
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
new file mode 100644
index 0000000000..7aae8225e7
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
new file mode 100644
index 0000000000..1a221ae095
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_32_47_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_32_47
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63.c b/drivers/net/cnxk/rx/cn20k/rx_48_63.c
new file mode 100644
index 0000000000..c5fedd06cd
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
new file mode 100644
index 0000000000..6c2d8ac331
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
new file mode 100644
index 0000000000..20a937e453
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
new file mode 100644
index 0000000000..929d807c8d
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_48_63_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_48_63
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79.c b/drivers/net/cnxk/rx/cn20k/rx_64_79.c
new file mode 100644
index 0000000000..30beebc326
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
new file mode 100644
index 0000000000..30ece8f8ee
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
new file mode 100644
index 0000000000..1f533c01f6
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
new file mode 100644
index 0000000000..ed3c012798
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_64_79_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_64_79
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95.c b/drivers/net/cnxk/rx/cn20k/rx_80_95.c
new file mode 100644
index 0000000000..a13ecb244f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
new file mode 100644
index 0000000000..c6438120d8
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
new file mode 100644
index 0000000000..94c685ba7c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
new file mode 100644
index 0000000000..370376da7d
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_80_95_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_80_95
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111.c b/drivers/net/cnxk/rx/cn20k/rx_96_111.c
new file mode 100644
index 0000000000..15b5375e3c
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_##name, flags)                                             \
+	NIX_RX_RECV(cn20k_nix_recv_pkts_reas_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
new file mode 100644
index 0000000000..561b48c789
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_mseg_##name, flags)                                   \
+	NIX_RX_RECV_MSEG(cn20k_nix_recv_pkts_reas_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
new file mode 100644
index 0000000000..17031f7b6f
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_vec_##name, flags)                                     \
+	NIX_RX_RECV_VEC(cn20k_nix_recv_pkts_reas_vec_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
new file mode 100644
index 0000000000..9dd1f3f39a
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_96_111_vec_mseg.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define R(name, flags)                                                                             \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_vec_mseg_##name, flags)                           \
+	NIX_RX_RECV_VEC_MSEG(cn20k_nix_recv_pkts_reas_vec_mseg_##name, flags | NIX_RX_REAS_F)
+
+NIX_RX_FASTPATH_MODES_96_111
+#undef R
+
+#endif
diff --git a/drivers/net/cnxk/rx/cn20k/rx_all_offload.c b/drivers/net/cnxk/rx/cn20k/rx_all_offload.c
new file mode 100644
index 0000000000..83615e78aa
--- /dev/null
+++ b/drivers/net/cnxk/rx/cn20k/rx_all_offload.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_rx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if defined(CNXK_DIS_TMPLT_FUNC)
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_all_offload(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts(rx_queue, rx_pkts, pkts,
+				   NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F |
+				   NIX_RX_OFFLOAD_CHECKSUM_F | NIX_RX_OFFLOAD_MARK_UPDATE_F |
+				   NIX_RX_OFFLOAD_VLAN_STRIP_F | NIX_RX_OFFLOAD_SECURITY_F |
+				   NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_vec_all_offload(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts,
+					  NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F |
+					  NIX_RX_OFFLOAD_CHECKSUM_F | NIX_RX_OFFLOAD_MARK_UPDATE_F |
+					  NIX_RX_OFFLOAD_VLAN_STRIP_F | NIX_RX_OFFLOAD_SECURITY_F |
+					  NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F,
+					  NULL, NULL, 0, 0);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_all_offload_tst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts(rx_queue, rx_pkts, pkts,
+				   NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F |
+				   NIX_RX_OFFLOAD_CHECKSUM_F | NIX_RX_OFFLOAD_MARK_UPDATE_F |
+				   NIX_RX_OFFLOAD_TSTAMP_F | NIX_RX_OFFLOAD_VLAN_STRIP_F |
+				   NIX_RX_OFFLOAD_SECURITY_F | NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_recv_pkts_vec_all_offload_tst(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
+{
+	return cn20k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts,
+					  NIX_RX_OFFLOAD_RSS_F | NIX_RX_OFFLOAD_PTYPE_F |
+					  NIX_RX_OFFLOAD_CHECKSUM_F | NIX_RX_OFFLOAD_MARK_UPDATE_F |
+					  NIX_RX_OFFLOAD_TSTAMP_F | NIX_RX_OFFLOAD_VLAN_STRIP_F |
+					  NIX_RX_OFFLOAD_SECURITY_F | NIX_RX_MULTI_SEG_F |
+					  NIX_RX_REAS_F, NULL, NULL, 0, 0);
+}
+
+#endif
-- 
2.34.1



* [PATCH v3 12/18] net/cnxk: support Tx function select for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (10 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 11/18] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 13/18] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
                     ` (6 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev

Add support to select the Tx burst function based on the offload flags
enabled for cn20k.
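
As a rough sketch of the technique (illustrative only, not part of the
patch; the names tx_burst_fn, MODE_*, NB_MODES and pick_burst are all
hypothetical), one specialized burst routine exists per offload-flag
combination, and the active routine is chosen by using the flag bits as
a direct index into a lookup table:

	#include <stdint.h>
	#include <stdio.h>

	/* Hypothetical offload flag bits; the driver packs its
	 * NIX_TX_OFFLOAD_* bits in the same positional way.
	 */
	#define MODE_CSUM (1u << 0)
	#define MODE_VLAN (1u << 1)
	#define NB_MODES  (1u << 2)

	typedef uint16_t (*tx_burst_fn)(void *txq, void **pkts, uint16_t nb);

	/* One stub per flag combination; templates generate these. */
	static uint16_t tx_plain(void *q, void **p, uint16_t n)
	{ (void)q; (void)p; return n; }
	static uint16_t tx_csum(void *q, void **p, uint16_t n)
	{ (void)q; (void)p; return n; }
	static uint16_t tx_vlan(void *q, void **p, uint16_t n)
	{ (void)q; (void)p; return n; }
	static uint16_t tx_csum_vlan(void *q, void **p, uint16_t n)
	{ (void)q; (void)p; return n; }

	static const tx_burst_fn burst_tbl[NB_MODES] = {
		[0] = tx_plain,
		[MODE_CSUM] = tx_csum,
		[MODE_VLAN] = tx_vlan,
		[MODE_CSUM | MODE_VLAN] = tx_csum_vlan,
	};

	/* Masking keeps the flags inside the table bounds. */
	static tx_burst_fn pick_burst(uint16_t flags)
	{
		return burst_tbl[flags & (NB_MODES - 1)];
	}

	int main(void)
	{
		tx_burst_fn fn = pick_burst(MODE_CSUM | MODE_VLAN);

		printf("sent %u pkts\n", (unsigned int)fn(NULL, NULL, 4));
		return 0;
	}

The patch below does the same at full scale: the T() macros expand
NIX_TX_FASTPATH_MODES into the specialized xmit functions, and
pick_tx_func() indexes them with tx_offload_flags & (NIX_TX_OFFLOAD_MAX - 1).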

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c               |  80 ++++++
 drivers/net/cnxk/cn20k_ethdev.h               |   1 +
 drivers/net/cnxk/cn20k_tx.h                   | 237 ++++++++++++++++++
 drivers/net/cnxk/cn20k_tx_select.c            | 122 +++++++++
 drivers/net/cnxk/meson.build                  |  37 +++
 drivers/net/cnxk/tx/cn20k/tx_0_15.c           |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c       |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c  |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127.c        |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c   |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c    |  18 ++
 .../net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c   |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95.c          |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c     |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c      |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111.c         |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c    |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c     |  18 ++
 .../net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c    |  18 ++
 drivers/net/cnxk/tx/cn20k/tx_all_offload.c    |  37 +++
 38 files changed, 1090 insertions(+)
 create mode 100644 drivers/net/cnxk/cn20k_tx_select.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
 create mode 100644 drivers/net/cnxk/tx/cn20k/tx_all_offload.c

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index 545634a70e..1c67de8fa9 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -40,6 +40,78 @@ nix_rx_offload_flags(struct rte_eth_dev *eth_dev)
 	return flags;
 }
 
+static uint16_t
+nix_tx_offload_flags(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	uint64_t conf = dev->tx_offloads;
+	struct roc_nix *nix = &dev->nix;
+	uint16_t flags = 0;
+
+	/* Fastpath is dependent on these enums */
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_TCP_CKSUM != (1ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_SCTP_CKSUM != (2ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_UDP_CKSUM != (3ULL << 52));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_IP_CKSUM != (1ULL << 54));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_IPV4 != (1ULL << 55));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IP_CKSUM != (1ULL << 58));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IPV4 != (1ULL << 59));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_IPV6 != (1ULL << 60));
+	RTE_BUILD_BUG_ON(RTE_MBUF_F_TX_OUTER_UDP_CKSUM != (1ULL << 41));
+	RTE_BUILD_BUG_ON(RTE_MBUF_L2_LEN_BITS != 7);
+	RTE_BUILD_BUG_ON(RTE_MBUF_L3_LEN_BITS != 9);
+	RTE_BUILD_BUG_ON(RTE_MBUF_OUTL2_LEN_BITS != 7);
+	RTE_BUILD_BUG_ON(RTE_MBUF_OUTL3_LEN_BITS != 9);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_off) !=
+			 offsetof(struct rte_mbuf, buf_addr) + 16);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
+			 offsetof(struct rte_mbuf, buf_addr) + 24);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pkt_len) !=
+			 offsetof(struct rte_mbuf, ol_flags) + 12);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, tx_offload) !=
+			 offsetof(struct rte_mbuf, pool) + 2 * sizeof(void *));
+
+	if (conf & RTE_ETH_TX_OFFLOAD_VLAN_INSERT || conf & RTE_ETH_TX_OFFLOAD_QINQ_INSERT)
+		flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM || conf & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM)
+		flags |= NIX_TX_OFFLOAD_OL3_OL4_CSUM_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM || conf & RTE_ETH_TX_OFFLOAD_TCP_CKSUM ||
+	    conf & RTE_ETH_TX_OFFLOAD_UDP_CKSUM || conf & RTE_ETH_TX_OFFLOAD_SCTP_CKSUM)
+		flags |= NIX_TX_OFFLOAD_L3_L4_CSUM_F;
+
+	if (!(conf & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE))
+		flags |= NIX_TX_OFFLOAD_MBUF_NOFF_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+		flags |= NIX_TX_MULTI_SEG_F;
+
+	/* Enable Inner checksum for TSO */
+	if (conf & RTE_ETH_TX_OFFLOAD_TCP_TSO)
+		flags |= (NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_L3_L4_CSUM_F);
+
+	/* Enable Inner and Outer checksum for Tunnel TSO */
+	if (conf & (RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO | RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO |
+		    RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO))
+		flags |= (NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+			  NIX_TX_OFFLOAD_L3_L4_CSUM_F);
+
+	if ((dev->rx_offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP))
+		flags |= NIX_TX_OFFLOAD_TSTAMP_F;
+
+	if (conf & RTE_ETH_TX_OFFLOAD_SECURITY)
+		flags |= NIX_TX_OFFLOAD_SECURITY_F;
+
+	if (dev->tx_mark)
+		flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+
+	if (nix->tx_compl_ena)
+		flags |= NIX_TX_OFFLOAD_MBUF_NOFF_F;
+
+	return flags;
+}
+
 static int
 cn20k_nix_ptypes_set(struct rte_eth_dev *eth_dev, uint32_t ptype_mask)
 {
@@ -226,6 +298,7 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 
 		/* Update offload flags */
 		dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
+		dev->tx_offload_flags = nix_tx_offload_flags(eth_dev);
 	}
 
 	rq = &dev->rqs[qid];
@@ -286,6 +359,8 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 
 	/* Update offload flags */
 	dev->rx_offload_flags = nix_rx_offload_flags(eth_dev);
+	dev->tx_offload_flags = nix_tx_offload_flags(eth_dev);
+
 	/* reset reassembly dynfield/flag offset */
 	dev->reass_dynfield_off = -1;
 	dev->reass_dynflag_bit = -1;
@@ -316,6 +391,7 @@ cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
 	return 0;
 }
 
@@ -339,6 +415,7 @@ cn20k_nix_timesync_disable(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
 	return 0;
 }
 
@@ -378,10 +455,12 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	 * in rx[tx]_offloads.
 	 */
 	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
+	dev->tx_offload_flags |= nix_tx_offload_flags(eth_dev);
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
 
+	cn20k_eth_set_tx_function(eth_dev);
 	cn20k_eth_set_rx_function(eth_dev);
 	return 0;
 }
@@ -581,6 +660,7 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		/* Setup callbacks for secondary process */
+		cn20k_eth_set_tx_function(eth_dev);
 		cn20k_eth_set_rx_function(eth_dev);
 		return 0;
 	}
diff --git a/drivers/net/cnxk/cn20k_ethdev.h b/drivers/net/cnxk/cn20k_ethdev.h
index 2049ee7fa4..cb46044d60 100644
--- a/drivers/net/cnxk/cn20k_ethdev.h
+++ b/drivers/net/cnxk/cn20k_ethdev.h
@@ -10,5 +10,6 @@
 
 /* Rx and Tx routines */
 void cn20k_eth_set_rx_function(struct rte_eth_dev *eth_dev);
+void cn20k_eth_set_tx_function(struct rte_eth_dev *eth_dev);
 
 #endif /* __CN20K_ETHDEV_H__ */
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index a00c9d5776..9fd925ac34 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -32,4 +32,241 @@
 #define NIX_TX_NEED_EXT_HDR                                                                        \
 	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
 
+#define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
+#define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
+#define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
+#define NOFF_F	     NIX_TX_OFFLOAD_MBUF_NOFF_F
+#define TSO_F	     NIX_TX_OFFLOAD_TSO_F
+#define TSP_F	     NIX_TX_OFFLOAD_TSTAMP_F
+#define T_SEC_F	     NIX_TX_OFFLOAD_SECURITY_F
+
+/* [T_SEC_F] [TSP] [TSO] [NOFF] [VLAN] [OL3OL4CSUM] [L3L4CSUM] */
+#define NIX_TX_FASTPATH_MODES_0_15                                                                 \
+	T(no_offload, 6, NIX_TX_OFFLOAD_NONE)                                                      \
+	T(l3l4csum, 6, L3L4CSUM_F)                                                                 \
+	T(ol3ol4csum, 6, OL3OL4CSUM_F)                                                             \
+	T(ol3ol4csum_l3l4csum, 6, OL3OL4CSUM_F | L3L4CSUM_F)                                       \
+	T(vlan, 6, VLAN_F)                                                                         \
+	T(vlan_l3l4csum, 6, VLAN_F | L3L4CSUM_F)                                                   \
+	T(vlan_ol3ol4csum, 6, VLAN_F | OL3OL4CSUM_F)                                               \
+	T(vlan_ol3ol4csum_l3l4csum, 6, VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(noff, 6, NOFF_F)                                                                         \
+	T(noff_l3l4csum, 6, NOFF_F | L3L4CSUM_F)                                                   \
+	T(noff_ol3ol4csum, 6, NOFF_F | OL3OL4CSUM_F)                                               \
+	T(noff_ol3ol4csum_l3l4csum, 6, NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(noff_vlan, 6, NOFF_F | VLAN_F)                                                           \
+	T(noff_vlan_l3l4csum, 6, NOFF_F | VLAN_F | L3L4CSUM_F)                                     \
+	T(noff_vlan_ol3ol4csum, 6, NOFF_F | VLAN_F | OL3OL4CSUM_F)                                 \
+	T(noff_vlan_ol3ol4csum_l3l4csum, 6, NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_16_31                                                                \
+	T(tso, 6, TSO_F)                                                                           \
+	T(tso_l3l4csum, 6, TSO_F | L3L4CSUM_F)                                                     \
+	T(tso_ol3ol4csum, 6, TSO_F | OL3OL4CSUM_F)                                                 \
+	T(tso_ol3ol4csum_l3l4csum, 6, TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)                           \
+	T(tso_vlan, 6, TSO_F | VLAN_F)                                                             \
+	T(tso_vlan_l3l4csum, 6, TSO_F | VLAN_F | L3L4CSUM_F)                                       \
+	T(tso_vlan_ol3ol4csum, 6, TSO_F | VLAN_F | OL3OL4CSUM_F)                                   \
+	T(tso_vlan_ol3ol4csum_l3l4csum, 6, TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(tso_noff, 6, TSO_F | NOFF_F)                                                             \
+	T(tso_noff_l3l4csum, 6, TSO_F | NOFF_F | L3L4CSUM_F)                                       \
+	T(tso_noff_ol3ol4csum, 6, TSO_F | NOFF_F | OL3OL4CSUM_F)                                   \
+	T(tso_noff_ol3ol4csum_l3l4csum, 6, TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(tso_noff_vlan, 6, TSO_F | NOFF_F | VLAN_F)                                               \
+	T(tso_noff_vlan_l3l4csum, 6, TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)                         \
+	T(tso_noff_vlan_ol3ol4csum, 6, TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                     \
+	T(tso_noff_vlan_ol3ol4csum_l3l4csum, 6, TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_32_47                                                                \
+	T(ts, 8, TSP_F)                                                                            \
+	T(ts_l3l4csum, 8, TSP_F | L3L4CSUM_F)                                                      \
+	T(ts_ol3ol4csum, 8, TSP_F | OL3OL4CSUM_F)                                                  \
+	T(ts_ol3ol4csum_l3l4csum, 8, TSP_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(ts_vlan, 8, TSP_F | VLAN_F)                                                              \
+	T(ts_vlan_l3l4csum, 8, TSP_F | VLAN_F | L3L4CSUM_F)                                        \
+	T(ts_vlan_ol3ol4csum, 8, TSP_F | VLAN_F | OL3OL4CSUM_F)                                    \
+	T(ts_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(ts_noff, 8, TSP_F | NOFF_F)                                                              \
+	T(ts_noff_l3l4csum, 8, TSP_F | NOFF_F | L3L4CSUM_F)                                        \
+	T(ts_noff_ol3ol4csum, 8, TSP_F | NOFF_F | OL3OL4CSUM_F)                                    \
+	T(ts_noff_ol3ol4csum_l3l4csum, 8, TSP_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(ts_noff_vlan, 8, TSP_F | NOFF_F | VLAN_F)                                                \
+	T(ts_noff_vlan_l3l4csum, 8, TSP_F | NOFF_F | VLAN_F | L3L4CSUM_F)                          \
+	T(ts_noff_vlan_ol3ol4csum, 8, TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                      \
+	T(ts_noff_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_48_63                                                                \
+	T(ts_tso, 8, TSP_F | TSO_F)                                                                \
+	T(ts_tso_l3l4csum, 8, TSP_F | TSO_F | L3L4CSUM_F)                                          \
+	T(ts_tso_ol3ol4csum, 8, TSP_F | TSO_F | OL3OL4CSUM_F)                                      \
+	T(ts_tso_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)                \
+	T(ts_tso_vlan, 8, TSP_F | TSO_F | VLAN_F)                                                  \
+	T(ts_tso_vlan_l3l4csum, 8, TSP_F | TSO_F | VLAN_F | L3L4CSUM_F)                            \
+	T(ts_tso_vlan_ol3ol4csum, 8, TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F)                        \
+	T(ts_tso_vlan_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(ts_tso_noff, 8, TSP_F | TSO_F | NOFF_F)                                                  \
+	T(ts_tso_noff_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | L3L4CSUM_F)                            \
+	T(ts_tso_noff_ol3ol4csum, 8, TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F)                        \
+	T(ts_tso_noff_ol3ol4csum_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(ts_tso_noff_vlan, 8, TSP_F | TSO_F | NOFF_F | VLAN_F)                                    \
+	T(ts_tso_noff_vlan_l3l4csum, 8, TSP_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)              \
+	T(ts_tso_noff_vlan_ol3ol4csum, 8, TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)          \
+	T(ts_tso_noff_vlan_ol3ol4csum_l3l4csum, 8,                                                 \
+	  TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_64_79                                                                \
+	T(sec, 6, T_SEC_F)                                                                         \
+	T(sec_l3l4csum, 6, T_SEC_F | L3L4CSUM_F)                                                   \
+	T(sec_ol3ol4csum, 6, T_SEC_F | OL3OL4CSUM_F)                                               \
+	T(sec_ol3ol4csum_l3l4csum, 6, T_SEC_F | OL3OL4CSUM_F | L3L4CSUM_F)                         \
+	T(sec_vlan, 6, T_SEC_F | VLAN_F)                                                           \
+	T(sec_vlan_l3l4csum, 6, T_SEC_F | VLAN_F | L3L4CSUM_F)                                     \
+	T(sec_vlan_ol3ol4csum, 6, T_SEC_F | VLAN_F | OL3OL4CSUM_F)                                 \
+	T(sec_vlan_ol3ol4csum_l3l4csum, 6, T_SEC_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)           \
+	T(sec_noff, 6, T_SEC_F | NOFF_F)                                                           \
+	T(sec_noff_l3l4csum, 6, T_SEC_F | NOFF_F | L3L4CSUM_F)                                     \
+	T(sec_noff_ol3ol4csum, 6, T_SEC_F | NOFF_F | OL3OL4CSUM_F)                                 \
+	T(sec_noff_ol3ol4csum_l3l4csum, 6, T_SEC_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)           \
+	T(sec_noff_vlan, 6, T_SEC_F | NOFF_F | VLAN_F)                                             \
+	T(sec_noff_vlan_l3l4csum, 6, T_SEC_F | NOFF_F | VLAN_F | L3L4CSUM_F)                       \
+	T(sec_noff_vlan_ol3ol4csum, 6, T_SEC_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                   \
+	T(sec_noff_vlan_ol3ol4csum_l3l4csum, 6,                                                    \
+	  T_SEC_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_80_95                                                                \
+	T(sec_tso, 6, T_SEC_F | TSO_F)                                                             \
+	T(sec_tso_l3l4csum, 6, T_SEC_F | TSO_F | L3L4CSUM_F)                                       \
+	T(sec_tso_ol3ol4csum, 6, T_SEC_F | TSO_F | OL3OL4CSUM_F)                                   \
+	T(sec_tso_ol3ol4csum_l3l4csum, 6, T_SEC_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)             \
+	T(sec_tso_vlan, 6, T_SEC_F | TSO_F | VLAN_F)                                               \
+	T(sec_tso_vlan_l3l4csum, 6, T_SEC_F | TSO_F | VLAN_F | L3L4CSUM_F)                         \
+	T(sec_tso_vlan_ol3ol4csum, 6, T_SEC_F | TSO_F | VLAN_F | OL3OL4CSUM_F)                     \
+	T(sec_tso_vlan_ol3ol4csum_l3l4csum, 6,                                                     \
+	  T_SEC_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_tso_noff, 6, T_SEC_F | TSO_F | NOFF_F)                                               \
+	T(sec_tso_noff_l3l4csum, 6, T_SEC_F | TSO_F | NOFF_F | L3L4CSUM_F)                         \
+	T(sec_tso_noff_ol3ol4csum, 6, T_SEC_F | TSO_F | NOFF_F | OL3OL4CSUM_F)                     \
+	T(sec_tso_noff_ol3ol4csum_l3l4csum, 6,                                                     \
+	  T_SEC_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_tso_noff_vlan, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F)                                 \
+	T(sec_tso_noff_vlan_l3l4csum, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)           \
+	T(sec_tso_noff_vlan_ol3ol4csum, 6, T_SEC_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)       \
+	T(sec_tso_noff_vlan_ol3ol4csum_l3l4csum, 6,                                                \
+	  T_SEC_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_96_111                                                               \
+	T(sec_ts, 8, T_SEC_F | TSP_F)                                                              \
+	T(sec_ts_l3l4csum, 8, T_SEC_F | TSP_F | L3L4CSUM_F)                                        \
+	T(sec_ts_ol3ol4csum, 8, T_SEC_F | TSP_F | OL3OL4CSUM_F)                                    \
+	T(sec_ts_ol3ol4csum_l3l4csum, 8, T_SEC_F | TSP_F | OL3OL4CSUM_F | L3L4CSUM_F)              \
+	T(sec_ts_vlan, 8, T_SEC_F | TSP_F | VLAN_F)                                                \
+	T(sec_ts_vlan_l3l4csum, 8, T_SEC_F | TSP_F | VLAN_F | L3L4CSUM_F)                          \
+	T(sec_ts_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | VLAN_F | OL3OL4CSUM_F)                      \
+	T(sec_ts_vlan_ol3ol4csum_l3l4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_ts_noff, 8, T_SEC_F | TSP_F | NOFF_F)                                                \
+	T(sec_ts_noff_l3l4csum, 8, T_SEC_F | TSP_F | NOFF_F | L3L4CSUM_F)                          \
+	T(sec_ts_noff_ol3ol4csum, 8, T_SEC_F | TSP_F | NOFF_F | OL3OL4CSUM_F)                      \
+	T(sec_ts_noff_ol3ol4csum_l3l4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                                    \
+	T(sec_ts_noff_vlan, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F)                                  \
+	T(sec_ts_noff_vlan_l3l4csum, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F | L3L4CSUM_F)            \
+	T(sec_ts_noff_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)        \
+	T(sec_ts_noff_vlan_ol3ol4csum_l3l4csum, 8,                                                 \
+	  T_SEC_F | TSP_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES_112_127                                                              \
+	T(sec_ts_tso, 8, T_SEC_F | TSP_F | TSO_F)                                                  \
+	T(sec_ts_tso_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | OL3OL4CSUM_F)                        \
+	T(sec_ts_tso_ol3ol4csum_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | OL3OL4CSUM_F | L3L4CSUM_F)  \
+	T(sec_ts_tso_vlan, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F)                                    \
+	T(sec_ts_tso_vlan_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F | L3L4CSUM_F)              \
+	T(sec_ts_tso_vlan_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F)          \
+	T(sec_ts_tso_vlan_ol3ol4csum_l3l4csum, 8,                                                  \
+	  T_SEC_F | TSP_F | TSO_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_noff, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F)                                    \
+	T(sec_ts_tso_noff_l3l4csum, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | L3L4CSUM_F)              \
+	T(sec_ts_tso_noff_ol3ol4csum, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F)          \
+	T(sec_ts_tso_noff_ol3ol4csum_l3l4csum, 8,                                                  \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | OL3OL4CSUM_F | L3L4CSUM_F)                            \
+	T(sec_ts_tso_noff_vlan, 8, T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F)                      \
+	T(sec_ts_tso_noff_vlan_l3l4csum, 8,                                                        \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | L3L4CSUM_F)                                  \
+	T(sec_ts_tso_noff_vlan_ol3ol4csum, 8,                                                      \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F)                                \
+	T(sec_ts_tso_noff_vlan_ol3ol4csum_l3l4csum, 8,                                             \
+	  T_SEC_F | TSP_F | TSO_F | NOFF_F | VLAN_F | OL3OL4CSUM_F | L3L4CSUM_F)
+
+#define NIX_TX_FASTPATH_MODES                                                                      \
+	NIX_TX_FASTPATH_MODES_0_15                                                                 \
+	NIX_TX_FASTPATH_MODES_16_31                                                                \
+	NIX_TX_FASTPATH_MODES_32_47                                                                \
+	NIX_TX_FASTPATH_MODES_48_63                                                                \
+	NIX_TX_FASTPATH_MODES_64_79                                                                \
+	NIX_TX_FASTPATH_MODES_80_95                                                                \
+	NIX_TX_FASTPATH_MODES_96_111                                                               \
+	NIX_TX_FASTPATH_MODES_112_127
+
+#define T(name, sz, flags)                                                                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_##name(                              \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_mseg_##name(                         \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_##name(                          \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);                         \
+	uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_mseg_##name(                     \
+		void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts);
+
+NIX_TX_FASTPATH_MODES
+#undef T
+
+#define NIX_TX_XMIT(fn, sz, flags)                                                                 \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_MSEG(fn, sz, flags)                                                            \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_VEC(fn, sz, flags)                                                             \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+#define NIX_TX_XMIT_VEC_MSEG(fn, sz, flags)                                                        \
+	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
+					     uint16_t pkts)                                        \
+	{                                                                                          \
+		RTE_SET_USED(tx_queue);                                                            \
+		RTE_SET_USED(tx_pkts);                                                             \
+		RTE_SET_USED(pkts);                                                                \
+		return 0;                                                                          \
+	}
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue,
+								  struct rte_mbuf **tx_pkts,
+								  uint16_t pkts);
+
+uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue,
+								      struct rte_mbuf **tx_pkts,
+								      uint16_t pkts);
+
 #endif /* __CN20K_TX_H__ */
diff --git a/drivers/net/cnxk/cn20k_tx_select.c b/drivers/net/cnxk/cn20k_tx_select.c
new file mode 100644
index 0000000000..fb62b54a5f
--- /dev/null
+++ b/drivers/net/cnxk/cn20k_tx_select.c
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_ethdev.h"
+#include "cn20k_tx.h"
+
+static __rte_used inline void
+pick_tx_func(struct rte_eth_dev *eth_dev, const eth_tx_burst_t tx_burst[NIX_TX_OFFLOAD_MAX])
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* [SEC] [TSP] [TSO] [NOFF] [VLAN] [OL3_OL4_CSUM] [IL3_IL4_CSUM] */
+	eth_dev->tx_pkt_burst = tx_burst[dev->tx_offload_flags & (NIX_TX_OFFLOAD_MAX - 1)];
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].tx_pkt_burst = eth_dev->tx_pkt_burst;
+}
+
+#if defined(RTE_ARCH_ARM64)
+static int
+cn20k_nix_tx_queue_count(void *tx_queue)
+{
+	struct cn20k_eth_txq *txq = (struct cn20k_eth_txq *)tx_queue;
+
+	return cnxk_nix_tx_queue_count(txq->fc_mem, txq->sqes_per_sqb_log2);
+}
+
+static int
+cn20k_nix_tx_queue_sec_count(void *tx_queue)
+{
+	struct cn20k_eth_txq *txq = (struct cn20k_eth_txq *)tx_queue;
+
+	return cnxk_nix_tx_queue_sec_count(txq->fc_mem, txq->sqes_per_sqb_log2, txq->cpt_fc);
+}
+
+static void
+cn20k_eth_set_tx_tmplt_func(struct rte_eth_dev *eth_dev)
+{
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	const eth_tx_burst_t nix_eth_tx_burst[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_burst_mseg[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_mseg_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_vec_burst[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_vec_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	const eth_tx_burst_t nix_eth_tx_vec_burst_mseg[NIX_TX_OFFLOAD_MAX] = {
+#define T(name, sz, flags) [flags] = cn20k_nix_xmit_pkts_vec_mseg_##name,
+
+		NIX_TX_FASTPATH_MODES
+#undef T
+	};
+
+	if (dev->scalar_ena || dev->tx_mark) {
+		pick_tx_func(eth_dev, nix_eth_tx_burst);
+		if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+			pick_tx_func(eth_dev, nix_eth_tx_burst_mseg);
+	} else {
+		pick_tx_func(eth_dev, nix_eth_tx_vec_burst);
+		if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_MULTI_SEGS)
+			pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg);
+	}
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+
+static void
+cn20k_eth_set_tx_blk_func(struct rte_eth_dev *eth_dev)
+{
+#if defined(CNXK_DIS_TMPLT_FUNC)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (dev->scalar_ena || dev->tx_mark)
+		eth_dev->tx_pkt_burst = cn20k_nix_xmit_pkts_all_offload;
+	else
+		eth_dev->tx_pkt_burst = cn20k_nix_xmit_pkts_vec_all_offload;
+
+	if (eth_dev->data->dev_started)
+		rte_eth_fp_ops[eth_dev->data->port_id].tx_pkt_burst = eth_dev->tx_pkt_burst;
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
+#endif
+
+void
+cn20k_eth_set_tx_function(struct rte_eth_dev *eth_dev)
+{
+#if defined(RTE_ARCH_ARM64)
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	cn20k_eth_set_tx_blk_func(eth_dev);
+	cn20k_eth_set_tx_tmplt_func(eth_dev);
+
+	if (dev->tx_offloads & RTE_ETH_TX_OFFLOAD_SECURITY)
+		eth_dev->tx_queue_count = cn20k_nix_tx_queue_sec_count;
+	else
+		eth_dev->tx_queue_count = cn20k_nix_tx_queue_count;
+
+	rte_atomic_thread_fence(rte_memory_order_release);
+#else
+	RTE_SET_USED(eth_dev);
+#endif
+}
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index f41238be9c..fcf48f600a 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -237,6 +237,7 @@ if soc_type == 'cn20k' or soc_type == 'all'
 sources += files(
         'cn20k_ethdev.c',
         'cn20k_rx_select.c',
+        'cn20k_tx_select.c',
 )
 
 if host_machine.cpu_family().startswith('aarch') and not disable_template
@@ -276,9 +277,45 @@ sources += files(
         'rx/cn20k/rx_all_offload.c',
 )
 
+sources += files(
+        'tx/cn20k/tx_0_15.c',
+        'tx/cn20k/tx_16_31.c',
+        'tx/cn20k/tx_32_47.c',
+        'tx/cn20k/tx_48_63.c',
+        'tx/cn20k/tx_64_79.c',
+        'tx/cn20k/tx_80_95.c',
+        'tx/cn20k/tx_96_111.c',
+        'tx/cn20k/tx_112_127.c',
+        'tx/cn20k/tx_0_15_mseg.c',
+        'tx/cn20k/tx_16_31_mseg.c',
+        'tx/cn20k/tx_32_47_mseg.c',
+        'tx/cn20k/tx_48_63_mseg.c',
+        'tx/cn20k/tx_64_79_mseg.c',
+        'tx/cn20k/tx_80_95_mseg.c',
+        'tx/cn20k/tx_96_111_mseg.c',
+        'tx/cn20k/tx_112_127_mseg.c',
+        'tx/cn20k/tx_0_15_vec.c',
+        'tx/cn20k/tx_16_31_vec.c',
+        'tx/cn20k/tx_32_47_vec.c',
+        'tx/cn20k/tx_48_63_vec.c',
+        'tx/cn20k/tx_64_79_vec.c',
+        'tx/cn20k/tx_80_95_vec.c',
+        'tx/cn20k/tx_96_111_vec.c',
+        'tx/cn20k/tx_112_127_vec.c',
+        'tx/cn20k/tx_0_15_vec_mseg.c',
+        'tx/cn20k/tx_16_31_vec_mseg.c',
+        'tx/cn20k/tx_32_47_vec_mseg.c',
+        'tx/cn20k/tx_48_63_vec_mseg.c',
+        'tx/cn20k/tx_64_79_vec_mseg.c',
+        'tx/cn20k/tx_80_95_vec_mseg.c',
+        'tx/cn20k/tx_96_111_vec_mseg.c',
+        'tx/cn20k/tx_112_127_vec_mseg.c',
+        'tx/cn20k/tx_all_offload.c',
+)
 else
 sources += files(
         'rx/cn20k/rx_all_offload.c',
+        'tx/cn20k/tx_all_offload.c',
 )
 endif
 endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15.c b/drivers/net/cnxk/tx/cn20k/tx_0_15.c
new file mode 100644
index 0000000000..2de434ccb4
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
new file mode 100644
index 0000000000..c928902b02
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
new file mode 100644
index 0000000000..0e82451c7e
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
new file mode 100644
index 0000000000..b0cd33f781
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_0_15_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_0_15
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127.c b/drivers/net/cnxk/tx/cn20k/tx_112_127.c
new file mode 100644
index 0000000000..c116c48763
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
new file mode 100644
index 0000000000..5d67426f2b
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
new file mode 100644
index 0000000000..5a3e5c660d
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
new file mode 100644
index 0000000000..c6918de6df
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_112_127_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_112_127
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31.c b/drivers/net/cnxk/tx/cn20k/tx_16_31.c
new file mode 100644
index 0000000000..953f63b192
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
new file mode 100644
index 0000000000..cdfd6bf69c
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
new file mode 100644
index 0000000000..6e6ad7c968
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
new file mode 100644
index 0000000000..a3a0fcace3
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_16_31_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_16_31
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47.c b/drivers/net/cnxk/tx/cn20k/tx_32_47.c
new file mode 100644
index 0000000000..50295fcd16
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
new file mode 100644
index 0000000000..8b4da505ad
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
new file mode 100644
index 0000000000..3a3298ffa6
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
new file mode 100644
index 0000000000..93168990a8
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_32_47_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_32_47
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63.c b/drivers/net/cnxk/tx/cn20k/tx_48_63.c
new file mode 100644
index 0000000000..5765b1fe57
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
new file mode 100644
index 0000000000..5f591eee68
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
new file mode 100644
index 0000000000..06eec15976
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
new file mode 100644
index 0000000000..220f117c47
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_48_63_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_48_63
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79.c b/drivers/net/cnxk/tx/cn20k/tx_64_79.c
new file mode 100644
index 0000000000..c05ef2a238
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
new file mode 100644
index 0000000000..79d40a09ed
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
new file mode 100644
index 0000000000..a4fac7e73e
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
new file mode 100644
index 0000000000..90d6b4f2f9
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_64_79_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_64_79
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95.c b/drivers/net/cnxk/tx/cn20k/tx_80_95.c
new file mode 100644
index 0000000000..8a09ff842b
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
new file mode 100644
index 0000000000..59f959b29f
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
new file mode 100644
index 0000000000..ca78d42344
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
new file mode 100644
index 0000000000..a3a9856783
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_80_95_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_80_95
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111.c b/drivers/net/cnxk/tx/cn20k/tx_96_111.c
new file mode 100644
index 0000000000..fab39f8fcc
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT(cn20k_nix_xmit_pkts_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
new file mode 100644
index 0000000000..11b6814223
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_MSEG(cn20k_nix_xmit_pkts_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
new file mode 100644
index 0000000000..e1e3b1bca3
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC(cn20k_nix_xmit_pkts_vec_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
new file mode 100644
index 0000000000..b6af4e34c0
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_96_111_vec_mseg.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if !defined(CNXK_DIS_TMPLT_FUNC)
+
+#define T(name, sz, flags) NIX_TX_XMIT_VEC_MSEG(cn20k_nix_xmit_pkts_vec_mseg_##name, sz, flags)
+
+NIX_TX_FASTPATH_MODES_96_111
+#undef T
+
+#endif
diff --git a/drivers/net/cnxk/tx/cn20k/tx_all_offload.c b/drivers/net/cnxk/tx/cn20k/tx_all_offload.c
new file mode 100644
index 0000000000..711825b0dc
--- /dev/null
+++ b/drivers/net/cnxk/tx/cn20k/tx_all_offload.c
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2024 Marvell.
+ */
+
+#include "cn20k_tx.h"
+
+#ifdef _ROC_API_H_
+#error "roc_api.h is included"
+#endif
+
+#if defined(CNXK_DIS_TMPLT_FUNC)
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_xmit_pkts_all_offload(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts)
+{
+	uint64_t cmd[8 + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];
+
+	return cn20k_nix_xmit_pkts_mseg(tx_queue, NULL, tx_pkts, pkts, cmd,
+				NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+				NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_MBUF_NOFF_F |
+				NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_TSTAMP_F |
+				NIX_TX_OFFLOAD_SECURITY_F | NIX_TX_MULTI_SEG_F);
+}
+
+uint16_t __rte_noinline __rte_hot
+cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts)
+{
+	uint64_t cmd[8 + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];
+
+	return cn20k_nix_xmit_pkts_vector(tx_queue, NULL, tx_pkts, pkts, cmd,
+				NIX_TX_OFFLOAD_L3_L4_CSUM_F | NIX_TX_OFFLOAD_OL3_OL4_CSUM_F |
+				NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_MBUF_NOFF_F |
+				NIX_TX_OFFLOAD_TSO_F | NIX_TX_OFFLOAD_TSTAMP_F |
+				NIX_TX_OFFLOAD_SECURITY_F | NIX_TX_MULTI_SEG_F);
+}
+
+#endif
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
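
The per-range tx_*.c files above all follow one template pattern: each
NIX_TX_FASTPATH_MODES_<lo>_<hi> table lists offload-flag combinations, and
the local T() macro instantiates one specialized transmit function per
entry. A simplified sketch of the mechanism (macro body abridged; the real
definition lives in cn20k_tx.h):

    /* Each T(name, sz, flags) entry expands to one noinline function
     * whose compile-time constant "flags" lets the compiler fully
     * specialize the common transmit body.
     */
    #define NIX_TX_XMIT(fn, sz, flags)                                     \
            uint16_t __rte_noinline __rte_hot fn(                          \
                    void *tx_queue, struct rte_mbuf **tx_pkts,             \
                    uint16_t pkts)                                         \
            {                                                              \
                    uint64_t cmd[sz];                                      \
                    return cn20k_nix_xmit_pkts(tx_queue, NULL, tx_pkts,    \
                                               pkts, cmd, flags);          \
            }

Splitting the instantiations across many small files keeps per-object
compile time bounded and lets the build run in parallel; defining
CNXK_DIS_TMPLT_FUNC skips them all in favor of the two all-offload
functions in tx_all_offload.c.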

* [PATCH v3 13/18] net/cnxk: support Rx burst scalar for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (11 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 12/18] net/cnxk: support Tx " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 14/18] net/cnxk: support Rx burst vector " Nithin Dabilpuram
                     ` (5 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Rx burst scalar version support for cn20k.
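
The burst routine refreshes its cached CQ occupancy with a single LDADDA
read of CQ_OP_STATUS, which returns the 20-bit head and tail ring indices
packed into one 64-bit word. A minimal sketch of the wrap-aware arithmetic
used by nix_rx_nb_pkts() below (names as in the patch):

    reg = roc_atomic64_add_sync(wdata, rxq->cq_status);
    tail = reg & 0xFFFFF;
    head = (reg >> 20) & 0xFFFFF;
    /* tail may have wrapped around the end of the ring */
    available = (tail < head) ? tail - head + qmask + 1 : tail - head;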

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c    | 126 +++++++++
 drivers/net/cnxk/cn20k_rx.h        | 394 ++++++++++++++++++++++++++++-
 drivers/net/cnxk/cn20k_rx_select.c |   6 +-
 drivers/net/cnxk/cn20k_rxtx.h      | 156 ++++++++++++
 4 files changed, 674 insertions(+), 8 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index 1c67de8fa9..ac2b0e1b50 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -330,6 +330,33 @@ cn20k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, uint16_t nb_
 	return 0;
 }
 
+static void
+cn20k_nix_rx_queue_meta_aura_update(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct cnxk_eth_rxq_sp *rxq_sp;
+	struct cn20k_eth_rxq *rxq;
+	struct roc_nix_rq *rq;
+	int i;
+
+	/* Update Aura handle for fastpath rx queues */
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		rq = &dev->rqs[i];
+		rxq = eth_dev->data->rx_queues[i];
+		rxq->meta_aura = rq->meta_aura_handle;
+		rxq->meta_pool = dev->nix.meta_mempool;
+		/* Assume meta packets come from the normal aura if the meta
+		 * aura is not set up.
+		 */
+		if (!rxq->meta_aura) {
+			rxq_sp = cnxk_eth_rxq_to_sp(rxq);
+			rxq->meta_aura = rxq_sp->qconf.mp->pool_id;
+			rxq->meta_pool = (uintptr_t)rxq_sp->qconf.mp;
+		}
+	}
+	/* Store mempool in lookup mem */
+	cnxk_nix_lookup_mem_metapool_set(dev);
+}
+
 static int
 cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 {
@@ -371,6 +398,74 @@ cn20k_nix_configure(struct rte_eth_dev *eth_dev)
 	return 0;
 }
 
+/* Function to enable ptp config for VFs */
+static void
+nix_ptp_enable_vf(struct rte_eth_dev *eth_dev)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+
+	if (nix_recalc_mtu(eth_dev))
+		plt_err("Failed to set MTU size for ptp");
+
+	dev->rx_offload_flags |= NIX_RX_OFFLOAD_TSTAMP_F;
+
+	/* Setting up the function pointers as per new offload flags */
+	cn20k_eth_set_rx_function(eth_dev);
+	cn20k_eth_set_tx_function(eth_dev);
+}
+
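+/* One-shot Rx burst stub: the PTP info MBOX callback installs this on a VF
+ * so that, on the next poll from the VF's own context (where a VF->PF MBOX
+ * message is legal), the PTP config is applied and the regular Rx/Tx burst
+ * functions are re-selected.
+ */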
+static uint16_t
+nix_ptp_vf_burst(void *queue, struct rte_mbuf **mbufs, uint16_t pkts)
+{
+	struct cn20k_eth_rxq *rxq = queue;
+	struct cnxk_eth_rxq_sp *rxq_sp;
+	struct rte_eth_dev *eth_dev;
+
+	RTE_SET_USED(mbufs);
+	RTE_SET_USED(pkts);
+
+	rxq_sp = cnxk_eth_rxq_to_sp(rxq);
+	eth_dev = rxq_sp->dev->eth_dev;
+	nix_ptp_enable_vf(eth_dev);
+
+	return 0;
+}
+
+static int
+cn20k_nix_ptp_info_update_cb(struct roc_nix *nix, bool ptp_en)
+{
+	struct cnxk_eth_dev *dev = (struct cnxk_eth_dev *)nix;
+	struct rte_eth_dev *eth_dev;
+	struct cn20k_eth_rxq *rxq;
+	int i;
+
+	if (!dev)
+		return -EINVAL;
+
+	eth_dev = dev->eth_dev;
+	if (!eth_dev)
+		return -EINVAL;
+
+	dev->ptp_en = ptp_en;
+
+	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
+		rxq = eth_dev->data->rx_queues[i];
+		rxq->mbuf_initializer = cnxk_nix_rxq_mbuf_setup(dev);
+	}
+
+	if (roc_nix_is_vf_or_sdp(nix) && !(roc_nix_is_sdp(nix)) && !(roc_nix_is_lbk(nix))) {
+		/* In case of VF, the MTU cannot be set directly in this
+		 * function as it runs as part of an MBOX request (PF->VF),
+		 * and setting the MTU requires another MBOX message to be
+		 * sent (VF->PF).
+		 */
+		eth_dev->rx_pkt_burst = nix_ptp_vf_burst;
+		rte_mb();
+	}
+
+	return 0;
+}
+
 static int
 cn20k_nix_timesync_enable(struct rte_eth_dev *eth_dev)
 {
@@ -451,11 +546,21 @@ cn20k_nix_dev_start(struct rte_eth_dev *eth_dev)
 	if (rc)
 		return rc;
 
+	/* Update the VF about the data offset being shifted by 8 bytes
+	 * if PTP is already enabled in the PF owning this VF.
+	 */
+	if (dev->ptp_en && (!roc_nix_is_pf(nix) && (!roc_nix_is_sdp(nix))))
+		nix_ptp_enable_vf(eth_dev);
+
 	/* Setting up the rx[tx]_offload_flags due to change
 	 * in rx[tx]_offloads.
 	 */
 	dev->rx_offload_flags |= nix_rx_offload_flags(eth_dev);
 	dev->tx_offload_flags |= nix_tx_offload_flags(eth_dev);
+
+	if (dev->rx_offload_flags & NIX_RX_OFFLOAD_SECURITY_F)
+		cn20k_nix_rx_queue_meta_aura_update(eth_dev);
+
 	/* Set flags for Rx Inject feature */
 	if (roc_idev_nix_rx_inject_get(nix->port_id))
 		dev->rx_offload_flags |= NIX_RX_SEC_REASSEMBLY_F;
@@ -621,6 +726,20 @@ nix_tm_ops_override(void)
 	if (init_once)
 		return;
 	init_once = 1;
+
+	/* Update platform specific ops */
+}
+
+static void
+npc_flow_ops_override(void)
+{
+	static int init_once;
+
+	if (init_once)
+		return;
+	init_once = 1;
+
+	/* Update platform specific ops */
 }
 
 static int
@@ -633,6 +752,7 @@ static int
 cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_eth_dev *eth_dev;
+	struct cnxk_eth_dev *dev;
 	int rc;
 
 	rc = roc_plt_init();
@@ -643,6 +763,7 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	nix_eth_dev_ops_override();
 	nix_tm_ops_override();
+	npc_flow_ops_override();
 
 	/* Common probe */
 	rc = cnxk_nix_probe(pci_drv, pci_dev);
@@ -665,6 +786,11 @@ cn20k_nix_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 		return 0;
 	}
 
+	dev = cnxk_eth_pmd_priv(eth_dev);
+
+	/* Register up msg callbacks for PTP information */
+	roc_nix_ptp_info_cb_register(&dev->nix, cn20k_nix_ptp_info_update_cb);
+
 	return 0;
 }
 
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 2cb77c0b46..22abf7bbd8 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -29,8 +29,397 @@
 #define NIX_RX_VWQE_F	   BIT(13)
 #define NIX_RX_MULTI_SEG_F BIT(14)
 
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define NIX_DESCS_PER_LOOP   4
+#define CQE_CAST(x)	     ((struct nix_cqe_hdr_s *)(x))
+#define CQE_SZ(x)	     ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
+#define CQE_PTR_OFF(b, i, o, f)                                                                    \
+	(((f) & NIX_RX_VWQE_F) ? (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) + (o)) :           \
+				 (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) + (o)))
+#define CQE_PTR_DIFF(b, i, o, f)                                                                   \
+	(((f) & NIX_RX_VWQE_F) ? (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) - (o)) :           \
+				 (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) - (o)))
+
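+/* Byte-indexed table of pre-shifted (>> 1) Rx checksum ol_flags keyed by
+ * CPT UCC completion code, for translating inline-security completion
+ * codes into mbuf checksum flags.
+ */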
+#define NIX_RX_SEC_UCC_CONST                                                                       \
+	((RTE_MBUF_F_RX_IP_CKSUM_BAD >> 1) |                                                       \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 8 |                 \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_BAD) >> 1) << 16 |                 \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 32 |                \
+	 ((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1) << 48)
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+static inline void
+nix_mbuf_validate_next(struct rte_mbuf *m)
+{
+	if (m->nb_segs == 1 && m->next) {
+		rte_panic("mbuf->next[%p] valid when mbuf->nb_segs is %d", m->next, m->nb_segs);
+	}
+}
+#else
+static inline void
+nix_mbuf_validate_next(struct rte_mbuf *m)
+{
+	RTE_SET_USED(m);
+}
+#endif
+
 #define NIX_RX_SEC_REASSEMBLY_F (NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F)
 
+static inline rte_eth_ip_reassembly_dynfield_t *
+cnxk_ip_reassembly_dynfield(struct rte_mbuf *mbuf, int ip_reassembly_dynfield_offset)
+{
+	return RTE_MBUF_DYNFIELD(mbuf, ip_reassembly_dynfield_offset,
+				 rte_eth_ip_reassembly_dynfield_t *);
+}
+
+union mbuf_initializer {
+	struct {
+		uint16_t data_off;
+		uint16_t refcnt;
+		uint16_t nb_segs;
+		uint16_t port;
+	} fields;
+	uint64_t value;
+};
+
+static __rte_always_inline uint64_t
+nix_clear_data_off(uint64_t oldval)
+{
+	union mbuf_initializer mbuf_init = {.value = oldval};
+
+	mbuf_init.fields.data_off = 0;
+	return mbuf_init.value;
+}
+
+static __rte_always_inline struct rte_mbuf *
+nix_get_mbuf_from_cqe(void *cq, const uint64_t data_off)
+{
+	rte_iova_t buff;
+
+	/* Skip CQE, NIX_RX_PARSE_S and SG HDR (9 DWORDs) and peek at the buff addr */
+	buff = *((rte_iova_t *)((uint64_t *)cq + 9));
+	return (struct rte_mbuf *)(buff - data_off);
+}
+
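+/* Resolve the mbuf packet type with two lookups into lookup_mem: one for
+ * the L2-to-tunnel layers from the mid parse bits, and one for the
+ * tunnel-to-inner-L4 layers from the top LH/LG/LF bits, offset past the
+ * non-tunnel array; the two halves are then merged.
+ */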
+static __rte_always_inline uint32_t
+nix_ptype_get(const void *const lookup_mem, const uint64_t in)
+{
+	const uint16_t *const ptype = lookup_mem;
+	const uint16_t lh_lg_lf = (in & 0xFFF0000000000000) >> 52;
+	const uint16_t tu_l2 = ptype[(in & 0x000FFFF000000000) >> 36];
+	const uint16_t il4_tu = ptype[PTYPE_NON_TUNNEL_ARRAY_SZ + lh_lg_lf];
+
+	return (il4_tu << PTYPE_NON_TUNNEL_WIDTH) | tu_l2;
+}
+
+static __rte_always_inline uint32_t
+nix_rx_olflags_get(const void *const lookup_mem, const uint64_t in)
+{
+	const uint32_t *const ol_flags =
+		(const uint32_t *)((const uint8_t *)lookup_mem + PTYPE_ARRAY_SZ);
+
+	return ol_flags[(in & 0xfff00000) >> 20];
+}
+
+static inline uint64_t
+nix_update_match_id(const uint16_t match_id, uint64_t ol_flags, struct rte_mbuf *mbuf)
+{
+	/* There is no separate bit to indicate whether match_id is valid,
+	 * nor a flag to distinguish an RTE_FLOW_ACTION_TYPE_FLAG action
+	 * from an RTE_FLOW_ACTION_TYPE_MARK action. The former is handled
+	 * by treating 0 as an invalid value and incrementing/decrementing
+	 * match_id as a pair when MARK is activated. The latter is handled
+	 * by reserving CNXK_FLOW_MARK_DEFAULT as the value for
+	 * RTE_FLOW_ACTION_TYPE_MARK. Consequently,
+	 * CNXK_FLOW_ACTION_FLAG_DEFAULT - 1 and CNXK_FLOW_ACTION_FLAG_DEFAULT
+	 * are not usable as match_id, i.e. valid mark IDs range from
+	 * 0 to CNXK_FLOW_ACTION_FLAG_DEFAULT - 2.
+	 */
+	if (likely(match_id)) {
+		ol_flags |= RTE_MBUF_F_RX_FDIR;
+		if (match_id != CNXK_FLOW_ACTION_FLAG_DEFAULT) {
+			ol_flags |= RTE_MBUF_F_RX_FDIR_ID;
+			mbuf->hash.fdir.hi = match_id - 1;
+		}
+	}
+
+	return ol_flags;
+}
+
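+/* Walk the NIX_RX_SG_S sub-descriptors following NIX_RX_PARSE_S and chain
+ * the remaining segment buffers onto the head mbuf, setting each segment's
+ * data_len from the 16-bit size fields and refreshing its rearm data.
+ */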
+static __rte_always_inline void
+nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, uint64_t rearm,
+		    uintptr_t cpth, uintptr_t sa_base, const uint16_t flags)
+{
+	const rte_iova_t *iova_list;
+	uint16_t later_skip = 0;
+	struct rte_mbuf *head;
+	const rte_iova_t *eol;
+	uint8_t nb_segs;
+	uint16_t sg_len;
+	int64_t len;
+	uint64_t sg;
+	uintptr_t p;
+
+	(void)cpth;
+	(void)sa_base;
+
+	sg = *(const uint64_t *)(rx + 1);
+	nb_segs = (sg >> 48) & 0x3;
+
+	if (nb_segs == 1)
+		return;
+
+	len = rx->pkt_lenm1 + 1;
+
+	mbuf->pkt_len = len - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? CNXK_NIX_TIMESYNC_RX_OFFSET : 0);
+	mbuf->nb_segs = nb_segs;
+	head = mbuf;
+	mbuf->data_len =
+		(sg & 0xFFFF) - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? CNXK_NIX_TIMESYNC_RX_OFFSET : 0);
+	eol = ((const rte_iova_t *)(rx + 1) + ((rx->desc_sizem1 + 1) << 1));
+
+	len -= mbuf->data_len;
+	sg = sg >> 16;
+	/* Skip SG_S and first IOVA*/
+	iova_list = ((const rte_iova_t *)(rx + 1)) + 2;
+	nb_segs--;
+
+	later_skip = (uintptr_t)mbuf->buf_addr - (uintptr_t)mbuf;
+
+	while (nb_segs) {
+		mbuf->next = (struct rte_mbuf *)(*iova_list - later_skip);
+		mbuf = mbuf->next;
+
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		sg_len = sg & 0xFFFF;
+
+		mbuf->data_len = sg_len;
+		sg = sg >> 16;
+		p = (uintptr_t)&mbuf->rearm_data;
+		*(uint64_t *)p = rearm & ~0xFFFF;
+		nb_segs--;
+		iova_list++;
+
+		if (!nb_segs && (iova_list + 1 < eol)) {
+			sg = *(const uint64_t *)(iova_list);
+			nb_segs = (sg >> 48) & 0x3;
+			head->nb_segs += nb_segs;
+			iova_list = (const rte_iova_t *)(iova_list + 1);
+		}
+	}
+}
+
+static __rte_always_inline void
+cn20k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag, struct rte_mbuf *mbuf,
+		      const void *lookup_mem, const uint64_t val, const uintptr_t cpth,
+		      const uintptr_t sa_base, const uint16_t flag)
+{
+	const union nix_rx_parse_u *rx = (const union nix_rx_parse_u *)((const uint64_t *)cq + 1);
+	const uint64_t w1 = *(const uint64_t *)rx;
+	uint16_t len = rx->pkt_lenm1 + 1;
+	uint64_t ol_flags = 0;
+	uintptr_t p;
+
+	if (flag & NIX_RX_OFFLOAD_PTYPE_F)
+		mbuf->packet_type = nix_ptype_get(lookup_mem, w1);
+	else
+		mbuf->packet_type = 0;
+
+	if (flag & NIX_RX_OFFLOAD_RSS_F) {
+		mbuf->hash.rss = tag;
+		ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+	}
+
+	/* Skip rx ol flags extraction for Security packets */
+	ol_flags |= (uint64_t)nix_rx_olflags_get(lookup_mem, w1);
+
+	if (flag & NIX_RX_OFFLOAD_VLAN_STRIP_F) {
+		if (rx->vtag0_gone) {
+			ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
+			mbuf->vlan_tci = rx->vtag0_tci;
+		}
+		if (rx->vtag1_gone) {
+			ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
+			mbuf->vlan_tci_outer = rx->vtag1_tci;
+		}
+	}
+
+	if (flag & NIX_RX_OFFLOAD_MARK_UPDATE_F)
+		ol_flags = nix_update_match_id(rx->match_id, ol_flags, mbuf);
+
+	mbuf->ol_flags = ol_flags;
+	mbuf->pkt_len = len;
+	mbuf->data_len = len;
+	p = (uintptr_t)&mbuf->rearm_data;
+	*(uint64_t *)p = val;
+
+	if (flag & NIX_RX_MULTI_SEG_F)
+		/*
+		 * For multi-segment packets, mbuf length correction according
+		 * to the Rx timestamp length is handled later during
+		 * timestamp data processing.
+		 * Hence, the timestamp flag argument is not required.
+		 */
+		nix_cqe_xtract_mseg(rx, mbuf, val, cpth, sa_base, flag & ~NIX_RX_OFFLOAD_TSTAMP_F);
+}
+
+static inline uint16_t
+nix_rx_nb_pkts(struct cn20k_eth_rxq *rxq, const uint64_t wdata, const uint16_t pkts,
+	       const uint32_t qmask)
+{
+	uint32_t available = rxq->available;
+
+	/* Update the available count if cached value is not enough */
+	if (unlikely(available < pkts)) {
+		uint64_t reg, head, tail;
+
+		/* Use LDADDA version to avoid reorder */
+		reg = roc_atomic64_add_sync(wdata, rxq->cq_status);
+		/* CQ_OP_STATUS operation error */
+		if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+			return 0;
+
+		tail = reg & 0xFFFFF;
+		head = (reg >> 20) & 0xFFFFF;
+		if (tail < head)
+			available = tail - head + qmask + 1;
+		else
+			available = tail - head;
+
+		rxq->available = available;
+	}
+
+	return RTE_MIN(pkts, available);
+}
+
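+/* The HW Rx timestamp occupies the first 8 bytes of packet data, holding
+ * seconds in the upper and nanoseconds in the lower 32 bits; trim it from
+ * pkt_len/data_len and store the nanosecond-converted value in the
+ * timestamp dynfield.
+ */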
+static __rte_always_inline void
+cn20k_nix_mbuf_to_tstamp(struct rte_mbuf *mbuf, struct cnxk_timesync_info *tstamp,
+			 const uint8_t ts_enable, uint64_t *tstamp_ptr)
+{
+	if (ts_enable) {
+		mbuf->pkt_len -= CNXK_NIX_TIMESYNC_RX_OFFSET;
+		mbuf->data_len -= CNXK_NIX_TIMESYNC_RX_OFFSET;
+
+		/* Read the Rx timestamp inserted by CGX at the start of
+		 * the packet data.
+		 */
+		*tstamp_ptr = ((*tstamp_ptr >> 32) * NSEC_PER_SEC) + (*tstamp_ptr & 0xFFFFFFFFUL);
+		*cnxk_nix_timestamp_dynfield(mbuf, tstamp) = rte_be_to_cpu_64(*tstamp_ptr);
+		/* RTE_MBUF_F_RX_IEEE1588_TMST flag needs to be set only in case
+		 * PTP packets are received.
+		 */
+		if (mbuf->packet_type == RTE_PTYPE_L2_ETHER_TIMESYNC) {
+			tstamp->rx_tstamp = *cnxk_nix_timestamp_dynfield(mbuf, tstamp);
+			tstamp->rx_ready = 1;
+			mbuf->ol_flags |= RTE_MBUF_F_RX_IEEE1588_PTP | RTE_MBUF_F_RX_IEEE1588_TMST |
+					  tstamp->rx_tstamp_dynflag;
+		}
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts, const uint16_t flags)
+{
+	struct cn20k_eth_rxq *rxq = rx_queue;
+	const uint64_t mbuf_init = rxq->mbuf_initializer;
+	const void *lookup_mem = rxq->lookup_mem;
+	const uint64_t data_off = rxq->data_off;
+	const uintptr_t desc = rxq->desc;
+	const uint64_t wdata = rxq->wdata;
+	const uint32_t qmask = rxq->qmask;
+	uint16_t packets = 0, nb_pkts;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	struct rte_mbuf *mbuf;
+	uint64_t sa_base = 0;
+	uintptr_t cpth = 0;
+
+	nb_pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+
+	while (packets < nb_pkts) {
+		/* Prefetch N desc ahead */
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+
+		mbuf = nix_get_mbuf_from_cqe(cq, data_off);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		cn20k_nix_cqe_to_mbuf(cq, cq->tag, mbuf, lookup_mem, mbuf_init, cpth, sa_base,
+				      flags);
+		cn20k_nix_mbuf_to_tstamp(mbuf, rxq->tstamp, (flags & NIX_RX_OFFLOAD_TSTAMP_F),
+					 (uint64_t *)((uint8_t *)mbuf + data_off));
+		rx_pkts[packets++] = mbuf;
+		roc_prefetch_store_keep(mbuf);
+		head++;
+		head &= qmask;
+	}
+
+	rxq->head = head;
+	rxq->available -= nb_pkts;
+
+	/* Free all the CQs that we've processed */
+	plt_write64((wdata | nb_pkts), rxq->cq_door);
+
+	return nb_pkts;
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_flush_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts,
+			  const uint16_t flags)
+{
+	struct cn20k_eth_rxq *rxq = rx_queue;
+	const uint64_t mbuf_init = rxq->mbuf_initializer;
+	const void *lookup_mem = rxq->lookup_mem;
+	const uint64_t data_off = rxq->data_off;
+	const uint64_t wdata = rxq->wdata;
+	const uint32_t qmask = rxq->qmask;
+	const uintptr_t desc = rxq->desc;
+	uint16_t packets = 0, nb_pkts;
+	uint16_t lmt_id __rte_unused;
+	uint32_t head = rxq->head;
+	struct nix_cqe_hdr_s *cq;
+	struct rte_mbuf *mbuf;
+	uint64_t sa_base = 0;
+	uintptr_t cpth = 0;
+
+	nb_pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+
+	while (packets < nb_pkts) {
+		/* Prefetch N desc ahead */
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+
+		mbuf = nix_get_mbuf_from_cqe(cq, data_off);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1);
+
+		cn20k_nix_cqe_to_mbuf(cq, cq->tag, mbuf, lookup_mem, mbuf_init, cpth, sa_base,
+				      flags);
+		cn20k_nix_mbuf_to_tstamp(mbuf, rxq->tstamp, (flags & NIX_RX_OFFLOAD_TSTAMP_F),
+					 (uint64_t *)((uint8_t *)mbuf + data_off));
+		rx_pkts[packets++] = mbuf;
+		roc_prefetch_store_keep(mbuf);
+		head++;
+		head &= qmask;
+	}
+
+	rxq->head = head;
+	rxq->available -= nb_pkts;
+
+	/* Free all the CQs that we've processed */
+	plt_write64((wdata | nb_pkts), rxq->cq_door);
+
+	return nb_pkts;
+}
+
 #define RSS_F	  NIX_RX_OFFLOAD_RSS_F
 #define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
 #define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
@@ -220,10 +609,7 @@ NIX_RX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(rx_queue);                                                            \
-		RTE_SET_USED(rx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		return cn20k_nix_recv_pkts(rx_queue, rx_pkts, pkts, (flags));                      \
 	}
 
 #define NIX_RX_RECV_MSEG(fn, flags) NIX_RX_RECV(fn, flags | NIX_RX_MULTI_SEG_F)
diff --git a/drivers/net/cnxk/cn20k_rx_select.c b/drivers/net/cnxk/cn20k_rx_select.c
index 82e06a62ef..25c79434cd 100644
--- a/drivers/net/cnxk/cn20k_rx_select.c
+++ b/drivers/net/cnxk/cn20k_rx_select.c
@@ -22,10 +22,8 @@ pick_rx_func(struct rte_eth_dev *eth_dev, const eth_rx_burst_t rx_burst[NIX_RX_O
 static uint16_t __rte_noinline __rte_hot __rte_unused
 cn20k_nix_flush_rx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts)
 {
-	RTE_SET_USED(rx_queue);
-	RTE_SET_USED(rx_pkts);
-	RTE_SET_USED(pkts);
-	return 0;
+	const uint16_t flags = NIX_RX_MULTI_SEG_F | NIX_RX_REAS_F | NIX_RX_OFFLOAD_SECURITY_F;
+	return cn20k_nix_flush_recv_pkts(rx_queue, rx_pkts, pkts, flags);
 }
 
 #if defined(RTE_ARCH_ARM64)
diff --git a/drivers/net/cnxk/cn20k_rxtx.h b/drivers/net/cnxk/cn20k_rxtx.h
index 5cc445d4b1..03eaf34d64 100644
--- a/drivers/net/cnxk/cn20k_rxtx.h
+++ b/drivers/net/cnxk/cn20k_rxtx.h
@@ -83,7 +83,163 @@ struct cn20k_eth_rxq {
 	struct cnxk_timesync_info *tstamp;
 } __plt_cache_aligned;
 
+/* Private data in sw rsvd area of struct roc_ot_ipsec_inb_sa */
+struct cn20k_inb_priv_data {
+	void *userdata;
+	int reass_dynfield_off;
+	int reass_dynflag_bit;
+	struct cnxk_eth_sec_sess *eth_sec;
+};
+
+struct cn20k_sec_sess_priv {
+	union {
+		struct {
+			uint32_t sa_idx;
+			uint8_t inb_sa : 1;
+			uint8_t outer_ip_ver : 1;
+			uint8_t mode : 1;
+			uint8_t roundup_byte : 5;
+			uint8_t roundup_len;
+			uint16_t partial_len : 10;
+			uint16_t chksum : 2;
+			uint16_t dec_ttl : 1;
+			uint16_t nixtx_off : 1;
+			uint16_t rsvd : 2;
+		};
+
+		uint64_t u64;
+	};
+} __rte_packed;
+
 #define LMT_OFF(lmt_addr, lmt_num, offset)                                                         \
 	(void *)((uintptr_t)(lmt_addr) + ((uint64_t)(lmt_num) << ROC_LMT_LINE_SIZE_LOG2) + (offset))
 
+static inline uint16_t
+nix_tx_compl_nb_pkts(struct cn20k_eth_txq *txq, const uint64_t wdata, const uint32_t qmask)
+{
+	uint16_t available = txq->tx_compl.available;
+
+	/* Refresh the available count if the cached value is exhausted */
+	if (!unlikely(available)) {
+		uint64_t reg, head, tail;
+
+		/* Use LDADDA version to avoid reorder */
+		reg = roc_atomic64_add_sync(wdata, txq->tx_compl.cq_status);
+		/* CQ_OP_STATUS operation error */
+		if (reg & BIT_ULL(NIX_CQ_OP_STAT_OP_ERR) || reg & BIT_ULL(NIX_CQ_OP_STAT_CQ_ERR))
+			return 0;
+
+		tail = reg & 0xFFFFF;
+		head = (reg >> 20) & 0xFFFFF;
+		if (tail < head)
+			available = tail - head + qmask + 1;
+		else
+			available = tail - head;
+
+		txq->tx_compl.available = available;
+	}
+	return available;
+}
+
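+/* Reap Tx completion CQEs: free the mbuf chain recorded per SQE id,
+ * advance the completion head and ring the completion CQ doorbell.
+ */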
+static inline void
+handle_tx_completion_pkts(struct cn20k_eth_txq *txq, uint8_t mt_safe)
+{
+#define CNXK_NIX_CQ_ENTRY_SZ 128
+#define CQE_SZ(x)	     ((x) * CNXK_NIX_CQ_ENTRY_SZ)
+
+	uint16_t tx_pkts = 0, nb_pkts;
+	const uintptr_t desc = txq->tx_compl.desc_base;
+	const uint64_t wdata = txq->tx_compl.wdata;
+	const uint32_t qmask = txq->tx_compl.qmask;
+	uint32_t head = txq->tx_compl.head;
+	struct nix_cqe_hdr_s *tx_compl_cq;
+	struct nix_send_comp_s *tx_compl_s0;
+	struct rte_mbuf *m_next, *m;
+
+	if (mt_safe)
+		rte_spinlock_lock(&txq->tx_compl.ext_buf_lock);
+
+	nb_pkts = nix_tx_compl_nb_pkts(txq, wdata, qmask);
+	while (tx_pkts < nb_pkts) {
+		rte_prefetch_non_temporal((void *)(desc + (CQE_SZ((head + 2) & qmask))));
+		tx_compl_cq = (struct nix_cqe_hdr_s *)(desc + CQE_SZ(head));
+		tx_compl_s0 = (struct nix_send_comp_s *)((uint64_t *)tx_compl_cq + 1);
+		m = txq->tx_compl.ptr[tx_compl_s0->sqe_id];
+		while (m->next != NULL) {
+			m_next = m->next;
+			rte_pktmbuf_free_seg(m);
+			m = m_next;
+		}
+		rte_pktmbuf_free_seg(m);
+		txq->tx_compl.ptr[tx_compl_s0->sqe_id] = NULL;
+
+		head++;
+		head &= qmask;
+		tx_pkts++;
+	}
+	txq->tx_compl.head = head;
+	txq->tx_compl.available -= nb_pkts;
+
+	plt_write64((wdata | nb_pkts), txq->tx_compl.cq_door);
+
+	if (mt_safe)
+		rte_spinlock_unlock(&txq->tx_compl.ext_buf_lock);
+}
+
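+/* Build the STEOR data word for CPT submissions: sixteen 3-bit fields
+ * starting at bit 16, one per LMT line, each holding (DWs - 1) for a line
+ * carrying two CPT instructions.
+ */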
+static __rte_always_inline uint64_t
+cn20k_cpt_tx_steor_data(void)
+{
+	/* We have two CPT instructions per LMT line */
+	const uint64_t dw_m1 = ROC_CN10K_TWO_CPT_INST_DW_M1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1 << 16;
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
+static __rte_always_inline void
+cn20k_nix_sec_steorl(uintptr_t io_addr, uint32_t lmt_id, uint8_t lnum, uint8_t loff, uint8_t shft)
+{
+	uint64_t data;
+	uintptr_t pa;
+
+	/* Check if there is any CPT instruction to submit */
+	if (!lnum && !loff)
+		return;
+
+	data = cn20k_cpt_tx_steor_data();
+	/* Update lmtline use for partial end line */
+	if (loff) {
+		data &= ~(0x7ULL << shft);
+		/* Update it to half full i.e 64B */
+		data |= (0x3UL << shft);
+	}
+
+	pa = io_addr | ((data >> 16) & 0x7) << 4;
+	data &= ~(0x7ULL << 16);
+	/* Update lines - 1 that contain valid data */
+	data |= ((uint64_t)(lnum + loff - 1)) << 12;
+	data |= (uint64_t)lmt_id;
+
+	/* STEOR */
+	roc_lmt_submit_steorl(data, pa);
+}
+
 #endif /* __CN20K_RXTX_H__ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 14/18] net/cnxk: support Rx burst vector for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (12 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 13/18] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 15/18] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
                     ` (4 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Rx burst vector support for cn20k.
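
The vector path processes NIX_DESCS_PER_LOOP (4) CQEs per iteration using
NEON. A short sketch of how two of the four mbuf pointers are recovered
from the NIX_RX_SG_S words (register names as in the patch):

    /* word 8 of each CQE holds the segment size and buffer address */
    cq0_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 0, 64, flags));
    cq1_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 1, 64, flags));
    /* gather the two buffer addresses, then derive each
     * struct rte_mbuf * by subtracting the fixed data offset
     */
    mbuf01 = vzip2q_u64(cq0_w8, cq1_w8);
    mbuf01 = vqsubq_u64(mbuf01, data_off);

Lengths, RSS hash and packet type are then shuffled into
rx_descriptor_fields1 per lane, and any remainder of fewer than four
packets falls back to the scalar cn20k_nix_recv_pkts().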

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_rx.h | 463 +++++++++++++++++++++++++++++++++++-
 1 file changed, 459 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index 22abf7bbd8..d1bf0c615e 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -420,6 +420,463 @@ cn20k_nix_flush_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pk
 	return nb_pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
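+/* Set the VLAN-stripped ol_flags and patch the inner TCI into lane 5 of
+ * the rx_descriptor_fields1 vector when vtag0 was stripped by HW.
+ */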
+static __rte_always_inline uint64_t
+nix_vlan_update(const uint64_t w2, uint64_t ol_flags, uint8x16_t *f)
+{
+	if (w2 & BIT_ULL(21) /* vtag0_gone */) {
+		ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
+		*f = vsetq_lane_u16((uint16_t)(w2 >> 32), *f, 5);
+	}
+
+	return ol_flags;
+}
+
+static __rte_always_inline uint64_t
+nix_qinq_update(const uint64_t w2, uint64_t ol_flags, struct rte_mbuf *mbuf)
+{
+	if (w2 & BIT_ULL(23) /* vtag1_gone */) {
+		ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
+		mbuf->vlan_tci_outer = (uint16_t)(w2 >> 48);
+	}
+
+	return ol_flags;
+}
+
+#define NIX_PUSH_META_TO_FREE(_mbuf, _laddr, _loff_p)                                              \
+	do {                                                                                       \
+		*(uint64_t *)((_laddr) + (*(_loff_p) << 3)) = (uint64_t)_mbuf;                     \
+		*(_loff_p) = *(_loff_p) + 1;                                                       \
+		/* Mark meta mbuf as put */                                                        \
+		RTE_MEMPOOL_CHECK_COOKIES(_mbuf->pool, (void **)&_mbuf, 1, 0);                     \
+	} while (0)
+
+static __rte_always_inline uint16_t
+cn20k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, const uint16_t flags,
+			   void *lookup_mem, struct cnxk_timesync_info *tstamp, uintptr_t lmt_base,
+			   uint64_t meta_aura)
+{
+	struct cn20k_eth_rxq *rxq = args;
+	const uint64_t mbuf_initializer =
+		(flags & NIX_RX_VWQE_F) ? *(uint64_t *)args : rxq->mbuf_initializer;
+	const uint64x2_t data_off = flags & NIX_RX_VWQE_F ? vdupq_n_u64(RTE_PKTMBUF_HEADROOM) :
+							    vdupq_n_u64(rxq->data_off);
+	const uint32_t qmask = flags & NIX_RX_VWQE_F ? 0 : rxq->qmask;
+	const uint64_t wdata = flags & NIX_RX_VWQE_F ? 0 : rxq->wdata;
+	const uintptr_t desc = flags & NIX_RX_VWQE_F ? 0 : rxq->desc;
+	uint64x2_t cq0_w8, cq1_w8, cq2_w8, cq3_w8, mbuf01, mbuf23;
+	uintptr_t cpth0 = 0, cpth1 = 0, cpth2 = 0, cpth3 = 0;
+	uint64_t ol_flags0, ol_flags1, ol_flags2, ol_flags3;
+	uint64x2_t rearm0 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm1 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm2 = vdupq_n_u64(mbuf_initializer);
+	uint64x2_t rearm3 = vdupq_n_u64(mbuf_initializer);
+	struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3;
+	uint8x16_t f0, f1, f2, f3;
+	uintptr_t sa_base = 0;
+	uint16_t packets = 0;
+	uint16_t pkts_left;
+	uint32_t head;
+	uintptr_t cq0;
+
+	(void)lmt_base;
+	(void)meta_aura;
+
+	if (!(flags & NIX_RX_VWQE_F)) {
+		lookup_mem = rxq->lookup_mem;
+		head = rxq->head;
+
+		pkts = nix_rx_nb_pkts(rxq, wdata, pkts, qmask);
+		pkts_left = pkts & (NIX_DESCS_PER_LOOP - 1);
+		/* Packets have to be floor-aligned to NIX_DESCS_PER_LOOP */
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+		if (flags & NIX_RX_OFFLOAD_TSTAMP_F)
+			tstamp = rxq->tstamp;
+
+		cq0 = desc + CQE_SZ(head);
+		rte_prefetch0(CQE_PTR_OFF(cq0, 0, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 1, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 2, 64, flags));
+		rte_prefetch0(CQE_PTR_OFF(cq0, 3, 64, flags));
+	} else {
+		RTE_SET_USED(head);
+	}
+
+	while (packets < pkts) {
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Exit loop if head is about to wrap and become
+			 * unaligned.
+			 */
+			if (((head + NIX_DESCS_PER_LOOP - 1) & qmask) < NIX_DESCS_PER_LOOP) {
+				pkts_left += (pkts - packets);
+				break;
+			}
+
+			cq0 = desc + CQE_SZ(head);
+		} else {
+			cq0 = (uintptr_t)&mbufs[packets];
+		}
+
+		if (flags & NIX_RX_VWQE_F) {
+			if (pkts - packets > 4) {
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 4, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 5, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 6, 0, flags));
+				rte_prefetch_non_temporal(CQE_PTR_OFF(cq0, 7, 0, flags));
+
+				if (likely(pkts - packets > 8)) {
+					rte_prefetch1(CQE_PTR_OFF(cq0, 8, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 9, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 10, 0, flags));
+					rte_prefetch1(CQE_PTR_OFF(cq0, 11, 0, flags));
+					if (pkts - packets > 12) {
+						rte_prefetch1(CQE_PTR_OFF(cq0, 12, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 13, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 14, 0, flags));
+						rte_prefetch1(CQE_PTR_OFF(cq0, 15, 0, flags));
+					}
+				}
+
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 4, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 5, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 6, RTE_PKTMBUF_HEADROOM, flags));
+				rte_prefetch0(CQE_PTR_DIFF(cq0, 7, RTE_PKTMBUF_HEADROOM, flags));
+
+				if (likely(pkts - packets > 8)) {
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 8, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 9, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 10, RTE_PKTMBUF_HEADROOM, flags));
+					rte_prefetch0(
+						CQE_PTR_DIFF(cq0, 11, RTE_PKTMBUF_HEADROOM, flags));
+				}
+			}
+		} else {
+			if (pkts - packets > 8) {
+				if (flags) {
+					rte_prefetch0(CQE_PTR_OFF(cq0, 8, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 9, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 10, 0, flags));
+					rte_prefetch0(CQE_PTR_OFF(cq0, 11, 0, flags));
+				}
+				rte_prefetch0(CQE_PTR_OFF(cq0, 8, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 9, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 10, 64, flags));
+				rte_prefetch0(CQE_PTR_OFF(cq0, 11, 64, flags));
+			}
+		}
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Get NIX_RX_SG_S for size and buffer pointer */
+			cq0_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 0, 64, flags));
+			cq1_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 1, 64, flags));
+			cq2_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 2, 64, flags));
+			cq3_w8 = vld1q_u64(CQE_PTR_OFF(cq0, 3, 64, flags));
+
+			/* Extract mbuf from NIX_RX_SG_S */
+			mbuf01 = vzip2q_u64(cq0_w8, cq1_w8);
+			mbuf23 = vzip2q_u64(cq2_w8, cq3_w8);
+			mbuf01 = vqsubq_u64(mbuf01, data_off);
+			mbuf23 = vqsubq_u64(mbuf23, data_off);
+		} else {
+			mbuf01 = vsubq_u64(vld1q_u64((uint64_t *)cq0),
+					   vdupq_n_u64(sizeof(struct rte_mbuf)));
+			mbuf23 = vsubq_u64(vld1q_u64((uint64_t *)(cq0 + 16)),
+					   vdupq_n_u64(sizeof(struct rte_mbuf)));
+		}
+
+		/* Move mbufs to scalar registers for future use */
+		mbuf0 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 0);
+		mbuf1 = (struct rte_mbuf *)vgetq_lane_u64(mbuf01, 1);
+		mbuf2 = (struct rte_mbuf *)vgetq_lane_u64(mbuf23, 0);
+		mbuf3 = (struct rte_mbuf *)vgetq_lane_u64(mbuf23, 1);
+
+		/* Mark mempool obj as "get" as it is alloc'ed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf0->pool, (void **)&mbuf0, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf1->pool, (void **)&mbuf1, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf2->pool, (void **)&mbuf2, 1, 1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf3->pool, (void **)&mbuf3, 1, 1);
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Mask to get packet len from NIX_RX_SG_S */
+			const uint8x16_t shuf_msk = {
+				0xFF, 0xFF, /* pkt_type set as unknown */
+				0xFF, 0xFF, /* pkt_type set as unknown */
+				0,    1,    /* octet 1~0, low 16 bits pkt_len */
+				0xFF, 0xFF, /* skip high 16 bits of pkt_len, zero out */
+				0,    1,    /* octet 1~0, 16 bits data_len */
+				0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
+
+			/* Form the rx_descriptor_fields1 with pkt_len and data_len */
+			f0 = vqtbl1q_u8(cq0_w8, shuf_msk);
+			f1 = vqtbl1q_u8(cq1_w8, shuf_msk);
+			f2 = vqtbl1q_u8(cq2_w8, shuf_msk);
+			f3 = vqtbl1q_u8(cq3_w8, shuf_msk);
+		}
+
+		/* Load CQE word0 and word 1 */
+		const uint64_t cq0_w0 = *CQE_PTR_OFF(cq0, 0, 0, flags);
+		const uint64_t cq0_w1 = *CQE_PTR_OFF(cq0, 0, 8, flags);
+		const uint64_t cq0_w2 = *CQE_PTR_OFF(cq0, 0, 16, flags);
+		const uint64_t cq1_w0 = *CQE_PTR_OFF(cq0, 1, 0, flags);
+		const uint64_t cq1_w1 = *CQE_PTR_OFF(cq0, 1, 8, flags);
+		const uint64_t cq1_w2 = *CQE_PTR_OFF(cq0, 1, 16, flags);
+		const uint64_t cq2_w0 = *CQE_PTR_OFF(cq0, 2, 0, flags);
+		const uint64_t cq2_w1 = *CQE_PTR_OFF(cq0, 2, 8, flags);
+		const uint64_t cq2_w2 = *CQE_PTR_OFF(cq0, 2, 16, flags);
+		const uint64_t cq3_w0 = *CQE_PTR_OFF(cq0, 3, 0, flags);
+		const uint64_t cq3_w1 = *CQE_PTR_OFF(cq0, 3, 8, flags);
+		const uint64_t cq3_w2 = *CQE_PTR_OFF(cq0, 3, 16, flags);
+
+		if (flags & NIX_RX_VWQE_F) {
+			uint16_t psize0, psize1, psize2, psize3;
+
+			psize0 = (cq0_w2 & 0xFFFF) + 1;
+			psize1 = (cq1_w2 & 0xFFFF) + 1;
+			psize2 = (cq2_w2 & 0xFFFF) + 1;
+			psize3 = (cq3_w2 & 0xFFFF) + 1;
+
+			f0 = vdupq_n_u64(0);
+			f1 = vdupq_n_u64(0);
+			f2 = vdupq_n_u64(0);
+			f3 = vdupq_n_u64(0);
+
+			f0 = vsetq_lane_u16(psize0, f0, 2);
+			f0 = vsetq_lane_u16(psize0, f0, 4);
+
+			f1 = vsetq_lane_u16(psize1, f1, 2);
+			f1 = vsetq_lane_u16(psize1, f1, 4);
+
+			f2 = vsetq_lane_u16(psize2, f2, 2);
+			f2 = vsetq_lane_u16(psize2, f2, 4);
+
+			f3 = vsetq_lane_u16(psize3, f3, 2);
+			f3 = vsetq_lane_u16(psize3, f3, 4);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_RSS_F) {
+			/* Fill rss in the rx_descriptor_fields1 */
+			f0 = vsetq_lane_u32(cq0_w0, f0, 3);
+			f1 = vsetq_lane_u32(cq1_w0, f1, 3);
+			f2 = vsetq_lane_u32(cq2_w0, f2, 3);
+			f3 = vsetq_lane_u32(cq3_w0, f3, 3);
+			ol_flags0 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags1 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags2 = RTE_MBUF_F_RX_RSS_HASH;
+			ol_flags3 = RTE_MBUF_F_RX_RSS_HASH;
+		} else {
+			ol_flags0 = 0;
+			ol_flags1 = 0;
+			ol_flags2 = 0;
+			ol_flags3 = 0;
+		}
+
+		if (flags & NIX_RX_OFFLOAD_PTYPE_F) {
+			/* Fill packet_type in the rx_descriptor_fields1 */
+			f0 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq0_w1), f0, 0);
+			f1 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq1_w1), f1, 0);
+			f2 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq2_w1), f2, 0);
+			f3 = vsetq_lane_u32(nix_ptype_get(lookup_mem, cq3_w1), f3, 0);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_CHECKSUM_F) {
+			ol_flags0 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq0_w1);
+			ol_flags1 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq1_w1);
+			ol_flags2 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq2_w1);
+			ol_flags3 |= (uint64_t)nix_rx_olflags_get(lookup_mem, cq3_w1);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_VLAN_STRIP_F) {
+			ol_flags0 = nix_vlan_update(cq0_w2, ol_flags0, &f0);
+			ol_flags1 = nix_vlan_update(cq1_w2, ol_flags1, &f1);
+			ol_flags2 = nix_vlan_update(cq2_w2, ol_flags2, &f2);
+			ol_flags3 = nix_vlan_update(cq3_w2, ol_flags3, &f3);
+
+			ol_flags0 = nix_qinq_update(cq0_w2, ol_flags0, mbuf0);
+			ol_flags1 = nix_qinq_update(cq1_w2, ol_flags1, mbuf1);
+			ol_flags2 = nix_qinq_update(cq2_w2, ol_flags2, mbuf2);
+			ol_flags3 = nix_qinq_update(cq3_w2, ol_flags3, mbuf3);
+		}
+
+		if (flags & NIX_RX_OFFLOAD_MARK_UPDATE_F) {
+			ol_flags0 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 0, 38, flags),
+							ol_flags0, mbuf0);
+			ol_flags1 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 1, 38, flags),
+							ol_flags1, mbuf1);
+			ol_flags2 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 2, 38, flags),
+							ol_flags2, mbuf2);
+			ol_flags3 = nix_update_match_id(*(uint16_t *)CQE_PTR_OFF(cq0, 3, 38, flags),
+							ol_flags3, mbuf3);
+		}
+
+		if ((flags & NIX_RX_OFFLOAD_TSTAMP_F) && ((flags & NIX_RX_VWQE_F) && tstamp)) {
+			const uint16x8_t len_off = {0,				 /* ptype   0:15 */
+						    0,				 /* ptype  16:32 */
+						    CNXK_NIX_TIMESYNC_RX_OFFSET, /* pktlen  0:15*/
+						    0,				 /* pktlen 16:32 */
+						    CNXK_NIX_TIMESYNC_RX_OFFSET, /* datalen 0:15 */
+						    0,
+						    0,
+						    0};
+			const uint32x4_t ptype = {
+				RTE_PTYPE_L2_ETHER_TIMESYNC, RTE_PTYPE_L2_ETHER_TIMESYNC,
+				RTE_PTYPE_L2_ETHER_TIMESYNC, RTE_PTYPE_L2_ETHER_TIMESYNC};
+			const uint64_t ts_olf = RTE_MBUF_F_RX_IEEE1588_PTP |
+						RTE_MBUF_F_RX_IEEE1588_TMST |
+						tstamp->rx_tstamp_dynflag;
+			const uint32x4_t and_mask = {0x1, 0x2, 0x4, 0x8};
+			uint64x2_t ts01, ts23, mask;
+			uint64_t ts[4];
+			uint8_t res;
+
+			/* Subtract timesync length from total pkt length. */
+			f0 = vsubq_u16(f0, len_off);
+			f1 = vsubq_u16(f1, len_off);
+			f2 = vsubq_u16(f2, len_off);
+			f3 = vsubq_u16(f3, len_off);
+
+			/* Get the address of actual timestamp. */
+			ts01 = vaddq_u64(mbuf01, data_off);
+			ts23 = vaddq_u64(mbuf23, data_off);
+			/* Load timestamp from address. */
+			ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, 0), ts01, 0);
+			ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, 1), ts01, 1);
+			ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, 0), ts23, 0);
+			ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, 1), ts23, 1);
+			/* Convert from be to cpu byteorder. */
+			ts01 = vrev64q_u8(ts01);
+			ts23 = vrev64q_u8(ts23);
+			/* Store timestamp into scalar for later use. */
+			ts[0] = vgetq_lane_u64(ts01, 0);
+			ts[1] = vgetq_lane_u64(ts01, 1);
+			ts[2] = vgetq_lane_u64(ts23, 0);
+			ts[3] = vgetq_lane_u64(ts23, 1);
+
+			/* Store timestamp into dynfield. */
+			*cnxk_nix_timestamp_dynfield(mbuf0, tstamp) = ts[0];
+			*cnxk_nix_timestamp_dynfield(mbuf1, tstamp) = ts[1];
+			*cnxk_nix_timestamp_dynfield(mbuf2, tstamp) = ts[2];
+			*cnxk_nix_timestamp_dynfield(mbuf3, tstamp) = ts[3];
+
+			/* Generate ptype mask to filter L2 ether timesync */
+			mask = vdupq_n_u32(vgetq_lane_u32(f0, 0));
+			mask = vsetq_lane_u32(vgetq_lane_u32(f1, 0), mask, 1);
+			mask = vsetq_lane_u32(vgetq_lane_u32(f2, 0), mask, 2);
+			mask = vsetq_lane_u32(vgetq_lane_u32(f3, 0), mask, 3);
+
+			/* Match against L2 ether timesync. */
+			mask = vceqq_u32(mask, ptype);
+			/* Convert the vector mask to a scalar mask */
+			res = vaddvq_u32(vandq_u32(mask, and_mask));
+			res &= 0xF;
+
+			if (res) {
+				/* Fill in the ol_flags for any packets that
+				 * matched.
+				 */
+				ol_flags0 |= ((res & 0x1) ? ts_olf : 0);
+				ol_flags1 |= ((res & 0x2) ? ts_olf : 0);
+				ol_flags2 |= ((res & 0x4) ? ts_olf : 0);
+				ol_flags3 |= ((res & 0x8) ? ts_olf : 0);
+
+				/* Update Rxq timestamp with the latest
+				 * timestamp.
+				 */
+				tstamp->rx_ready = 1;
+				tstamp->rx_tstamp = ts[31 - rte_clz32(res)];
+			}
+		}
+
+		/* Form rearm_data with ol_flags */
+		rearm0 = vsetq_lane_u64(ol_flags0, rearm0, 1);
+		rearm1 = vsetq_lane_u64(ol_flags1, rearm1, 1);
+		rearm2 = vsetq_lane_u64(ol_flags2, rearm2, 1);
+		rearm3 = vsetq_lane_u64(ol_flags3, rearm3, 1);
+
+		/* Update rx_descriptor_fields1 */
+		vst1q_u64((uint64_t *)mbuf0->rx_descriptor_fields1, f0);
+		vst1q_u64((uint64_t *)mbuf1->rx_descriptor_fields1, f1);
+		vst1q_u64((uint64_t *)mbuf2->rx_descriptor_fields1, f2);
+		vst1q_u64((uint64_t *)mbuf3->rx_descriptor_fields1, f3);
+
+		/* Update rearm_data */
+		vst1q_u64((uint64_t *)mbuf0->rearm_data, rearm0);
+		vst1q_u64((uint64_t *)mbuf1->rearm_data, rearm1);
+		vst1q_u64((uint64_t *)mbuf2->rearm_data, rearm2);
+		vst1q_u64((uint64_t *)mbuf3->rearm_data, rearm3);
+
+		if (flags & NIX_RX_MULTI_SEG_F) {
+			/* Multi segment is enabled; build the mseg list for
+			 * individual mbufs in scalar mode.
+			 */
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 0, 8, flags)),
+					    mbuf0, mbuf_initializer, cpth0, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 1, 8, flags)),
+					    mbuf1, mbuf_initializer, cpth1, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 2, 8, flags)),
+					    mbuf2, mbuf_initializer, cpth2, sa_base, flags);
+			nix_cqe_xtract_mseg((union nix_rx_parse_u *)(CQE_PTR_OFF(cq0, 3, 8, flags)),
+					    mbuf3, mbuf_initializer, cpth3, sa_base, flags);
+		}
+
+		/* Store the mbufs to rx_pkts */
+		vst1q_u64((uint64_t *)&mbufs[packets], mbuf01);
+		vst1q_u64((uint64_t *)&mbufs[packets + 2], mbuf23);
+
+		nix_mbuf_validate_next(mbuf0);
+		nix_mbuf_validate_next(mbuf1);
+		nix_mbuf_validate_next(mbuf2);
+		nix_mbuf_validate_next(mbuf3);
+
+		packets += NIX_DESCS_PER_LOOP;
+
+		if (!(flags & NIX_RX_VWQE_F)) {
+			/* Advance head pointer and packets */
+			head += NIX_DESCS_PER_LOOP;
+			head &= qmask;
+		}
+	}
+
+	if (flags & NIX_RX_VWQE_F)
+		return packets;
+
+	rxq->head = head;
+	rxq->available -= packets;
+
+	rte_io_wmb();
+	/* Free all the CQEs that we've processed */
+	plt_write64((rxq->wdata | packets), rxq->cq_door);
+
+	if (unlikely(pkts_left))
+		packets += cn20k_nix_recv_pkts(args, &mbufs[packets], pkts_left, flags);
+
+	return packets;
+}
+
+#else
+
+static inline uint16_t
+cn20k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, const uint16_t flags,
+			   void *lookup_mem, struct cnxk_timesync_info *tstamp, uintptr_t lmt_base,
+			   uint64_t meta_aura)
+{
+	RTE_SET_USED(args);
+	RTE_SET_USED(mbufs);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(flags);
+	RTE_SET_USED(lookup_mem);
+	RTE_SET_USED(tstamp);
+	RTE_SET_USED(lmt_base);
+	RTE_SET_USED(meta_aura);
+
+	return 0;
+}
+
+#endif
+
 #define RSS_F	  NIX_RX_OFFLOAD_RSS_F
 #define PTYPE_F	  NIX_RX_OFFLOAD_PTYPE_F
 #define CKSUM_F	  NIX_RX_OFFLOAD_CHECKSUM_F
@@ -618,10 +1075,8 @@ NIX_RX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *rx_queue, struct rte_mbuf **rx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(rx_queue);                                                            \
-		RTE_SET_USED(rx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		return cn20k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts, (flags), NULL, NULL, 0, \
+						  0);                                              \
 	}
 
 #define NIX_RX_RECV_VEC_MSEG(fn, flags) NIX_RX_RECV_VEC(fn, flags | NIX_RX_MULTI_SEG_F)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 15/18] net/cnxk: support Tx burst scalar for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (13 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 14/18] net/cnxk: support Rx burst vector " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 16/18] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
                     ` (3 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add scalar Tx support for cn20k.
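
The flow control added here derives a packet budget from the number of
free SQB buffers (NIX_XMIT_FC_OR_RETURN). Below is a minimal standalone
sketch of that arithmetic using the cn20k_eth_txq field names from this
patch; the helper and the sample values in the trailing comment are
hypothetical, not driver code:

  /* Free SQBs -> packet budget. (avail << log2) - avail equals
   * avail * (sqes_per_sqb - 1); one SQE per SQB is reserved as the
   * link to the next SQB.
   */
  static inline int64_t
  sqb_avail_to_pkts(int64_t nb_sqb_bufs_adj, int64_t fc_mem_val,
                    uint16_t sqes_per_sqb_log2)
  {
          int64_t avail = nb_sqb_bufs_adj - fc_mem_val;

          return (avail << sqes_per_sqb_log2) - avail;
  }

  /* e.g. 8 free SQBs with 32 SQEs per SQB -> 8 * 31 = 248 packets */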

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_ethdev.c | 127 ++++
 drivers/net/cnxk/cn20k_tx.h     | 986 +++++++++++++++++++++++++++++++-
 2 files changed, 1109 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_ethdev.c b/drivers/net/cnxk/cn20k_ethdev.c
index ac2b0e1b50..e182f7a40a 100644
--- a/drivers/net/cnxk/cn20k_ethdev.c
+++ b/drivers/net/cnxk/cn20k_ethdev.c
@@ -361,6 +361,10 @@ static int
 cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 {
 	struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[qidx];
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	uint16_t flags = dev->tx_offload_flags;
+	struct roc_nix *nix = &dev->nix;
+	uint32_t head = 0, tail = 0;
 	int rc;
 
 	rc = cnxk_nix_tx_queue_stop(eth_dev, qidx);
@@ -370,6 +374,20 @@ cn20k_nix_tx_queue_stop(struct rte_eth_dev *eth_dev, uint16_t qidx)
 	/* Clear fc cache pkts to trigger worker stop */
 	txq->fc_cache_pkts = 0;
 
+	if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && txq->tx_compl.ena) {
+		struct roc_nix_sq *sq = &dev->sqs[qidx];
+		do {
+			handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+			/* Check if SQ is empty */
+			roc_nix_sq_head_tail_get(nix, sq->qid, &head, &tail);
+			if (head != tail)
+				continue;
+
+			/* Check if completion CQ is empty */
+			roc_nix_cq_head_tail_get(nix, sq->cqid, &head, &tail);
+		} while (head != tail);
+	}
+
 	return 0;
 }
 
@@ -690,6 +708,112 @@ cn20k_rx_descriptor_dump(const struct rte_eth_dev *eth_dev, uint16_t qid, uint16
 	return 0;
 }
 
+static int
+cn20k_nix_tm_mark_vlan_dei(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow,
+			   int mark_red, struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_vlan_dei(eth_dev, mark_green, mark_yellow, mark_red, error);
+
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
+static int
+cn20k_nix_tm_mark_ip_ecn(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow, int mark_red,
+			 struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_ip_ecn(eth_dev, mark_green, mark_yellow, mark_red, error);
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
+static int
+cn20k_nix_tm_mark_ip_dscp(struct rte_eth_dev *eth_dev, int mark_green, int mark_yellow,
+			  int mark_red, struct rte_tm_error *error)
+{
+	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
+	struct roc_nix *roc_nix = &dev->nix;
+	uint64_t mark_fmt, mark_flag;
+	int rc, i;
+
+	rc = cnxk_nix_tm_mark_ip_dscp(eth_dev, mark_green, mark_yellow, mark_red, error);
+	if (rc)
+		goto exit;
+
+	mark_fmt = roc_nix_tm_mark_format_get(roc_nix, &mark_flag);
+	if (mark_flag) {
+		dev->tx_offload_flags |= NIX_TX_OFFLOAD_VLAN_QINQ_F;
+		dev->tx_mark = true;
+	} else {
+		dev->tx_mark = false;
+		if (!(dev->tx_offloads & RTE_ETH_TX_OFFLOAD_VLAN_INSERT ||
+		      dev->tx_offloads & RTE_ETH_TX_OFFLOAD_QINQ_INSERT))
+			dev->tx_offload_flags &= ~NIX_TX_OFFLOAD_VLAN_QINQ_F;
+	}
+
+	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
+		struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[i];
+
+		txq->mark_flag = mark_flag & CNXK_TM_MARK_MASK;
+		txq->mark_fmt = mark_fmt & CNXK_TX_MARK_FMT_MASK;
+	}
+	cn20k_eth_set_tx_function(eth_dev);
+exit:
+	return rc;
+}
+
 /* Update platform specific eth dev ops */
 static void
 nix_eth_dev_ops_override(void)
@@ -728,6 +852,9 @@ nix_tm_ops_override(void)
 	init_once = 1;
 
 	/* Update platform specific ops */
+	cnxk_tm_ops.mark_vlan_dei = cn20k_nix_tm_mark_vlan_dei;
+	cnxk_tm_ops.mark_ip_ecn = cn20k_nix_tm_mark_ip_ecn;
+	cnxk_tm_ops.mark_ip_dscp = cn20k_nix_tm_mark_ip_dscp;
 }
 
 static void
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 9fd925ac34..dda745abf4 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -32,6 +32,983 @@
 #define NIX_TX_NEED_EXT_HDR                                                                        \
 	(NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F)
 
+#define NIX_XMIT_FC_OR_RETURN(txq, pkts)                                                           \
+	do {                                                                                       \
+		int64_t avail;                                                                     \
+		/* Cached value is low, Update the fc_cache_pkts */                                \
+		if (unlikely((txq)->fc_cache_pkts < (pkts))) {                                     \
+			avail = txq->nb_sqb_bufs_adj - *txq->fc_mem;                               \
+			/* Multiply with sqe_per_sqb to express in pkts */                         \
+			(txq)->fc_cache_pkts = (avail << (txq)->sqes_per_sqb_log2) - avail;        \
+			/* Check it again for the room */                                          \
+			if (unlikely((txq)->fc_cache_pkts < (pkts)))                               \
+				return 0;                                                          \
+		}                                                                                  \
+	} while (0)
+
+#define NIX_XMIT_FC_OR_RETURN_MTS(txq, pkts)                                                       \
+	do {                                                                                       \
+		int64_t *fc_cache = &(txq)->fc_cache_pkts;                                         \
+		uint8_t retry_count = 8;                                                           \
+		int64_t val, newval;                                                               \
+	retry:                                                                                     \
+		/* Reduce the cached count */                                                      \
+		val = (int64_t)__atomic_fetch_sub(fc_cache, pkts, __ATOMIC_RELAXED);               \
+		val -= pkts;                                                                       \
+		/* Cached value is low, Update the fc_cache_pkts */                                \
+		if (unlikely(val < 0)) {                                                           \
+			/* Multiply with sqe_per_sqb to express in pkts */                         \
+			newval = txq->nb_sqb_bufs_adj -                                            \
+				 __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED);                   \
+			newval = (newval << (txq)->sqes_per_sqb_log2) - newval;                    \
+			newval -= pkts;                                                            \
+			if (!__atomic_compare_exchange_n(fc_cache, &val, newval, false,            \
+							 __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {    \
+				if (retry_count) {                                                 \
+					retry_count--;                                             \
+					goto retry;                                                \
+				} else                                                             \
+					return 0;                                                  \
+			}                                                                          \
+			/* Update and check it again for the room */                               \
+			if (unlikely(newval < 0))                                                  \
+				return 0;                                                          \
+		}                                                                                  \
+	} while (0)
+
+#define NIX_XMIT_FC_CHECK_RETURN(txq, pkts)                                                        \
+	do {                                                                                       \
+		if (unlikely((txq)->flag))                                                         \
+			NIX_XMIT_FC_OR_RETURN_MTS(txq, pkts);                                      \
+		else {                                                                             \
+			NIX_XMIT_FC_OR_RETURN(txq, pkts);                                          \
+			/* Reduce the cached count */                                              \
+			txq->fc_cache_pkts -= pkts;                                                \
+		}                                                                                  \
+	} while (0)
+
+/* Encoded number of segments to number of dwords macro, each value of nb_segs
+ * is encoded as 4bits.
+ */
+#define NIX_SEGDW_MAGIC 0x76654432210ULL
+
+#define NIX_NB_SEGS_TO_SEGDW(x) ((NIX_SEGDW_MAGIC >> ((x) << 2)) & 0xF)
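+/* e.g. nb_segs = 3: (0x76654432210 >> 12) & 0xF = 2, i.e. one 8B SG
+ * header plus three 8B IOVAs = 32B, rounded to two 16B units.
+ */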
+
+static __plt_always_inline uint8_t
+cn20k_nix_mbuf_sg_dwords(struct rte_mbuf *m)
+{
+	uint32_t nb_segs = m->nb_segs;
+	uint16_t aura0, aura;
+	int segw, sg_segs;
+
+	aura0 = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+
+	nb_segs--;
+	segw = 2;
+	sg_segs = 1;
+	while (nb_segs) {
+		m = m->next;
+		aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+		if (aura != aura0) {
+			segw += 2 + (sg_segs == 2);
+			sg_segs = 0;
+		} else {
+			segw += (sg_segs == 0); /* SUBDC */
+			segw += 1;		/* IOVA */
+			sg_segs += 1;
+			sg_segs %= 3;
+		}
+		nb_segs--;
+	}
+
+	return (segw + 1) / 2;
+}
+
+static __plt_always_inline void
+cn20k_nix_tx_mbuf_validate(struct rte_mbuf *m, const uint32_t flags)
+{
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+	uint16_t segdw;
+
+	segdw = cn20k_nix_mbuf_sg_dwords(m);
+	segdw += 1 + !!(flags & NIX_TX_NEED_EXT_HDR) + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+
+	PLT_ASSERT(segdw <= 8);
+#else
+	RTE_SET_USED(m);
+	RTE_SET_USED(flags);
+#endif
+}
+
+static __plt_always_inline void
+cn20k_nix_vwqe_wait_fc(struct cn20k_eth_txq *txq, uint16_t req)
+{
+	int64_t cached, refill;
+	int64_t pkts;
+
+retry:
+#ifdef RTE_ARCH_ARM64
+
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[pkts], [%[addr]]			\n"
+		     "		tbz %[pkts], 63, .Ldne%=		\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[pkts], [%[addr]]			\n"
+		     "		tbnz %[pkts], 63, .Lrty%=		\n"
+		     ".Ldne%=:						\n"
+		     : [pkts] "=&r"(pkts)
+		     : [addr] "r"(&txq->fc_cache_pkts)
+		     : "memory");
+#else
+	RTE_SET_USED(pkts);
+	while (__atomic_load_n(&txq->fc_cache_pkts, __ATOMIC_RELAXED) < 0)
+		;
+#endif
+	cached = __atomic_fetch_sub(&txq->fc_cache_pkts, req, __ATOMIC_ACQUIRE) - req;
+	/* Check if there is enough space, else update and retry. */
+	if (cached >= 0)
+		return;
+
+	/* Check if we have space else retry. */
+#ifdef RTE_ARCH_ARM64
+	int64_t val;
+
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[val], [%[addr]]			\n"
+		     "		sub %[val], %[adj], %[val]		\n"
+		     "		lsl %[refill], %[val], %[shft]		\n"
+		     "		sub %[refill], %[refill], %[val]	\n"
+		     "		sub %[refill], %[refill], %[sub]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.ge .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[val], [%[addr]]			\n"
+		     "		sub %[val], %[adj], %[val]		\n"
+		     "		lsl %[refill], %[val], %[shft]		\n"
+		     "		sub %[refill], %[refill], %[val]	\n"
+		     "		sub %[refill], %[refill], %[sub]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.lt .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [refill] "=&r"(refill), [val] "=&r" (val)
+		     : [addr] "r"(txq->fc_mem), [adj] "r"(txq->nb_sqb_bufs_adj),
+		       [shft] "r"(txq->sqes_per_sqb_log2), [sub] "r"(req)
+		     : "memory");
+#else
+	do {
+		refill = (txq->nb_sqb_bufs_adj - __atomic_load_n(txq->fc_mem, __ATOMIC_RELAXED));
+		refill = (refill << txq->sqes_per_sqb_log2) - refill;
+		refill -= req;
+	} while (refill < 0);
+#endif
+	if (!__atomic_compare_exchange(&txq->fc_cache_pkts, &cached, &refill, 0, __ATOMIC_RELEASE,
+				       __ATOMIC_RELAXED))
+		goto retry;
+}
+
+/* Function to determine the number of Tx sub-descriptors required when the
+ * extension sub-descriptor is enabled.
+ */
+static __rte_always_inline int
+cn20k_nix_tx_ext_subs(const uint16_t flags)
+{
+	return (flags & NIX_TX_OFFLOAD_TSTAMP_F) ?
+		       2 :
+		       ((flags & (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)) ? 1 : 0);
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_tx_steor_data(const uint16_t flags)
+{
+	const uint64_t dw_m1 = cn20k_nix_tx_ext_subs(flags) + 1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1;
+	/* 15 vector sizes for single seg */
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
+static __rte_always_inline void
+cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t flags,
+		      const uint16_t static_sz)
+{
+	if (static_sz)
+		cmd[0] = txq->send_hdr_w0;
+	else
+		cmd[0] = (txq->send_hdr_w0 & 0xFFFFF00000000000) |
+			 ((uint64_t)(cn20k_nix_tx_ext_subs(flags) + 1) << 40);
+	cmd[1] = 0;
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F)
+			cmd[2] = (NIX_SUBDC_EXT << 60) | BIT_ULL(15);
+		else
+			cmd[2] = NIX_SUBDC_EXT << 60;
+		cmd[3] = 0;
+		if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+			cmd[4] = (NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) | BIT_ULL(48);
+		else
+			cmd[4] = (NIX_SUBDC_SG << 60) | BIT_ULL(48);
+	} else {
+		if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+			cmd[2] = (NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) | BIT_ULL(48);
+		else
+			cmd[2] = (NIX_SUBDC_SG << 60) | BIT_ULL(48);
+	}
+}
+
+static __rte_always_inline void
+cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
+{
+	int32_t nb_desc, val, newval;
+	int32_t *fc_sw;
+	uint64_t *fc;
+
+	/* Check if there is any CPT instruction to submit */
+	if (!nb_pkts)
+		return;
+
+again:
+	fc_sw = txq->cpt_fc_sw;
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %w[pkts], [%[addr]]		\n"
+		     "		tbz %w[pkts], 31, .Ldne%=		\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %w[pkts], [%[addr]]		\n"
+		     "		tbnz %w[pkts], 31, .Lrty%=		\n"
+		     ".Ldne%=:						\n"
+		     : [pkts] "=&r"(val)
+		     : [addr] "r"(fc_sw)
+		     : "memory");
+#else
+	/* Wait for primary core to refill FC. */
+	while (__atomic_load_n(fc_sw, __ATOMIC_RELAXED) < 0)
+		;
+#endif
+
+	val = __atomic_fetch_sub(fc_sw, nb_pkts, __ATOMIC_ACQUIRE) - nb_pkts;
+	if (likely(val >= 0))
+		return;
+
+	nb_desc = txq->cpt_desc;
+	fc = txq->cpt_fc;
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[refill], [%[addr]]		\n"
+		     "		sub %[refill], %[desc], %[refill]	\n"
+		     "		sub %[refill], %[refill], %[pkts]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.ge .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[refill], [%[addr]]		\n"
+		     "		sub %[refill], %[desc], %[refill]	\n"
+		     "		sub %[refill], %[refill], %[pkts]	\n"
+		     "		cmp %[refill], #0x0			\n"
+		     "		b.lt .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [refill] "=&r"(newval)
+		     : [addr] "r"(fc), [desc] "r"(nb_desc), [pkts] "r"(nb_pkts)
+		     : "memory");
+#else
+	while (true) {
+		newval = nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED);
+		newval -= nb_pkts;
+		if (newval >= 0)
+			break;
+	}
+#endif
+
+	if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, __ATOMIC_RELEASE,
+					 __ATOMIC_RELAXED))
+		goto again;
+}
+
+#if defined(RTE_ARCH_ARM64)
+
+static __rte_always_inline void
+cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
+		   uint8_t *lnum, uint8_t *loff, uint8_t *shft, uint64_t sa_base,
+		   const uint16_t flags)
+{
+	struct cn20k_sec_sess_priv sess_priv;
+	uint32_t pkt_len, dlen_adj, rlen;
+	struct nix_send_hdr_s *send_hdr;
+	uint8_t l3l4type, chksum;
+	uint64x2_t cmd01, cmd23;
+	union nix_send_sg_s *sg;
+	uint8_t l2_len, l3_len;
+	uintptr_t dptr, nixtx;
+	uint64_t ucode_cmd[4];
+	uint64_t *laddr, w0;
+	uint16_t tag;
+	uint64_t sa;
+
+	/* Fetch the security session priv data from the mbuf dynfield */
+	sess_priv.u64 = *rte_security_dynfield(m);
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+	if (flags & NIX_TX_NEED_EXT_HDR)
+		sg = (union nix_send_sg_s *)&cmd[4];
+	else
+		sg = (union nix_send_sg_s *)&cmd[2];
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1) {
+		/* Extract l3l4type either from il3il4type or ol3ol4type */
+		if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+			l2_len = (cmd[1] >> 16) & 0xFF;
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = ((cmd[1] >> 24) & 0xFF) - l2_len;
+			l3l4type = (cmd[1] >> 40) & 0xFF;
+		} else {
+			l2_len = cmd[1] & 0xFF;
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = ((cmd[1] >> 8) & 0xFF) - l2_len;
+			l3l4type = (cmd[1] >> 32) & 0xFF;
+		}
+
+		chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30);
+		chksum = ~chksum;
+		sess_priv.chksum = sess_priv.chksum & chksum;
+		/* Clear SEND header flags */
+		cmd[1] &= ~(0xFFFFUL << 32);
+	} else {
+		l2_len = m->l2_len;
+		l3_len = m->l3_len;
+	}
+
+	/* Retrieve DPTR */
+	dptr = *(uint64_t *)(sg + 1);
+	pkt_len = send_hdr->w0.total;
+
+	/* Calculate dlen adj */
+	dlen_adj = pkt_len - l2_len;
+	/* Exclude l3 len from roundup for transport mode */
+	dlen_adj -= sess_priv.mode ? 0 : l3_len;
+	rlen = (dlen_adj + sess_priv.roundup_len) + (sess_priv.roundup_byte - 1);
+	rlen &= ~(uint64_t)(sess_priv.roundup_byte - 1);
+	rlen += sess_priv.partial_len;
+	dlen_adj = rlen - dlen_adj;
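+	/* dlen_adj now holds the number of bytes the CPT output grows the
+	 * packet by (round-up padding plus the fixed IPsec overhead).
+	 */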
+
+	/* Update send descriptors. Security is single segment only */
+	send_hdr->w0.total = pkt_len + dlen_adj;
+
+	/* CPT word 5 and word 6 */
+	w0 = 0;
+	ucode_cmd[2] = 0;
+	if (flags & NIX_TX_MULTI_SEG_F && m->nb_segs > 1) {
+		struct rte_mbuf *last = rte_pktmbuf_lastseg(m);
+
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = rte_pktmbuf_mtod_offset(last, uintptr_t, last->data_len + dlen_adj);
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		dptr = nixtx + ((flags & NIX_TX_NEED_EXT_HDR) ? 32 : 16);
+
+		/* Set l2 length as data offset */
+		w0 = (uint64_t)l2_len << 16;
+		w0 |= cn20k_nix_tx_ext_subs(flags) + NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+		ucode_cmd[1] = dptr | ((uint64_t)m->nb_segs << 60);
+	} else {
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = dptr + pkt_len + dlen_adj;
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		w0 |= cn20k_nix_tx_ext_subs(flags) + 1ULL;
+		dptr += l2_len;
+		ucode_cmd[1] = dptr;
+		sg->seg1_size = pkt_len + dlen_adj;
+		pkt_len -= l2_len;
+	}
+	w0 |= ((((int64_t)nixtx - (int64_t)dptr) & 0xFFFFF) << 32);
+	/* CPT word 0 and 1 */
+	cmd01 = vdupq_n_u64(0);
+	cmd01 = vsetq_lane_u64(w0, cmd01, 0);
+	/* CPT_RES_S is 16B above NIXTX */
+	cmd01 = vsetq_lane_u64(nixtx - 16, cmd01, 1);
+
+	/* Return nixtx addr */
+	*nixtx_addr = nixtx;
+
+	/* CPT Word 4 and Word 7 */
+	tag = sa_base & 0xFFFFUL;
+	sa_base &= ~0xFFFFUL;
+	sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx);
+	ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa);
+	ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | 1UL << 54 |
+			((uint64_t)sess_priv.chksum) << 32 | ((uint64_t)sess_priv.dec_ttl) << 34 |
+			pkt_len);
+
+	/* CPT word 2 and 3 */
+	cmd23 = vdupq_n_u64(0);
+	cmd23 = vsetq_lane_u64((((uint64_t)RTE_EVENT_TYPE_CPU << 28) | tag |
+				CNXK_ETHDEV_SEC_OUTB_EV_SUB << 20), cmd23, 0);
+	cmd23 = vsetq_lane_u64((uintptr_t)m | 1, cmd23, 1);
+
+	/* Move to our line */
+	laddr = LMT_OFF(lbase, *lnum, *loff ? 64 : 0);
+
+	/* Write CPT instruction to lmt line */
+	vst1q_u64(laddr, cmd01);
+	vst1q_u64((laddr + 2), cmd23);
+
+	*(__uint128_t *)(laddr + 4) = *(__uint128_t *)ucode_cmd;
+	*(__uint128_t *)(laddr + 6) = *(__uint128_t *)(ucode_cmd + 2);
+
+	/* Move to next line for every other CPT inst */
+	*loff = !(*loff);
+	*lnum = *lnum + (*loff ? 0 : 1);
+	*shft = *shft + (*loff ? 0 : 3);
+}
+
+#else
+
+static __rte_always_inline void
+cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
+		   uint8_t *lnum, uint8_t *loff, uint8_t *shft, uint64_t sa_base,
+		   const uint16_t flags)
+{
+	RTE_SET_USED(m);
+	RTE_SET_USED(cmd);
+	RTE_SET_USED(nixtx_addr);
+	RTE_SET_USED(lbase);
+	RTE_SET_USED(lnum);
+	RTE_SET_USED(loff);
+	RTE_SET_USED(shft);
+	RTE_SET_USED(sa_base);
+	RTE_SET_USED(flags);
+}
+#endif
+
+static inline void
+cn20k_nix_free_extmbuf(struct rte_mbuf *m)
+{
+	struct rte_mbuf *m_next;
+	while (m != NULL) {
+		m_next = m->next;
+		rte_pktmbuf_free_seg(m);
+		m = m_next;
+	}
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **extm, struct cn20k_eth_txq *txq,
+		      struct nix_send_hdr_s *send_hdr, uint64_t *aura)
+{
+	struct rte_mbuf *prev = NULL;
+	uint32_t sqe_id;
+
+	if (RTE_MBUF_HAS_EXTBUF(m)) {
+		if (unlikely(txq->tx_compl.ena == 0)) {
+			m->next = *extm;
+			*extm = m;
+			return 1;
+		}
+		if (send_hdr->w0.pnc) {
+			sqe_id = send_hdr->w1.sqe_id;
+			prev = txq->tx_compl.ptr[sqe_id];
+			m->next = prev;
+			txq->tx_compl.ptr[sqe_id] = m;
+		} else {
+			sqe_id = __atomic_fetch_add(&txq->tx_compl.sqe_id, 1, __ATOMIC_RELAXED);
+			send_hdr->w0.pnc = 1;
+			send_hdr->w1.sqe_id = sqe_id & txq->tx_compl.nb_desc_mask;
+			txq->tx_compl.ptr[send_hdr->w1.sqe_id] = m;
+		}
+		return 1;
+	} else {
+		return cnxk_nix_prefree_seg(m, aura);
+	}
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags)
+{
+	uint64_t mask, ol_flags = m->ol_flags;
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F && (ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
+		uintptr_t mdata = rte_pktmbuf_mtod(m, uintptr_t);
+		uint16_t *iplen, *oiplen, *oudplen;
+		uint16_t lso_sb, paylen;
+
+		mask = -!!(ol_flags & (RTE_MBUF_F_TX_OUTER_IPV4 | RTE_MBUF_F_TX_OUTER_IPV6));
+		lso_sb = (mask & (m->outer_l2_len + m->outer_l3_len)) + m->l2_len + m->l3_len +
+			 m->l4_len;
+
+		/* Reduce payload len from base headers */
+		paylen = m->pkt_len - lso_sb;
+
+		/* Get iplen position assuming no tunnel hdr */
+		iplen = (uint16_t *)(mdata + m->l2_len + (2 << !!(ol_flags & RTE_MBUF_F_TX_IPV6)));
+		/* Handle tunnel tso */
+		if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) &&
+		    (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+			const uint8_t is_udp_tun =
+				(CNXK_NIX_UDP_TUN_BITMASK >>
+				 ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+				0x1;
+
+			oiplen = (uint16_t *)(mdata + m->outer_l2_len +
+					      (2 << !!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)));
+			*oiplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*oiplen) - paylen);
+
+			/* Update format for UDP tunneled packet */
+			if (is_udp_tun) {
+				oudplen =
+					(uint16_t *)(mdata + m->outer_l2_len + m->outer_l3_len + 4);
+				*oudplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*oudplen) - paylen);
+			}
+
+			/* Update iplen position to inner ip hdr */
+			iplen = (uint16_t *)(mdata + lso_sb - m->l3_len - m->l4_len +
+					     (2 << !!(ol_flags & RTE_MBUF_F_TX_IPV6)));
+		}
+
+		*iplen = rte_cpu_to_be_16(rte_be_to_cpu_16(*iplen) - paylen);
+	}
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_prepare(struct cn20k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm,
+		       uint64_t *cmd, const uint16_t flags, const uint64_t lso_tun_fmt, bool *sec,
+		       uint8_t mark_flag, uint64_t mark_fmt)
+{
+	uint8_t mark_off = 0, mark_vlan = 0, markptr = 0;
+	struct nix_send_ext_s *send_hdr_ext;
+	struct nix_send_hdr_s *send_hdr;
+	uint64_t ol_flags = 0, mask;
+	union nix_send_hdr_w1_u w1;
+	union nix_send_sg_s *sg;
+	uint16_t mark_form = 0;
+
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		send_hdr_ext = (struct nix_send_ext_s *)(cmd + 2);
+		sg = (union nix_send_sg_s *)(cmd + 4);
+		/* Clear previous markings */
+		send_hdr_ext->w0.lso = 0;
+		send_hdr_ext->w0.mark_en = 0;
+		send_hdr_ext->w1.u = 0;
+		ol_flags = m->ol_flags;
+	} else {
+		sg = (union nix_send_sg_s *)(cmd + 2);
+	}
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F)
+		send_hdr->w0.pnc = 0;
+
+	if (flags & (NIX_TX_NEED_SEND_HDR_W1 | NIX_TX_OFFLOAD_SECURITY_F)) {
+		ol_flags = m->ol_flags;
+		w1.u = 0;
+	}
+
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		send_hdr->w0.total = m->data_len;
+	else
+		send_hdr->w0.total = m->pkt_len;
+	send_hdr->w0.aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+
+	/*
+	 * L3type:  2 => IPV4
+	 *          3 => IPV4 with csum
+	 *          4 => IPV6
+	 * L3type and L3ptr need to be set for either
+	 * L3 csum or L4 csum or LSO
+	 */
+
+	if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) && (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F)) {
+		const uint8_t csum = !!(ol_flags & RTE_MBUF_F_TX_OUTER_UDP_CKSUM);
+		const uint8_t ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV4)) << 1) +
+					((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)) << 2) +
+					!!(ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM);
+
+		/* Outer L3 */
+		w1.ol3type = ol3type;
+		mask = 0xffffull << ((!!ol3type) << 4);
+		w1.ol3ptr = ~mask & m->outer_l2_len;
+		w1.ol4ptr = ~mask & (w1.ol3ptr + m->outer_l3_len);
+
+		/* Outer L4 */
+		w1.ol4type = csum + (csum << 1);
+
+		/* Inner L3 */
+		w1.il3type = ((!!(ol_flags & RTE_MBUF_F_TX_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_IPV6)) << 2);
+		w1.il3ptr = w1.ol4ptr + m->l2_len;
+		w1.il4ptr = w1.il3ptr + m->l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.il3type = w1.il3type + !!(ol_flags & RTE_MBUF_F_TX_IP_CKSUM);
+
+		/* Inner L4 */
+		w1.il4type = (ol_flags & RTE_MBUF_F_TX_L4_MASK) >> 52;
+
+		/* In case of no tunnel header, shift the IL3/IL4 fields
+		 * into OL3/OL4 so they are used for the header checksum.
+		 */
+		mask = !ol3type;
+		w1.u = ((w1.u & 0xFFFFFFFF00000000) >> (mask << 3)) |
+		       ((w1.u & 0X00000000FFFFFFFF) >> (mask << 4));
+
+	} else if (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+		const uint8_t csum = !!(ol_flags & RTE_MBUF_F_TX_OUTER_UDP_CKSUM);
+		const uint8_t outer_l2_len = m->outer_l2_len;
+
+		/* Outer L3 */
+		w1.ol3ptr = outer_l2_len;
+		w1.ol4ptr = outer_l2_len + m->outer_l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6)) << 2) +
+			     !!(ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM);
+
+		/* Outer L4 */
+		w1.ol4type = csum + (csum << 1);
+
+	} else if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) {
+		const uint8_t l2_len = m->l2_len;
+
+		/* Always use OLXPTR and OLXTYPE when only
+		 * one header is present.
+		 */
+
+		/* Inner L3 */
+		w1.ol3ptr = l2_len;
+		w1.ol4ptr = l2_len + m->l3_len;
+		/* Increment it by 1 if it is IPV4 as 3 is with csum */
+		w1.ol3type = ((!!(ol_flags & RTE_MBUF_F_TX_IPV4)) << 1) +
+			     ((!!(ol_flags & RTE_MBUF_F_TX_IPV6)) << 2) +
+			     !!(ol_flags & RTE_MBUF_F_TX_IP_CKSUM);
+
+		/* Inner L4 */
+		w1.ol4type = (ol_flags & RTE_MBUF_F_TX_L4_MASK) >> 52;
+	}
+
+	if (flags & NIX_TX_NEED_EXT_HDR && flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		const uint8_t ipv6 = !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+		const uint8_t ip = !!(ol_flags & (RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IPV6));
+
+		send_hdr_ext->w1.vlan1_ins_ena = !!(ol_flags & RTE_MBUF_F_TX_VLAN);
+		/* HW will update ptr after vlan0 update */
+		send_hdr_ext->w1.vlan1_ins_ptr = 12;
+		send_hdr_ext->w1.vlan1_ins_tci = m->vlan_tci;
+
+		send_hdr_ext->w1.vlan0_ins_ena = !!(ol_flags & RTE_MBUF_F_TX_QINQ);
+		/* 2B before end of l2 header */
+		send_hdr_ext->w1.vlan0_ins_ptr = 12;
+		send_hdr_ext->w1.vlan0_ins_tci = m->vlan_tci_outer;
+		/* Fill for VLAN marking only when VLAN insertion enabled */
+		mark_vlan = ((mark_flag & CNXK_TM_MARK_VLAN_DEI) &
+			     (send_hdr_ext->w1.vlan1_ins_ena || send_hdr_ext->w1.vlan0_ins_ena));
+
+		/* Mask requested flags with packet data information */
+		mark_off = mark_flag & ((ip << 2) | (ip << 1) | mark_vlan);
+		mark_off = ffs(mark_off & CNXK_TM_MARK_MASK);
+
+		mark_form = (mark_fmt >> ((mark_off - !!mark_off) << 4));
+		mark_form = (mark_form >> (ipv6 << 3)) & 0xFF;
+		markptr = m->l2_len + (mark_form >> 7) - (mark_vlan << 2);
+
+		send_hdr_ext->w0.mark_en = !!mark_off;
+		send_hdr_ext->w0.markform = mark_form & 0x7F;
+		send_hdr_ext->w0.markptr = markptr;
+	}
+
+	if (flags & NIX_TX_NEED_EXT_HDR && flags & NIX_TX_OFFLOAD_TSO_F &&
+	    (ol_flags & RTE_MBUF_F_TX_TCP_SEG)) {
+		uint16_t lso_sb;
+		uint64_t mask;
+
+		mask = -(!w1.il3type);
+		lso_sb = (mask & w1.ol4ptr) + (~mask & w1.il4ptr) + m->l4_len;
+
+		send_hdr_ext->w0.lso_sb = lso_sb;
+		send_hdr_ext->w0.lso = 1;
+		send_hdr_ext->w0.lso_mps = m->tso_segsz;
+		send_hdr_ext->w0.lso_format =
+			NIX_LSO_FORMAT_IDX_TSOV4 + !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+		w1.ol4type = NIX_SENDL4TYPE_TCP_CKSUM;
+
+		/* Handle tunnel tso */
+		if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) &&
+		    (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+			const uint8_t is_udp_tun =
+				(CNXK_NIX_UDP_TUN_BITMASK >>
+				 ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+				0x1;
+			uint8_t shift = is_udp_tun ? 32 : 0;
+
+			shift += (!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6) << 4);
+			shift += (!!(ol_flags & RTE_MBUF_F_TX_IPV6) << 3);
+
+			w1.il4type = NIX_SENDL4TYPE_TCP_CKSUM;
+			w1.ol4type = is_udp_tun ? NIX_SENDL4TYPE_UDP_CKSUM : 0;
+			/* Update format for UDP tunneled packet */
+			send_hdr_ext->w0.lso_format = (lso_tun_fmt >> shift);
+		}
+	}
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1)
+		send_hdr->w1.u = w1.u;
+
+	if (!(flags & NIX_TX_MULTI_SEG_F)) {
+		struct rte_mbuf *cookie;
+
+		sg->seg1_size = send_hdr->w0.total;
+		*(rte_iova_t *)(sg + 1) = rte_mbuf_data_iova(m);
+		cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+
+		if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+			uint64_t aura;
+
+			/* DF bit = 1 if refcount of current mbuf or parent mbuf
+			 *		is greater than 1
+			 * DF bit = 0 otherwise
+			 */
+			aura = send_hdr->w0.aura;
+			send_hdr->w0.df = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+			send_hdr->w0.aura = aura;
+		}
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+		/* Mark mempool object as "put" since it is freed by NIX */
+		if (!send_hdr->w0.df)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+#else
+		RTE_SET_USED(cookie);
+#endif
+	} else {
+		sg->seg1_size = m->data_len;
+		*(rte_iova_t *)(sg + 1) = rte_mbuf_data_iova(m);
+
+		/* NOFF is handled later for multi-seg */
+	}
+
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		*sec = !!(ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD);
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_mv_lmt_base(uintptr_t lmt_addr, uint64_t *cmd, const uint16_t flags)
+{
+	struct nix_send_ext_s *send_hdr_ext;
+	union nix_send_sg_s *sg;
+
+	/* With minimal offloads, 'cmd' being local could be optimized out to
+	 * registers. In other cases, 'cmd' will be on the stack. The intent is
+	 * that 'cmd' stores content from txq->cmd, which is copied only once.
+	 */
+	*((struct nix_send_hdr_s *)lmt_addr) = *(struct nix_send_hdr_s *)cmd;
+	lmt_addr += 16;
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		send_hdr_ext = (struct nix_send_ext_s *)(cmd + 2);
+		*((struct nix_send_ext_s *)lmt_addr) = *send_hdr_ext;
+		lmt_addr += 16;
+
+		sg = (union nix_send_sg_s *)(cmd + 4);
+	} else {
+		sg = (union nix_send_sg_s *)(cmd + 2);
+	}
+	/* In case of multi-seg, sg template is stored here */
+	*((union nix_send_sg_s *)lmt_addr) = *sg;
+	*(rte_iova_t *)(lmt_addr + 8) = *(rte_iova_t *)(sg + 1);
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_prepare_tstamp(struct cn20k_eth_txq *txq, uintptr_t lmt_addr,
+			      const uint64_t ol_flags, const uint16_t no_segdw,
+			      const uint16_t flags)
+{
+	if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+		const uint8_t is_ol_tstamp = !(ol_flags & RTE_MBUF_F_TX_IEEE1588_TMST);
+		uint64_t *lmt = (uint64_t *)lmt_addr;
+		uint16_t off = (no_segdw - 1) << 1;
+		struct nix_send_mem_s *send_mem;
+
+		send_mem = (struct nix_send_mem_s *)(lmt + off);
+		/* For packets without RTE_MBUF_F_TX_IEEE1588_TMST, the Tx
+		 * tstamp should not be recorded; hence change the alg type to
+		 * NIX_SENDMEMALG_SUB and point the send mem addr field to the
+		 * next 8 bytes so that the actual Tx tstamp registered address
+		 * is not corrupted.
+		 */
+		send_mem->w0.subdc = NIX_SUBDC_MEM;
+		send_mem->w0.alg = NIX_SENDMEMALG_SETTSTMP + (is_ol_tstamp << 3);
+		send_mem->addr = (rte_iova_t)(((uint64_t *)txq->ts_mem) + is_ol_tstamp);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+		    uint64_t *cmd, const uint16_t flags)
+{
+	struct cn20k_eth_txq *txq = tx_queue;
+	const rte_iova_t io_addr = txq->io_addr;
+	uint8_t lnum, c_lnum, c_shft, c_loff;
+	uintptr_t pa, lbase = txq->lmt_base;
+	uint16_t lmt_id, burst, left, i;
+	struct rte_mbuf *extm = NULL;
+	uintptr_t c_lbase = lbase;
+	uint64_t lso_tun_fmt = 0;
+	uint64_t mark_fmt = 0;
+	uint8_t mark_flag = 0;
+	rte_iova_t c_io_addr;
+	uint16_t c_lmt_id;
+	uint64_t sa_base;
+	uintptr_t laddr;
+	uint64_t data;
+	bool sec;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F))
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+
+	/* Get cmd skeleton */
+	cn20k_nix_tx_skeleton(txq, cmd, flags, !(flags & NIX_TX_VWQE_F));
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F)
+		lso_tun_fmt = txq->lso_tun_fmt;
+
+	if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		mark_fmt = txq->mark_fmt;
+		mark_flag = txq->mark_flag;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(lbase, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_lbase, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	burst = left > 32 ? 32 : left;
+
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		c_lnum = 0;
+		c_loff = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i++) {
+		/* Perform header writes for TSO, barrier at
+		 * lmt steorl will suffice.
+		 */
+		if (flags & NIX_TX_OFFLOAD_TSO_F)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+
+		cn20k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, &sec,
+				       mark_flag, mark_fmt);
+
+		laddr = (uintptr_t)LMT_OFF(lbase, lnum, 0);
+
+		/* Prepare CPT instruction and get nixtx addr */
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F && sec)
+			cn20k_nix_prep_sec(tx_pkts[i], cmd, &laddr, c_lbase, &c_lnum, &c_loff,
+					   &c_shft, sa_base, flags);
+
+		/* Move NIX desc to LMT/NIXTX area */
+		cn20k_nix_xmit_mv_lmt_base(laddr, cmd, flags);
+		cn20k_nix_xmit_prepare_tstamp(txq, laddr, tx_pkts[i]->ol_flags, 4, flags);
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F) || !sec)
+			lnum++;
+	}
+
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+	tx_pkts += burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = ((c_lnum << 1) + c_loff);
+
+		/* Reduce pkts to be sent to CPT */
+		burst -= sec_pkts;
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	/* Trigger LMTST */
+	if (burst > 16) {
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= (15ULL << 12);
+		data |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, 16);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data, pa);
+
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= ((uint64_t)(burst - 17)) << 12;
+		data |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst - 16);
+		/* STEOR1 */
+		roc_lmt_submit_steorl(data, pa);
+	} else if (burst) {
+		data = cn20k_nix_tx_steor_data(flags);
+		pa = io_addr | (data & 0x7) << 4;
+		data &= ~0x7ULL;
+		data |= ((uint64_t)(burst - 1)) << 12;
+		data |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data, pa);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	return pkts;
+}
+
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -225,10 +1202,11 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[sz];                                                                  \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts(tx_queue, NULL, tx_pkts, pkts, cmd, flags);             \
 	}
 
 #define NIX_TX_XMIT_MSEG(fn, sz, flags)                                                            \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 16/18] net/cnxk: support Tx multi-seg in cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (14 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 15/18] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 17/18] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
                     ` (2 subsequent siblings)
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Tx multi-seg support in scalar mode for cn20k.
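
Each LMTST line's descriptor size is accumulated into a 128-bit word,
3 bits per line, before being split into the two STEORL data words. A
minimal sketch of the packing that the data128/shft logic below
performs; the helper name is hypothetical, not driver code:

  /* Pack (segdw - 1) for up to 32 LMT lines, 3 bits per line,
   * starting at bit 16, as cn20k_nix_xmit_pkts_mseg() does via
   * data128.
   */
  static inline __uint128_t
  pack_segdw_sizes(const uint8_t *segdw, int n)
  {
          __uint128_t data = 0;
          uint8_t shft = 16;
          int i;

          for (i = 0; i < n; i++) {
                  data |= ((__uint128_t)(segdw[i] - 1)) << shft;
                  shft += 3;
          }
          return data;
  }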

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 352 +++++++++++++++++++++++++++++++++++-
 1 file changed, 347 insertions(+), 5 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index dda745abf4..f7a78d34ea 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -862,6 +862,183 @@ cn20k_nix_xmit_prepare_tstamp(struct cn20k_eth_txq *txq, uintptr_t lmt_addr,
 	}
 }
 
+static __rte_always_inline uint16_t
+cn20k_nix_prepare_mseg(struct cn20k_eth_txq *txq, struct rte_mbuf *m, struct rte_mbuf **extm,
+		       uint64_t *cmd, const uint16_t flags)
+{
+	uint64_t prefree = 0, aura0, aura, nb_segs, segdw;
+	struct nix_send_hdr_s *send_hdr;
+	union nix_send_sg_s *sg, l_sg;
+	union nix_send_sg2_s l_sg2;
+	struct rte_mbuf *cookie;
+	struct rte_mbuf *m_next;
+	uint8_t off, is_sg2;
+	uint64_t len, dlen;
+	uint64_t ol_flags;
+	uint64_t *slist;
+
+	send_hdr = (struct nix_send_hdr_s *)cmd;
+
+	if (flags & NIX_TX_NEED_EXT_HDR)
+		off = 2;
+	else
+		off = 0;
+
+	sg = (union nix_send_sg_s *)&cmd[2 + off];
+	len = send_hdr->w0.total;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		ol_flags = m->ol_flags;
+
+	/* Start from second segment, first segment is already there */
+	dlen = m->data_len;
+	is_sg2 = 0;
+	l_sg.u = sg->u;
+	/* Clear the first seg length in l_sg.u, which might be stale from the vector path */
+	l_sg.u &= ~0xFFFFUL;
+	l_sg.u |= dlen;
+	len -= dlen;
+	nb_segs = m->nb_segs - 1;
+	m_next = m->next;
+	m->next = NULL;
+	m->nb_segs = 1;
+	slist = &cmd[3 + off + 1];
+
+	cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+	/* Set invert df if buffer is not to be freed by H/W */
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		aura = send_hdr->w0.aura;
+		prefree = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+		send_hdr->w0.aura = aura;
+		l_sg.i1 = prefree;
+	}
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+	/* Mark mempool object as "put" since it is freed by NIX */
+	if (!prefree)
+		RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	rte_io_wmb();
+#else
+	RTE_SET_USED(cookie);
+#endif
+
+	/* Quickly handle single segmented packets. With this if-condition,
+	 * the compiler will completely optimize out the below do-while loop
+	 * from the Tx handler when NIX_TX_MULTI_SEG_F offload is not set.
+	 */
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		goto done;
+
+	aura0 = send_hdr->w0.aura;
+	m = m_next;
+	if (!m)
+		goto done;
+
+	/* Fill mbuf segments */
+	do {
+		uint64_t iova;
+
+		/* Save the current mbuf properties. These can get cleared in
+		 * cnxk_nix_prefree_seg()
+		 */
+		m_next = m->next;
+		iova = rte_mbuf_data_iova(m);
+		dlen = m->data_len;
+		len -= dlen;
+
+		nb_segs--;
+		aura = aura0;
+		prefree = 0;
+
+		m->next = NULL;
+
+		cookie = RTE_MBUF_DIRECT(m) ? m : rte_mbuf_from_indirect(m);
+		if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+			aura = roc_npa_aura_handle_to_aura(m->pool->pool_id);
+			prefree = cn20k_nix_prefree_seg(m, extm, txq, send_hdr, &aura);
+			is_sg2 = aura != aura0 && !prefree;
+		}
+
+		if (unlikely(is_sg2)) {
+			/* This mbuf belongs to a different pool and
+			 * DF bit is not to be set, so use SG2 subdesc
+			 * so that it is freed to the appropriate pool.
+			 */
+
+			/* Write the previous descriptor out */
+			sg->u = l_sg.u;
+
+			/* If the current SG subdc does not have any
+			 * iovas in it, then the SG2 subdc can overwrite
+			 * that SG subdc.
+			 *
+			 * If the current SG subdc has 2 iovas in it, then
+			 * the current iova word should be left empty.
+			 */
+			slist += (-1 + (int)l_sg.segs);
+			sg = (union nix_send_sg_s *)slist;
+
+			l_sg2.u = l_sg.u & 0xC00000000000000; /* LD_TYPE */
+			l_sg2.subdc = NIX_SUBDC_SG2;
+			l_sg2.aura = aura;
+			l_sg2.seg1_size = dlen;
+			l_sg.u = l_sg2.u;
+
+			slist++;
+			*slist = iova;
+			slist++;
+		} else {
+			*slist = iova;
+			/* Set invert df if buffer is not to be freed by H/W */
+			l_sg.u |= (prefree << (l_sg.segs + 55));
+			/* Set the segment length */
+			l_sg.u |= ((uint64_t)dlen << (l_sg.segs << 4));
+			l_sg.segs += 1;
+			slist++;
+		}
+
+		if ((is_sg2 || l_sg.segs > 2) && nb_segs) {
+			sg->u = l_sg.u;
+			/* Next SG subdesc */
+			sg = (union nix_send_sg_s *)slist;
+			l_sg.u &= 0xC00000000000000; /* LD_TYPE */
+			l_sg.subdc = NIX_SUBDC_SG;
+			slist++;
+		}
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+		/* Mark mempool object as "put" since it is freed by NIX */
+		if (!prefree)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+#else
+		RTE_SET_USED(cookie);
+#endif
+		m = m_next;
+	} while (nb_segs);
+
+done:
+	/* Add remaining bytes of security data to last seg */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && len) {
+		uint8_t shft = (l_sg.subdc == NIX_SUBDC_SG) ? ((l_sg.segs - 1) << 4) : 0;
+
+		dlen = ((l_sg.u >> shft) & 0xFFFFULL) + len;
+		l_sg.u = l_sg.u & ~(0xFFFFULL << shft);
+		l_sg.u |= dlen << shft;
+	}
+
+	/* Write the last subdc out */
+	sg->u = l_sg.u;
+
+	segdw = (uint64_t *)slist - (uint64_t *)&cmd[2 + off];
+	/* Roundup extra dwords to multiple of 2 */
+	segdw = (segdw >> 1) + (segdw & 0x1);
+	/* Default dwords */
+	segdw += (off >> 1) + 1 + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+	send_hdr->w0.sizem1 = segdw - 1;
+
+	return segdw;
+}
+
 static __rte_always_inline uint16_t
 cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
 		    uint64_t *cmd, const uint16_t flags)
@@ -1009,6 +1186,170 @@ cn20k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uin
 	return pkts;
 }
 
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			 uint64_t *cmd, const uint16_t flags)
+{
+	struct cn20k_eth_txq *txq = tx_queue;
+	uintptr_t pa0, pa1, lbase = txq->lmt_base;
+	const rte_iova_t io_addr = txq->io_addr;
+	uint16_t segdw, lmt_id, burst, left, i;
+	struct rte_mbuf *extm = NULL;
+	uint8_t lnum, c_lnum, c_loff;
+	uintptr_t c_lbase = lbase;
+	uint64_t lso_tun_fmt = 0;
+	uint64_t mark_fmt = 0;
+	uint8_t mark_flag = 0;
+	uint64_t data0, data1;
+	rte_iova_t c_io_addr;
+	uint8_t shft, c_shft;
+	__uint128_t data128;
+	uint16_t c_lmt_id;
+	uint64_t sa_base;
+	uintptr_t laddr;
+	bool sec;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F))
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+
+	/* Get cmd skeleton */
+	cn20k_nix_tx_skeleton(txq, cmd, flags, !(flags & NIX_TX_VWQE_F));
+
+	if (flags & NIX_TX_OFFLOAD_TSO_F)
+		lso_tun_fmt = txq->lso_tun_fmt;
+
+	if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+		mark_fmt = txq->mark_fmt;
+		mark_flag = txq->mark_flag;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(lbase, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_lbase, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	burst = left > 32 ? 32 : left;
+	shft = 16;
+	data128 = 0;
+
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		c_lnum = 0;
+		c_loff = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i++) {
+		cn20k_nix_tx_mbuf_validate(tx_pkts[i], flags);
+
+		/* Perform header writes for TSO, barrier at
+		 * lmt steorl will suffice.
+		 */
+		if (flags & NIX_TX_OFFLOAD_TSO_F)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+
+		cn20k_nix_xmit_prepare(txq, tx_pkts[i], &extm, cmd, flags, lso_tun_fmt, &sec,
+				       mark_flag, mark_fmt);
+
+		laddr = (uintptr_t)LMT_OFF(lbase, lnum, 0);
+
+		/* Prepare CPT instruction and get nixtx addr */
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F && sec)
+			cn20k_nix_prep_sec(tx_pkts[i], cmd, &laddr, c_lbase, &c_lnum, &c_loff,
+					   &c_shft, sa_base, flags);
+
+		/* Move NIX desc to LMT/NIXTX area */
+		cn20k_nix_xmit_mv_lmt_base(laddr, cmd, flags);
+		/* Store sg list directly on lmt line */
+		segdw = cn20k_nix_prepare_mseg(txq, tx_pkts[i], &extm, (uint64_t *)laddr, flags);
+		cn20k_nix_xmit_prepare_tstamp(txq, laddr, tx_pkts[i]->ol_flags, segdw, flags);
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F) || !sec) {
+			lnum++;
+			data128 |= (((__uint128_t)(segdw - 1)) << shft);
+			shft += 3;
+		}
+	}
+
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+	tx_pkts += burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = ((c_lnum << 1) + c_loff);
+
+		/* Reduce pkts to be sent to CPT */
+		burst -= sec_pkts;
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	data0 = (uint64_t)data128;
+	data1 = (uint64_t)(data128 >> 64);
+	/* Make data0 similar to data1 */
+	data0 >>= 16;
+	/* Trigger LMTST */
+	if (burst > 16) {
+		pa0 = io_addr | (data0 & 0x7) << 4;
+		data0 &= ~0x7ULL;
+		/* Move lmtst1..15 sz to bits 63:19 */
+		data0 <<= 16;
+		data0 |= (15ULL << 12);
+		data0 |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, 16);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data0, pa0);
+
+		pa1 = io_addr | (data1 & 0x7) << 4;
+		data1 &= ~0x7ULL;
+		data1 <<= 16;
+		data1 |= ((uint64_t)(burst - 17)) << 12;
+		data1 |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst - 16);
+		/* STEOR1 */
+		roc_lmt_submit_steorl(data1, pa1);
+	} else if (burst) {
+		pa0 = io_addr | (data0 & 0x7) << 4;
+		data0 &= ~0x7ULL;
+		/* Move lmtst1..15 sz to bits 63:19 */
+		data0 <<= 16;
+		data0 |= ((burst - 1ULL) << 12);
+		data0 |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(data0, pa0);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	return pkts;
+}
+
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -1213,10 +1554,12 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[(sz) + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];                               \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_mseg(tx_queue, NULL, tx_pkts, pkts, cmd,                \
+						flags | NIX_TX_MULTI_SEG_F);                       \
 	}
 
 #define NIX_TX_XMIT_VEC(fn, sz, flags)                                                             \
@@ -1246,5 +1589,4 @@ uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue
 uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_vec_all_offload(void *tx_queue,
 								      struct rte_mbuf **tx_pkts,
 								      uint16_t pkts);
-
 #endif /* __CN20K_TX_H__ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 17/18] net/cnxk: support Tx burst vector for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (15 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 16/18] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-01 12:40   ` [PATCH v3 18/18] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
  2024-10-03 15:52   ` [PATCH v3 00/18] add Marvell cn20k SOC support for mempool and net Jerin Jacob
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 1444 ++++++++++++++++++++++++++++++++++-
 1 file changed, 1440 insertions(+), 4 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index f7a78d34ea..ac719865cd 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -219,6 +219,28 @@ cn20k_nix_tx_ext_subs(const uint16_t flags)
 		       ((flags & (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSO_F)) ? 1 : 0);
 }
 
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords(const uint16_t flags, const uint8_t segdw)
+{
+	if (!(flags & NIX_TX_MULTI_SEG_F))
+		return cn20k_nix_tx_ext_subs(flags) + 2;
+
+	/* Already everything is accounted for in segdw */
+	return segdw;
+}
+
+static __rte_always_inline uint8_t
+cn20k_nix_pkts_per_vec_brst(const uint16_t flags)
+{
+	return ((flags & NIX_TX_NEED_EXT_HDR) ? 2 : 4) << ROC_LMT_LINES_PER_CORE_LOG2;
+}
+
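+/* 16B LMT words consumed per 128B LMT line: 4 pkts x 2 words with no
+ * ext header, 2 pkts x 3 words with it, 2 pkts x 4 words with tstamp.
+ */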
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords_per_line(const uint16_t flags)
+{
+	return (flags & NIX_TX_NEED_EXT_HDR) ? ((flags & NIX_TX_OFFLOAD_TSTAMP_F) ? 8 : 6) : 8;
+}
+
 static __rte_always_inline uint64_t
 cn20k_nix_tx_steor_data(const uint16_t flags)
 {
@@ -247,6 +269,40 @@ cn20k_nix_tx_steor_data(const uint16_t flags)
 	return data;
 }
 
+static __rte_always_inline uint8_t
+cn20k_nix_tx_dwords_per_line_seg(const uint16_t flags)
+{
+	return ((flags & NIX_TX_NEED_EXT_HDR) ? (flags & NIX_TX_OFFLOAD_TSTAMP_F) ? 8 : 6 : 4);
+}
+
+static __rte_always_inline uint64_t
+cn20k_nix_tx_steor_vec_data(const uint16_t flags)
+{
+	const uint64_t dw_m1 = cn20k_nix_tx_dwords_per_line(flags) - 1;
+	uint64_t data;
+
+	/* This will be moved to addr area */
+	data = dw_m1;
+	/* 15 vector sizes for single seg */
+	data |= dw_m1 << 19;
+	data |= dw_m1 << 22;
+	data |= dw_m1 << 25;
+	data |= dw_m1 << 28;
+	data |= dw_m1 << 31;
+	data |= dw_m1 << 34;
+	data |= dw_m1 << 37;
+	data |= dw_m1 << 40;
+	data |= dw_m1 << 43;
+	data |= dw_m1 << 46;
+	data |= dw_m1 << 49;
+	data |= dw_m1 << 52;
+	data |= dw_m1 << 55;
+	data |= dw_m1 << 58;
+	data |= dw_m1 << 61;
+
+	return data;
+}
+
 static __rte_always_inline void
 cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t flags,
 		      const uint16_t static_sz)
@@ -276,6 +332,33 @@ cn20k_nix_tx_skeleton(struct cn20k_eth_txq *txq, uint64_t *cmd, const uint16_t f
 	}
 }
 
+static __rte_always_inline void
+cn20k_nix_sec_fc_wait_one(struct cn20k_eth_txq *txq)
+{
+	uint64_t nb_desc = txq->cpt_desc;
+	uint64_t fc;
+
+#ifdef RTE_ARCH_ARM64
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldxr %[space], [%[addr]]		\n"
+		     "		cmp %[nb_desc], %[space]		\n"
+		     "		b.hi .Ldne%=				\n"
+		     "		sevl					\n"
+		     ".Lrty%=:	wfe					\n"
+		     "		ldxr %[space], [%[addr]]		\n"
+		     "		cmp %[nb_desc], %[space]		\n"
+		     "		b.ls .Lrty%=				\n"
+		     ".Ldne%=:						\n"
+		     : [space] "=&r"(fc)
+		     : [nb_desc] "r"(nb_desc), [addr] "r"(txq->cpt_fc)
+		     : "memory");
+#else
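+	/* Non-Arm fallback: busy-poll the CPT flow-control count until there is room */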
+	RTE_SET_USED(fc);
+	while (nb_desc <= __atomic_load_n(txq->cpt_fc, __ATOMIC_RELAXED))
+		;
+#endif
+}
+
 static __rte_always_inline void
 cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
 {
@@ -346,6 +429,136 @@ cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
 }
 
 #if defined(RTE_ARCH_ARM64)
+static __rte_always_inline void
+cn20k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1,
+		       uintptr_t *nixtx_addr, uintptr_t lbase, uint8_t *lnum, uint8_t *loff,
+		       uint8_t *shft, uint64_t sa_base, const uint16_t flags)
+{
+	struct cn20k_sec_sess_priv sess_priv;
+	uint32_t pkt_len, dlen_adj, rlen;
+	uint8_t l3l4type, chksum;
+	uint64x2_t cmd01, cmd23;
+	uint8_t l2_len, l3_len;
+	uintptr_t dptr, nixtx;
+	uint64_t ucode_cmd[4];
+	uint64_t *laddr, w0;
+	uint16_t tag;
+	uint64_t sa;
+
+	sess_priv.u64 = *rte_security_dynfield(m);
+
+	if (flags & NIX_TX_NEED_SEND_HDR_W1) {
+		/* Extract l3l4type either from il3il4type or ol3ol4type */
+		if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) {
+			l2_len = vgetq_lane_u8(*cmd0, 10);
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = vgetq_lane_u8(*cmd0, 11) - l2_len;
+			l3l4type = vgetq_lane_u8(*cmd0, 13);
+		} else {
+			l2_len = vgetq_lane_u8(*cmd0, 8);
+			/* L4 ptr from send hdr includes l2 and l3 len */
+			l3_len = vgetq_lane_u8(*cmd0, 9) - l2_len;
+			l3l4type = vgetq_lane_u8(*cmd0, 12);
+		}
+
+		chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30);
+		chksum = ~chksum;
+		sess_priv.chksum = sess_priv.chksum & chksum;
+		/* Clear SEND header flags */
+		*cmd0 = vsetq_lane_u16(0, *cmd0, 6);
+	} else {
+		l2_len = m->l2_len;
+		l3_len = m->l3_len;
+	}
+
+	/* Retrieve DPTR */
+	dptr = vgetq_lane_u64(*cmd1, 1);
+	pkt_len = vgetq_lane_u16(*cmd0, 0);
+
+	/* Calculate dlen adj */
+	dlen_adj = pkt_len - l2_len;
+	/* Exclude l3 len from roundup for transport mode */
+	dlen_adj -= sess_priv.mode ? 0 : l3_len;
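+	/* Round (dlen_adj + roundup_len) up to a multiple of roundup_byte, then add partial_len */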
+	rlen = (dlen_adj + sess_priv.roundup_len) + (sess_priv.roundup_byte - 1);
+	rlen &= ~(uint64_t)(sess_priv.roundup_byte - 1);
+	rlen += sess_priv.partial_len;
+	dlen_adj = rlen - dlen_adj;
+
+	/* Update send descriptors. Security is single segment only */
+	*cmd0 = vsetq_lane_u16(pkt_len + dlen_adj, *cmd0, 0);
+
+	/* CPT word 5 and word 6 */
+	w0 = 0;
+	ucode_cmd[2] = 0;
+	if (flags & NIX_TX_MULTI_SEG_F && m->nb_segs > 1) {
+		struct rte_mbuf *last = rte_pktmbuf_lastseg(m);
+
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = rte_pktmbuf_mtod_offset(last, uintptr_t, last->data_len + dlen_adj);
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		dptr = nixtx + ((flags & NIX_TX_NEED_EXT_HDR) ? 32 : 16);
+
+		/* Set l2 length as data offset */
+		w0 = (uint64_t)l2_len << 16;
+		w0 |= cn20k_nix_tx_ext_subs(flags) + NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+		ucode_cmd[1] = dptr | ((uint64_t)m->nb_segs << 60);
+	} else {
+		/* Get area where NIX descriptor needs to be stored */
+		nixtx = dptr + pkt_len + dlen_adj;
+		nixtx += BIT_ULL(7);
+		nixtx = (nixtx - 1) & ~(BIT_ULL(7) - 1);
+		nixtx += 16;
+
+		w0 |= cn20k_nix_tx_ext_subs(flags) + 1ULL;
+		dptr += l2_len;
+		ucode_cmd[1] = dptr;
+		*cmd1 = vsetq_lane_u16(pkt_len + dlen_adj, *cmd1, 0);
+		/* DLEN passed is excluding L2 HDR */
+		pkt_len -= l2_len;
+	}
+	w0 |= ((((int64_t)nixtx - (int64_t)dptr) & 0xFFFFF) << 32);
+	/* CPT word 0 and 1 */
+	cmd01 = vdupq_n_u64(0);
+	cmd01 = vsetq_lane_u64(w0, cmd01, 0);
+	/* CPT_RES_S is 16B above NIXTX */
+	cmd01 = vsetq_lane_u64(nixtx - 16, cmd01, 1);
+
+	/* Return nixtx addr */
+	*nixtx_addr = nixtx;
+
+	/* CPT Word 4 and Word 7 */
+	tag = sa_base & 0xFFFFUL;
+	sa_base &= ~0xFFFFUL;
+	sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx);
+	ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa);
+	ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | 1UL << 54 |
+			((uint64_t)sess_priv.chksum) << 32 | ((uint64_t)sess_priv.dec_ttl) << 34 |
+			pkt_len);
+
+	/* CPT word 2 and 3 */
+	cmd23 = vdupq_n_u64(0);
+	cmd23 = vsetq_lane_u64((((uint64_t)RTE_EVENT_TYPE_CPU << 28) | tag |
+				CNXK_ETHDEV_SEC_OUTB_EV_SUB << 20), cmd23, 0);
+	cmd23 = vsetq_lane_u64((uintptr_t)m | 1, cmd23, 1);
+
+	/* Move to our line */
+	laddr = LMT_OFF(lbase, *lnum, *loff ? 64 : 0);
+
+	/* Write CPT instruction to lmt line */
+	vst1q_u64(laddr, cmd01);
+	vst1q_u64((laddr + 2), cmd23);
+
+	*(__uint128_t *)(laddr + 4) = *(__uint128_t *)ucode_cmd;
+	*(__uint128_t *)(laddr + 6) = *(__uint128_t *)(ucode_cmd + 2);
+
+	/* Move to next line for every other CPT inst */
+	*loff = !(*loff);
+	*lnum = *lnum + (*loff ? 0 : 1);
+	*shft = *shft + (*loff ? 0 : 3);
+}
 
 static __rte_always_inline void
 cn20k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uintptr_t lbase,
@@ -545,6 +758,156 @@ cn20k_nix_prefree_seg(struct rte_mbuf *m, struct rte_mbuf **extm, struct cn20k_e
 	}
 }
 
+#if defined(RTE_ARCH_ARM64)
+/* Only called for the first segments of single-segment mbufs */
+static __rte_always_inline void
+cn20k_nix_prefree_seg_vec(struct rte_mbuf **mbufs, struct rte_mbuf **extm,
+			  struct cn20k_eth_txq *txq, uint64x2_t *senddesc01_w0,
+			  uint64x2_t *senddesc23_w0, uint64x2_t *senddesc01_w1,
+			  uint64x2_t *senddesc23_w1)
+{
+	struct rte_mbuf **tx_compl_ptr = txq->tx_compl.ptr;
+	uint32_t nb_desc_mask = txq->tx_compl.nb_desc_mask;
+	bool tx_compl_ena = txq->tx_compl.ena;
+	struct rte_mbuf *m0, *m1, *m2, *m3;
+	struct rte_mbuf *cookie;
+	uint64_t w0, w1, aura;
+	uint64_t sqe_id;
+
+	m0 = mbufs[0];
+	m1 = mbufs[1];
+	m2 = mbufs[2];
+	m3 = mbufs[3];
+
+	/* mbuf 0 */
+	w0 = vgetq_lane_u64(*senddesc01_w0, 0);
+	if (RTE_MBUF_HAS_EXTBUF(m0)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc01_w1, 0);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m0->next = *extm;
+			*extm = m0;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m0;
+			*senddesc01_w1 = vsetq_lane_u64(w1, *senddesc01_w1, 0);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m0) ? m0 : rte_mbuf_from_indirect(m0);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m0, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc01_w0 = vsetq_lane_u64(w0, *senddesc01_w0, 0);
+
+	/* mbuf1 */
+	w0 = vgetq_lane_u64(*senddesc01_w0, 1);
+	if (RTE_MBUF_HAS_EXTBUF(m1)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc01_w1, 1);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m1->next = *extm;
+			*extm = m1;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m1;
+			*senddesc01_w1 = vsetq_lane_u64(w1, *senddesc01_w1, 1);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m1) ? m1 : rte_mbuf_from_indirect(m1);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m1, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc01_w0 = vsetq_lane_u64(w0, *senddesc01_w0, 1);
+
+	/* mbuf 2 */
+	w0 = vgetq_lane_u64(*senddesc23_w0, 0);
+	if (RTE_MBUF_HAS_EXTBUF(m2)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc23_w1, 0);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m2->next = *extm;
+			*extm = m2;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m2;
+			*senddesc23_w1 = vsetq_lane_u64(w1, *senddesc23_w1, 0);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m2) ? m2 : rte_mbuf_from_indirect(m2);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m2, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc23_w0 = vsetq_lane_u64(w0, *senddesc23_w0, 0);
+
+	/* mbuf3 */
+	w0 = vgetq_lane_u64(*senddesc23_w0, 1);
+	if (RTE_MBUF_HAS_EXTBUF(m3)) {
+		w0 |= BIT_ULL(19);
+		w1 = vgetq_lane_u64(*senddesc23_w1, 1);
+		w1 &= ~0xFFFF000000000000UL;
+		if (unlikely(!tx_compl_ena)) {
+			m3->next = *extm;
+			*extm = m3;
+		} else {
+			sqe_id = rte_atomic_fetch_add_explicit(&txq->tx_compl.sqe_id, 1,
+							       rte_memory_order_relaxed);
+			sqe_id = sqe_id & nb_desc_mask;
+			/* Set PNC */
+			w0 |= BIT_ULL(43);
+			w1 |= sqe_id << 48;
+			tx_compl_ptr[sqe_id] = m3;
+			*senddesc23_w1 = vsetq_lane_u64(w1, *senddesc23_w1, 1);
+		}
+	} else {
+		cookie = RTE_MBUF_DIRECT(m3) ? m3 : rte_mbuf_from_indirect(m3);
+		aura = (w0 >> 20) & 0xFFFFF;
+		w0 &= ~0xFFFFF00000UL;
+		w0 |= cnxk_nix_prefree_seg(m3, &aura) << 19;
+		w0 |= aura << 20;
+
+		if ((w0 & BIT_ULL(19)) == 0)
+			RTE_MEMPOOL_CHECK_COOKIES(cookie->pool, (void **)&cookie, 1, 0);
+	}
+	*senddesc23_w0 = vsetq_lane_u64(w0, *senddesc23_w0, 1);
+#ifndef RTE_LIBRTE_MEMPOOL_DEBUG
+	RTE_SET_USED(cookie);
+#endif
+}
+#endif
+
 static __rte_always_inline void
 cn20k_nix_xmit_prepare_tso(struct rte_mbuf *m, const uint64_t flags)
 {
@@ -1350,6 +1713,1078 @@ cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts
 	return pkts;
 }
 
+#if defined(RTE_ARCH_ARM64)
+
+#define NIX_DESCS_PER_LOOP 4
+
+static __rte_always_inline void
+cn20k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, uint8_t *shift,
+		   __uint128_t *data128, uintptr_t *next)
+{
+	/* Go to next line if we are out of space */
+	if ((*loff + (dw << 4)) > 128) {
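+		/* Record the closed line's size (16B units minus one) in the next 3-bit STEOR field */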
+		*data128 = *data128 | (((__uint128_t)((*loff >> 4) - 1)) << *shift);
+		*shift = *shift + 3;
+		*loff = 0;
+		*lnum = *lnum + 1;
+	}
+
+	*next = (uintptr_t)LMT_OFF(laddr, *lnum, *loff);
+	*loff = *loff + (dw << 4);
+}
+
+static __rte_always_inline void
+cn20k_nix_xmit_store(struct cn20k_eth_txq *txq, struct rte_mbuf *mbuf, struct rte_mbuf **extm,
+		     uint8_t segdw, uintptr_t laddr, uint64x2_t cmd0, uint64x2_t cmd1,
+		     uint64x2_t cmd2, uint64x2_t cmd3, const uint16_t flags)
+{
+	RTE_SET_USED(txq);
+	RTE_SET_USED(mbuf);
+	RTE_SET_USED(extm);
+	RTE_SET_USED(segdw);
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		/* Store the prepared send desc to LMT lines */
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+			vst1q_u64(LMT_OFF(laddr, 0, 48), cmd3);
+		} else {
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+		}
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 0);
+	} else {
+		/* Store the prepared send desc to LMT lines */
+		vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+		vst1q_u64(LMT_OFF(laddr, 0, 16), cmd1);
+		RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 0);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			   uint64_t *cmd, const uint16_t flags)
+{
+	uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3;
+	uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3;
+	uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP], cmd2[NIX_DESCS_PER_LOOP],
+		cmd3[NIX_DESCS_PER_LOOP];
+	uint16_t left, scalar, burst, i, lmt_id, c_lmt_id;
+	uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3, pa;
+	uint64x2_t senddesc01_w0, senddesc23_w0;
+	uint64x2_t senddesc01_w1, senddesc23_w1;
+	uint64x2_t sendext01_w0, sendext23_w0;
+	uint64x2_t sendext01_w1, sendext23_w1;
+	uint64x2_t sendmem01_w0, sendmem23_w0;
+	uint64x2_t sendmem01_w1, sendmem23_w1;
+	uint8_t segdw[NIX_DESCS_PER_LOOP + 1];
+	uint64x2_t sgdesc01_w0, sgdesc23_w0;
+	uint64x2_t sgdesc01_w1, sgdesc23_w1;
+	struct cn20k_eth_txq *txq = tx_queue;
+	rte_iova_t io_addr = txq->io_addr;
+	uint8_t lnum, shift = 0, loff = 0;
+	uintptr_t laddr = txq->lmt_base;
+	uint8_t c_lnum, c_shft, c_loff;
+	uint64x2_t ltypes01, ltypes23;
+	uint64x2_t xtmp128, ytmp128;
+	uint64x2_t xmask01, xmask23;
+	uintptr_t c_laddr = laddr;
+	rte_iova_t c_io_addr;
+	uint64_t sa_base;
+	union wdata {
+		__uint128_t data128;
+		uint64_t data[2];
+	} wd;
+	struct rte_mbuf *extm = NULL;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && txq->tx_compl.ena)
+		handle_tx_completion_pkts(txq, flags & NIX_TX_VWQE_F);
+
+	if (!(flags & NIX_TX_VWQE_F)) {
+		scalar = pkts & (NIX_DESCS_PER_LOOP - 1);
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+		NIX_XMIT_FC_CHECK_RETURN(txq, pkts);
+	} else {
+		scalar = pkts & (NIX_DESCS_PER_LOOP - 1);
+		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
+	}
+
+	if (!(flags & NIX_TX_VWQE_F)) {
+		senddesc01_w0 = vld1q_dup_u64(&txq->send_hdr_w0);
+	} else {
+		uint64_t w0 = (txq->send_hdr_w0 & 0xFFFFF00000000000) |
+			      ((uint64_t)(cn20k_nix_tx_ext_subs(flags) + 1) << 40);
+
+		senddesc01_w0 = vdupq_n_u64(w0);
+	}
+	senddesc23_w0 = senddesc01_w0;
+
+	senddesc01_w1 = vdupq_n_u64(0);
+	senddesc23_w1 = senddesc01_w1;
+	if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+		sgdesc01_w0 = vdupq_n_u64((NIX_SUBDC_SG << 60) | (NIX_SENDLDTYPE_LDWB << 58) |
+					  BIT_ULL(48));
+	else
+		sgdesc01_w0 = vdupq_n_u64((NIX_SUBDC_SG << 60) | BIT_ULL(48));
+	sgdesc23_w0 = sgdesc01_w0;
+
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			sendext01_w0 = vdupq_n_u64((NIX_SUBDC_EXT << 60) | BIT_ULL(15));
+			sendmem01_w0 = vdupq_n_u64((NIX_SUBDC_MEM << 60) |
+						   (NIX_SENDMEMALG_SETTSTMP << 56));
+			sendmem23_w0 = sendmem01_w0;
+			sendmem01_w1 = vdupq_n_u64(txq->ts_mem);
+			sendmem23_w1 = sendmem01_w1;
+		} else {
+			sendext01_w0 = vdupq_n_u64((NIX_SUBDC_EXT << 60));
+		}
+		sendext23_w0 = sendext01_w0;
+
+		if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F)
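+			/* VLAN0/VLAN1 insert offsets both default to 12, presumably right after DMAC+SMAC */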
+			sendext01_w1 = vdupq_n_u64(12 | 12U << 24);
+		else
+			sendext01_w1 = vdupq_n_u64(0);
+		sendext23_w1 = sendext01_w1;
+	}
+
+	/* Get LMT base address and LMT ID as lcore id */
+	ROC_LMT_BASE_ID_GET(laddr, lmt_id);
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		ROC_LMT_CPT_BASE_ID_GET(c_laddr, c_lmt_id);
+		c_io_addr = txq->cpt_io_addr;
+		sa_base = txq->sa_base;
+	}
+
+	left = pkts;
+again:
+	/* Number of packets to prepare depends on offloads enabled. */
+	burst = left > cn20k_nix_pkts_per_vec_brst(flags) ? cn20k_nix_pkts_per_vec_brst(flags) :
+							    left;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		wd.data128 = 0;
+		shift = 16;
+	}
+	lnum = 0;
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		loff = 0;
+		c_loff = 0;
+		c_lnum = 0;
+		c_shft = 16;
+	}
+
+	for (i = 0; i < burst; i += NIX_DESCS_PER_LOOP) {
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F &&
+		    (((int)((16 - c_lnum) << 1) - c_loff) < 4)) {
+			burst = i;
+			break;
+		}
+
+		/* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */
+		senddesc01_w0 = vbicq_u64(senddesc01_w0, vdupq_n_u64(0x800FFFFFFFF));
+		sgdesc01_w0 = vbicq_u64(sgdesc01_w0, vdupq_n_u64(0xFFFFFFFF));
+
+		senddesc23_w0 = senddesc01_w0;
+		sgdesc23_w0 = sgdesc01_w0;
+
+		/* Clear vlan enables. */
+		if (flags & NIX_TX_NEED_EXT_HDR) {
+			sendext01_w1 = vbicq_u64(sendext01_w1, vdupq_n_u64(0x3FFFF00FFFF00));
+			sendext23_w1 = sendext01_w1;
+		}
+
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Reset send mem alg to SETTSTMP from SUB */
+			sendmem01_w0 = vbicq_u64(sendmem01_w0, vdupq_n_u64(BIT_ULL(59)));
+			/* Reset send mem address to default. */
+			sendmem01_w1 = vbicq_u64(sendmem01_w1, vdupq_n_u64(0xF));
+			sendmem23_w0 = sendmem01_w0;
+			sendmem23_w1 = sendmem01_w1;
+		}
+
+		/* Move mbufs to iova */
+		mbuf0 = (uint64_t *)tx_pkts[0];
+		mbuf1 = (uint64_t *)tx_pkts[1];
+		mbuf2 = (uint64_t *)tx_pkts[2];
+		mbuf3 = (uint64_t *)tx_pkts[3];
+
+		/*
+		 * Get each mbuf's ol_flags, iova, pktlen, dataoff:
+		 * dataoff_iovaX.D[0] = iova,
+		 * dataoff_iovaX.D[1](15:0) = mbuf->dataoff
+		 * len_olflagsX.D[0] = ol_flags,
+		 * len_olflagsX.D[1](63:32) = mbuf->pkt_len
+		 */
+		dataoff_iova0 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf0)->data_off, vld1q_u64(mbuf0), 1);
+		len_olflags0 = vld1q_u64(mbuf0 + 3);
+		dataoff_iova1 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf1)->data_off, vld1q_u64(mbuf1), 1);
+		len_olflags1 = vld1q_u64(mbuf1 + 3);
+		dataoff_iova2 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf2)->data_off, vld1q_u64(mbuf2), 1);
+		len_olflags2 = vld1q_u64(mbuf2 + 3);
+		dataoff_iova3 =
+			vsetq_lane_u64(((struct rte_mbuf *)mbuf3)->data_off, vld1q_u64(mbuf3), 1);
+		len_olflags3 = vld1q_u64(mbuf3 + 3);
+
+		/* Move mbufs to point to pool */
+		mbuf0 = (uint64_t *)((uintptr_t)mbuf0 + offsetof(struct rte_mbuf, pool));
+		mbuf1 = (uint64_t *)((uintptr_t)mbuf1 + offsetof(struct rte_mbuf, pool));
+		mbuf2 = (uint64_t *)((uintptr_t)mbuf2 + offsetof(struct rte_mbuf, pool));
+		mbuf3 = (uint64_t *)((uintptr_t)mbuf3 + offsetof(struct rte_mbuf, pool));
+
+		if (flags & (NIX_TX_OFFLOAD_OL3_OL4_CSUM_F | NIX_TX_OFFLOAD_L3_L4_CSUM_F)) {
+			/* Get tx_offload for ol2, ol3, l2, l3 lengths */
+			/*
+			 * E(8):OL2_LEN(7):OL3_LEN(9):E(24):L3_LEN(9):L2_LEN(7)
+			 * E(8):OL2_LEN(7):OL3_LEN(9):E(24):L3_LEN(9):L2_LEN(7)
+			 */
+
+			asm volatile("LD1 {%[a].D}[0],[%[in]]\n\t"
+				     : [a] "+w"(senddesc01_w1)
+				     : [in] "r"(mbuf0 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[a].D}[1],[%[in]]\n\t"
+				     : [a] "+w"(senddesc01_w1)
+				     : [in] "r"(mbuf1 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[b].D}[0],[%[in]]\n\t"
+				     : [b] "+w"(senddesc23_w1)
+				     : [in] "r"(mbuf2 + 2)
+				     : "memory");
+
+			asm volatile("LD1 {%[b].D}[1],[%[in]]\n\t"
+				     : [b] "+w"(senddesc23_w1)
+				     : [in] "r"(mbuf3 + 2)
+				     : "memory");
+
+			/* Get pool pointer alone */
+			mbuf0 = (uint64_t *)*mbuf0;
+			mbuf1 = (uint64_t *)*mbuf1;
+			mbuf2 = (uint64_t *)*mbuf2;
+			mbuf3 = (uint64_t *)*mbuf3;
+		} else {
+			/* Get pool pointer alone */
+			mbuf0 = (uint64_t *)*mbuf0;
+			mbuf1 = (uint64_t *)*mbuf1;
+			mbuf2 = (uint64_t *)*mbuf2;
+			mbuf3 = (uint64_t *)*mbuf3;
+		}
+
+		const uint8x16_t shuf_mask2 = {
+			0x4, 0x5, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+			0xc, 0xd, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+		};
+		xtmp128 = vzip2q_u64(len_olflags0, len_olflags1);
+		ytmp128 = vzip2q_u64(len_olflags2, len_olflags3);
+
+		/*
+		 * Pick only 16 bits of pktlen present at bits 63:32
+		 * and place them at bits 15:0.
+		 */
+		xtmp128 = vqtbl1q_u8(xtmp128, shuf_mask2);
+		ytmp128 = vqtbl1q_u8(ytmp128, shuf_mask2);
+
+		/* Add pairwise to get dataoff + iova in sgdesc_w1 */
+		sgdesc01_w1 = vpaddq_u64(dataoff_iova0, dataoff_iova1);
+		sgdesc23_w1 = vpaddq_u64(dataoff_iova2, dataoff_iova3);
+
+		/* OR both sgdesc_w0 and senddesc_w0 with the 16 bits of
+		 * pktlen at bit position 15:0.
+		 */
+		sgdesc01_w0 = vorrq_u64(sgdesc01_w0, xtmp128);
+		sgdesc23_w0 = vorrq_u64(sgdesc23_w0, ytmp128);
+		senddesc01_w0 = vorrq_u64(senddesc01_w0, xtmp128);
+		senddesc23_w0 = vorrq_u64(senddesc23_w0, ytmp128);
+
+		/* Move mbuf to point to pool_id. */
+		mbuf0 = (uint64_t *)((uintptr_t)mbuf0 + offsetof(struct rte_mempool, pool_id));
+		mbuf1 = (uint64_t *)((uintptr_t)mbuf1 + offsetof(struct rte_mempool, pool_id));
+		mbuf2 = (uint64_t *)((uintptr_t)mbuf2 + offsetof(struct rte_mempool, pool_id));
+		mbuf3 = (uint64_t *)((uintptr_t)mbuf3 + offsetof(struct rte_mempool, pool_id));
+
+		if ((flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+		    !(flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/*
+			 * Lookup table to translate ol_flags to
+			 * il3/il4 types. But we still use ol3/ol4 types in
+			 * senddesc_w1 since only one level of header processing is enabled.
+			 */
+			const uint8x16_t tbl = {
+				/* [0-15] = il4type:il3type */
+				0x04, /* none (IPv6 assumed) */
+				0x14, /* RTE_MBUF_F_TX_TCP_CKSUM (IPv6 assumed) */
+				0x24, /* RTE_MBUF_F_TX_SCTP_CKSUM (IPv6 assumed) */
+				0x34, /* RTE_MBUF_F_TX_UDP_CKSUM (IPv6 assumed) */
+				0x03, /* RTE_MBUF_F_TX_IP_CKSUM */
+				0x13, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_TCP_CKSUM */
+				0x23, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_SCTP_CKSUM */
+				0x33, /* RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM */
+				0x02, /* RTE_MBUF_F_TX_IPV4  */
+				0x12, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_TCP_CKSUM */
+				0x22, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_SCTP_CKSUM */
+				0x32, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_UDP_CKSUM */
+				0x03, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM */
+				0x13, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_TCP_CKSUM
+				       */
+				0x23, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_SCTP_CKSUM
+				       */
+				0x33, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+				       * RTE_MBUF_F_TX_UDP_CKSUM
+				       */
+			};
+
+			/* Extract olflags to translate to iltypes */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(47):L3_LEN(9):L2_LEN(7+z)
+			 * E(47):L3_LEN(9):L2_LEN(7+z)
+			 */
+			senddesc01_w1 = vshlq_n_u64(senddesc01_w1, 1);
+			senddesc23_w1 = vshlq_n_u64(senddesc23_w1, 1);
+
+			/* Move OLFLAGS bits 55:52 to 51:48
+			 * with zeros prepended on the byte; the rest
+			 * are don't-care
+			 */
+			xtmp128 = vshrq_n_u8(xtmp128, 4);
+			ytmp128 = vshrq_n_u8(ytmp128, 4);
+			/*
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, 8, 8, 8, 8, 8, 8,
+				-1, 0, 8, 8, 8, 8, 8, 8,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl1q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl1q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields i.e Bit 48:55 of iltype
+			 * and place it in ol3/ol4type of senddesc_w1
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x6, 0xFF, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xE, 0xFF, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare ol4ptr, ol3ptr from ol3len, ol2len.
+			 * a [E(32):E(16):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E(32):E(16):(OL3+OL2):OL2]
+			 * => E(32):E(16)::OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u16(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u16(senddesc23_w1, 8));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		} else if (!(flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+			   (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/*
+			 * Lookup table to translate ol_flags to
+			 * ol3/ol4 types.
+			 */
+
+			const uint8x16_t tbl = {
+				/* [0-15] = ol4type:ol3type */
+				0x00, /* none */
+				0x03, /* OUTER_IP_CKSUM */
+				0x02, /* OUTER_IPV4 */
+				0x03, /* OUTER_IPV4 | OUTER_IP_CKSUM */
+				0x04, /* OUTER_IPV6 */
+				0x00, /* OUTER_IPV6 | OUTER_IP_CKSUM */
+				0x00, /* OUTER_IPV6 | OUTER_IPV4 */
+				0x00, /* OUTER_IPV6 | OUTER_IPV4 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x00, /* OUTER_UDP_CKSUM */
+				0x33, /* OUTER_UDP_CKSUM | OUTER_IP_CKSUM */
+				0x32, /* OUTER_UDP_CKSUM | OUTER_IPV4 */
+				0x33, /* OUTER_UDP_CKSUM | OUTER_IPV4 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x34, /* OUTER_UDP_CKSUM | OUTER_IPV6 */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IP_CKSUM
+				       */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IPV4
+				       */
+				0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+				       * OUTER_IPV4 | OUTER_IP_CKSUM
+				       */
+			};
+
+			/* Extract olflags to translate to iltypes */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(47):OL3_LEN(9):OL2_LEN(7+z)
+			 * E(47):OL3_LEN(9):OL2_LEN(7+z)
+			 */
+			const uint8x16_t shuf_mask5 = {
+				0x6, 0x5, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+				0xE, 0xD, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+			};
+			senddesc01_w1 = vqtbl1q_u8(senddesc01_w1, shuf_mask5);
+			senddesc23_w1 = vqtbl1q_u8(senddesc23_w1, shuf_mask5);
+
+			/* Extract outer ol flags only */
+			const uint64x2_t o_cksum_mask = {
+				0x1C00020000000000,
+				0x1C00020000000000,
+			};
+
+			xtmp128 = vandq_u64(xtmp128, o_cksum_mask);
+			ytmp128 = vandq_u64(ytmp128, o_cksum_mask);
+
+			/* Extract OUTER_UDP_CKSUM bit 41 and
+			 * move it to bit 61
+			 */
+
+			xtmp128 = xtmp128 | vshlq_n_u64(xtmp128, 20);
+			ytmp128 = ytmp128 | vshlq_n_u64(ytmp128, 20);
+
+			/* Shift oltype by 2 to start nibble from BIT(56)
+			 * instead of BIT(58)
+			 */
+			xtmp128 = vshrq_n_u8(xtmp128, 2);
+			ytmp128 = vshrq_n_u8(ytmp128, 2);
+			/*
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 * E(48):L3_LEN(8):L2_LEN(z+7)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, 8, 8, 8, 8, 8, 8,
+				-1, 0, 8, 8, 8, 8, 8, 8,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl1q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl1q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields i.e Bit 56:63 of oltype
+			 * and place it in ol3/ol4type of senddesc_w1
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x7, 0xFF, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xF, 0xFF, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare ol4ptr, ol3ptr from ol3len, ol2len.
+			 * a [E(32):E(16):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E(32):E(16):(OL3+OL2):OL2]
+			 * => E(32):E(16)::OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u16(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u16(senddesc23_w1, 8));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		} else if ((flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F) &&
+			   (flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F)) {
+			/* Lookup table to translate ol_flags to
+			 * ol4type, ol3type, il4type, il3type of senddesc_w1
+			 */
+			const uint8x16x2_t tbl = {{
+				{
+					/* [0-15] = il4type:il3type */
+					0x04, /* none (IPv6) */
+					0x14, /* RTE_MBUF_F_TX_TCP_CKSUM (IPv6) */
+					0x24, /* RTE_MBUF_F_TX_SCTP_CKSUM (IPv6) */
+					0x34, /* RTE_MBUF_F_TX_UDP_CKSUM (IPv6) */
+					0x03, /* RTE_MBUF_F_TX_IP_CKSUM */
+					0x13, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x23, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x33, /* RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+					0x02, /* RTE_MBUF_F_TX_IPV4 */
+					0x12, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x22, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x32, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+					0x03, /* RTE_MBUF_F_TX_IPV4 |
+					       * RTE_MBUF_F_TX_IP_CKSUM
+					       */
+					0x13, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_TCP_CKSUM
+					       */
+					0x23, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_SCTP_CKSUM
+					       */
+					0x33, /* RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
+					       * RTE_MBUF_F_TX_UDP_CKSUM
+					       */
+				},
+
+				{
+					/* [16-31] = ol4type:ol3type */
+					0x00, /* none */
+					0x03, /* OUTER_IP_CKSUM */
+					0x02, /* OUTER_IPV4 */
+					0x03, /* OUTER_IPV4 | OUTER_IP_CKSUM */
+					0x04, /* OUTER_IPV6 */
+					0x00, /* OUTER_IPV6 | OUTER_IP_CKSUM */
+					0x00, /* OUTER_IPV6 | OUTER_IPV4 */
+					0x00, /* OUTER_IPV6 | OUTER_IPV4 |
+					       * OUTER_IP_CKSUM
+					       */
+					0x00, /* OUTER_UDP_CKSUM */
+					0x33, /* OUTER_UDP_CKSUM |
+					       * OUTER_IP_CKSUM
+					       */
+					0x32, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV4
+					       */
+					0x33, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV4 | OUTER_IP_CKSUM
+					       */
+					0x34, /* OUTER_UDP_CKSUM |
+					       * OUTER_IPV6
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IP_CKSUM
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IPV4
+					       */
+					0x00, /* OUTER_UDP_CKSUM | OUTER_IPV6 |
+					       * OUTER_IPV4 | OUTER_IP_CKSUM
+					       */
+				},
+			}};
+
+			/* Extract olflags to translate to oltype & iltype */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/*
+			 * E(8):OL2_LN(7):OL3_LN(9):E(23):L3_LN(9):L2_LN(7+z)
+			 * E(8):OL2_LN(7):OL3_LN(9):E(23):L3_LN(9):L2_LN(7+z)
+			 */
+			const uint32x4_t tshft_4 = {
+				1,
+				0,
+				1,
+				0,
+			};
+			senddesc01_w1 = vshlq_u32(senddesc01_w1, tshft_4);
+			senddesc23_w1 = vshlq_u32(senddesc23_w1, tshft_4);
+
+			/*
+			 * E(32):L3_LEN(8):L2_LEN(7+Z):OL3_LEN(8):OL2_LEN(7+Z)
+			 * E(32):L3_LEN(8):L2_LEN(7+Z):OL3_LEN(8):OL2_LEN(7+Z)
+			 */
+			const uint8x16_t shuf_mask5 = {
+				0x6, 0x5, 0x0, 0x1, 0xFF, 0xFF, 0xFF, 0xFF,
+				0xE, 0xD, 0x8, 0x9, 0xFF, 0xFF, 0xFF, 0xFF,
+			};
+			senddesc01_w1 = vqtbl1q_u8(senddesc01_w1, shuf_mask5);
+			senddesc23_w1 = vqtbl1q_u8(senddesc23_w1, shuf_mask5);
+
+			/* Extract outer and inner header ol_flags */
+			const uint64x2_t oi_cksum_mask = {
+				0x1CF0020000000000,
+				0x1CF0020000000000,
+			};
+
+			xtmp128 = vandq_u64(xtmp128, oi_cksum_mask);
+			ytmp128 = vandq_u64(ytmp128, oi_cksum_mask);
+
+			/* Extract OUTER_UDP_CKSUM bit 41 and
+			 * move it to bit 61
+			 */
+
+			xtmp128 = xtmp128 | vshlq_n_u64(xtmp128, 20);
+			ytmp128 = ytmp128 | vshlq_n_u64(ytmp128, 20);
+
+			/* Shift right oltype by 2 and iltype by 4
+			 * to start oltype nibble from BIT(58)
+			 * instead of BIT(56) and iltype nibble from BIT(48)
+			 * instead of BIT(52).
+			 */
+			const int8x16_t tshft5 = {
+				8, 8, 8, 8, 8, 8, -4, -2,
+				8, 8, 8, 8, 8, 8, -4, -2,
+			};
+
+			xtmp128 = vshlq_u8(xtmp128, tshft5);
+			ytmp128 = vshlq_u8(ytmp128, tshft5);
+			/*
+			 * E(32):L3_LEN(8):L2_LEN(8):OL3_LEN(8):OL2_LEN(8)
+			 * E(32):L3_LEN(8):L2_LEN(8):OL3_LEN(8):OL2_LEN(8)
+			 */
+			const int8x16_t tshft3 = {
+				-1, 0, -1, 0, 0, 0, 0, 0,
+				-1, 0, -1, 0, 0, 0, 0, 0,
+			};
+
+			senddesc01_w1 = vshlq_u8(senddesc01_w1, tshft3);
+			senddesc23_w1 = vshlq_u8(senddesc23_w1, tshft3);
+
+			/* Mark Bit(4) of oltype */
+			const uint64x2_t oi_cksum_mask2 = {
+				0x1000000000000000,
+				0x1000000000000000,
+			};
+
+			xtmp128 = vorrq_u64(xtmp128, oi_cksum_mask2);
+			ytmp128 = vorrq_u64(ytmp128, oi_cksum_mask2);
+
+			/* Do the lookup */
+			ltypes01 = vqtbl2q_u8(tbl, xtmp128);
+			ltypes23 = vqtbl2q_u8(tbl, ytmp128);
+
+			/* Pick only relevant fields i.e Bit 48:55 of iltype and
+			 * Bit 56:63 of oltype and place it in corresponding
+			 * place in senddesc_w1.
+			 */
+			const uint8x16_t shuf_mask0 = {
+				0xFF, 0xFF, 0xFF, 0xFF, 0x7, 0x6, 0xFF, 0xFF,
+				0xFF, 0xFF, 0xFF, 0xFF, 0xF, 0xE, 0xFF, 0xFF,
+			};
+
+			ltypes01 = vqtbl1q_u8(ltypes01, shuf_mask0);
+			ltypes23 = vqtbl1q_u8(ltypes23, shuf_mask0);
+
+			/* Prepare l4ptr, l3ptr, ol4ptr, ol3ptr from
+			 * l3len, l2len, ol3len, ol2len.
+			 * a [E(32):L3(8):L2(8):OL3(8):OL2(8)]
+			 * a = a + (a << 8)
+			 * a [E:(L3+L2):(L2+OL3):(OL3+OL2):OL2]
+			 * a = a + (a << 16)
+			 * a [E:(L3+L2+OL3+OL2):(L2+OL3+OL2):(OL3+OL2):OL2]
+			 * => E(32):IL4PTR(8):IL3PTR(8):OL4PTR(8):OL3PTR(8)
+			 */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u32(senddesc01_w1, 8));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u32(senddesc23_w1, 8));
+
+			/* Continue preparing l4ptr, l3ptr, ol4ptr, ol3ptr */
+			senddesc01_w1 = vaddq_u8(senddesc01_w1, vshlq_n_u32(senddesc01_w1, 16));
+			senddesc23_w1 = vaddq_u8(senddesc23_w1, vshlq_n_u32(senddesc23_w1, 16));
+
+			/* Move ltypes to senddesc*_w1 */
+			senddesc01_w1 = vorrq_u64(senddesc01_w1, ltypes01);
+			senddesc23_w1 = vorrq_u64(senddesc23_w1, ltypes23);
+		}
+
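+		/* Gather each mbuf's 16-bit pool/aura id into the SEND_HDR_W0 aura field at bit 20 */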
+		xmask01 = vdupq_n_u64(0);
+		xmask23 = xmask01;
+		asm volatile("LD1 {%[a].H}[0],[%[in]]\n\t"
+			     : [a] "+w"(xmask01)
+			     : [in] "r"(mbuf0)
+			     : "memory");
+
+		asm volatile("LD1 {%[a].H}[4],[%[in]]\n\t"
+			     : [a] "+w"(xmask01)
+			     : [in] "r"(mbuf1)
+			     : "memory");
+
+		asm volatile("LD1 {%[b].H}[0],[%[in]]\n\t"
+			     : [b] "+w"(xmask23)
+			     : [in] "r"(mbuf2)
+			     : "memory");
+
+		asm volatile("LD1 {%[b].H}[4],[%[in]]\n\t"
+			     : [b] "+w"(xmask23)
+			     : [in] "r"(mbuf3)
+			     : "memory");
+		xmask01 = vshlq_n_u64(xmask01, 20);
+		xmask23 = vshlq_n_u64(xmask23, 20);
+
+		senddesc01_w0 = vorrq_u64(senddesc01_w0, xmask01);
+		senddesc23_w0 = vorrq_u64(senddesc23_w0, xmask23);
+
+		if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) {
+			/* Tx ol_flag for vlan. */
+			const uint64x2_t olv = {RTE_MBUF_F_TX_VLAN, RTE_MBUF_F_TX_VLAN};
+			/* Bit enable for VLAN1 */
+			const uint64x2_t mlv = {BIT_ULL(49), BIT_ULL(49)};
+			/* Tx ol_flag for QnQ. */
+			const uint64x2_t olq = {RTE_MBUF_F_TX_QINQ, RTE_MBUF_F_TX_QINQ};
+			/* Bit enable for VLAN0 */
+			const uint64x2_t mlq = {BIT_ULL(48), BIT_ULL(48)};
+			/* Load vlan values from packet. outer is VLAN 0 */
+			uint64x2_t ext01 = {
+				((uint32_t)tx_pkts[0]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[0]->vlan_tci) << 32,
+				((uint32_t)tx_pkts[1]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[1]->vlan_tci) << 32,
+			};
+			uint64x2_t ext23 = {
+				((uint32_t)tx_pkts[2]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[2]->vlan_tci) << 32,
+				((uint32_t)tx_pkts[3]->vlan_tci_outer) << 8 |
+					((uint64_t)tx_pkts[3]->vlan_tci) << 32,
+			};
+
+			/* Get ol_flags of the packets. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* ORR vlan outer/inner values into cmd. */
+			sendext01_w1 = vorrq_u64(sendext01_w1, ext01);
+			sendext23_w1 = vorrq_u64(sendext23_w1, ext23);
+
+			/* Test for offload enable bits and generate masks. */
+			xtmp128 = vorrq_u64(vandq_u64(vtstq_u64(xtmp128, olv), mlv),
+					    vandq_u64(vtstq_u64(xtmp128, olq), mlq));
+			ytmp128 = vorrq_u64(vandq_u64(vtstq_u64(ytmp128, olv), mlv),
+					    vandq_u64(vtstq_u64(ytmp128, olq), mlq));
+
+			/* Set vlan enable bits into cmd based on mask. */
+			sendext01_w1 = vorrq_u64(sendext01_w1, xtmp128);
+			sendext23_w1 = vorrq_u64(sendext23_w1, ytmp128);
+		}
+
+		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+			/* Tx ol_flag for timestamp. */
+			const uint64x2_t olf = {RTE_MBUF_F_TX_IEEE1588_TMST,
+						RTE_MBUF_F_TX_IEEE1588_TMST};
+			/* Set send mem alg to SUB. */
+			const uint64x2_t alg = {BIT_ULL(59), BIT_ULL(59)};
+			/* Increment send mem address by 8. */
+			const uint64x2_t addr = {0x8, 0x8};
+
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* Check if timestamp is requested and generate an
+			 * inverted mask, since no change to the default cmd
+			 * value is needed.
+			 */
+			xtmp128 = vmvnq_u32(vtstq_u64(olf, xtmp128));
+			ytmp128 = vmvnq_u32(vtstq_u64(olf, ytmp128));
+
+			/* Change send mem address to an 8 byte offset when
+			 * TSTMP is disabled.
+			 */
+			sendmem01_w1 = vaddq_u64(sendmem01_w1, vandq_u64(xtmp128, addr));
+			sendmem23_w1 = vaddq_u64(sendmem23_w1, vandq_u64(ytmp128, addr));
+			/* Change send mem alg to SUB when TSTMP is disabled. */
+			sendmem01_w0 = vorrq_u64(sendmem01_w0, vandq_u64(xtmp128, alg));
+			sendmem23_w0 = vorrq_u64(sendmem23_w0, vandq_u64(ytmp128, alg));
+
+			cmd3[0] = vzip1q_u64(sendmem01_w0, sendmem01_w1);
+			cmd3[1] = vzip2q_u64(sendmem01_w0, sendmem01_w1);
+			cmd3[2] = vzip1q_u64(sendmem23_w0, sendmem23_w1);
+			cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1);
+		}
+
+		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+			/* Set don't free bit if reference count > 1 */
+			cn20k_nix_prefree_seg_vec(tx_pkts, &extm, txq, &senddesc01_w0,
+						  &senddesc23_w0, &senddesc01_w1, &senddesc23_w1);
+		} else if (!(flags & NIX_TX_MULTI_SEG_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+			/* Move mbufs to iova */
+			mbuf0 = (uint64_t *)tx_pkts[0];
+			mbuf1 = (uint64_t *)tx_pkts[1];
+			mbuf2 = (uint64_t *)tx_pkts[2];
+			mbuf3 = (uint64_t *)tx_pkts[3];
+
+			/* Mark mempool object as "put" since
+			 * it is freed by NIX
+			 */
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf0)->pool, (void **)&mbuf0,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf1)->pool, (void **)&mbuf1,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf2)->pool, (void **)&mbuf2,
+						  1, 0);
+
+			RTE_MEMPOOL_CHECK_COOKIES(((struct rte_mbuf *)mbuf3)->pool, (void **)&mbuf3,
+						  1, 0);
+		}
+
+		/* Create 4W cmd for 4 mbufs (sendhdr, sgdesc) */
+		cmd0[0] = vzip1q_u64(senddesc01_w0, senddesc01_w1);
+		cmd0[1] = vzip2q_u64(senddesc01_w0, senddesc01_w1);
+		cmd0[2] = vzip1q_u64(senddesc23_w0, senddesc23_w1);
+		cmd0[3] = vzip2q_u64(senddesc23_w0, senddesc23_w1);
+
+		cmd1[0] = vzip1q_u64(sgdesc01_w0, sgdesc01_w1);
+		cmd1[1] = vzip2q_u64(sgdesc01_w0, sgdesc01_w1);
+		cmd1[2] = vzip1q_u64(sgdesc23_w0, sgdesc23_w1);
+		cmd1[3] = vzip2q_u64(sgdesc23_w0, sgdesc23_w1);
+
+		if (flags & NIX_TX_NEED_EXT_HDR) {
+			cmd2[0] = vzip1q_u64(sendext01_w0, sendext01_w1);
+			cmd2[1] = vzip2q_u64(sendext01_w0, sendext01_w1);
+			cmd2[2] = vzip1q_u64(sendext23_w0, sendext23_w1);
+			cmd2[3] = vzip2q_u64(sendext23_w0, sendext23_w1);
+		}
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+			const uint64x2_t olf = {RTE_MBUF_F_TX_SEC_OFFLOAD,
+						RTE_MBUF_F_TX_SEC_OFFLOAD};
+			uintptr_t next;
+			uint8_t dw;
+
+			/* Extract ol_flags. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			xtmp128 = vtstq_u64(olf, xtmp128);
+			ytmp128 = vtstq_u64(olf, ytmp128);
+
+			/* Process mbuf0 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[0]);
+			if (vgetq_lane_u64(xtmp128, 0))
+				cn20k_nix_prep_sec_vec(tx_pkts[0], &cmd0[0], &cmd1[0], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf0 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[0], &extm, segdw[0], next, cmd0[0],
+					     cmd1[0], cmd2[0], cmd3[0], flags);
+
+			/* Process mbuf1 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[1]);
+			if (vgetq_lane_u64(xtmp128, 1))
+				cn20k_nix_prep_sec_vec(tx_pkts[1], &cmd0[1], &cmd1[1], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf1 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[1], &extm, segdw[1], next, cmd0[1],
+					     cmd1[1], cmd2[1], cmd3[1], flags);
+
+			/* Process mbuf2 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[2]);
+			if (vgetq_lane_u64(ytmp128, 0))
+				cn20k_nix_prep_sec_vec(tx_pkts[2], &cmd0[2], &cmd1[2], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf2 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[2], &extm, segdw[2], next, cmd0[2],
+					     cmd1[2], cmd2[2], cmd3[2], flags);
+
+			/* Process mbuf3 */
+			dw = cn20k_nix_tx_dwords(flags, segdw[3]);
+			if (vgetq_lane_u64(ytmp128, 1))
+				cn20k_nix_prep_sec_vec(tx_pkts[3], &cmd0[3], &cmd1[3], &next,
+						       c_laddr, &c_lnum, &c_loff, &c_shft, sa_base,
+						       flags);
+			else
+				cn20k_nix_lmt_next(dw, laddr, &lnum, &loff, &shift, &wd.data128,
+						   &next);
+
+			/* Store mbuf3 to LMTLINE/CPT NIXTX area */
+			cn20k_nix_xmit_store(txq, tx_pkts[3], &extm, segdw[3], next, cmd0[3],
+					     cmd1[3], cmd2[3], cmd3[3], flags);
+
+		} else if (flags & NIX_TX_NEED_EXT_HDR) {
+			/* Store the prepared send desc to LMT lines */
+			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd3[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd2[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd1[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd3[1]);
+				lnum += 1;
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd3[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd2[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd1[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd3[3]);
+			} else {
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[1]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[1]);
+				lnum += 1;
+				vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[3]);
+				vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[3]);
+			}
+			lnum += 1;
+		} else {
+			/* Store the prepared send desc to LMT lines */
+			vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd1[0]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd0[1]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd1[1]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[2]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[2]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd0[3]);
+			vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd1[3]);
+			lnum += 1;
+		}
+
+		tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP;
+	}
+
+	/* Round lnum up to include the last line if it is partial */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		lnum = lnum + !!loff;
+		wd.data128 = wd.data128 | (((__uint128_t)(((loff >> 4) - 1) & 0x7) << shift));
+	}
+
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		wd.data[0] >>= 16;
+
+	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
+		ws[3] = roc_sso_hws_head_wait(ws[0]);
+
+	left -= burst;
+
+	/* Submit CPT instructions if any */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+		uint16_t sec_pkts = (c_lnum << 1) + c_loff;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, sec_pkts);
+		cn20k_nix_sec_fc_wait(txq, sec_pkts);
+		cn20k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft);
+	}
+
+	/* Trigger LMTST */
+	if (lnum > 16) {
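+		/* More than 16 LMT lines used: submit two STEORLs of up to 16 lines each */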
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[0] & 0x7) << 4;
+		wd.data[0] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[0] <<= 16;
+
+		wd.data[0] |= (15ULL << 12);
+		wd.data[0] |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, cn20k_nix_pkts_per_vec_brst(flags) >> 1);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(wd.data[0], pa);
+
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[1] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[1] & 0x7) << 4;
+		wd.data[1] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[1] <<= 16;
+
+		wd.data[1] |= ((uint64_t)(lnum - 17)) << 12;
+		wd.data[1] |= (uint64_t)(lmt_id + 16);
+
+		if (flags & NIX_TX_VWQE_F) {
+			cn20k_nix_vwqe_wait_fc(txq,
+					       burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+		}
+		/* STEOR1 */
+		roc_lmt_submit_steorl(wd.data[1], pa);
+	} else if (lnum) {
+		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
+
+		pa = io_addr | (wd.data[0] & 0x7) << 4;
+		wd.data[0] &= ~0x7ULL;
+
+		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+			wd.data[0] <<= 16;
+
+		wd.data[0] |= ((uint64_t)(lnum - 1)) << 12;
+		wd.data[0] |= (uint64_t)lmt_id;
+
+		if (flags & NIX_TX_VWQE_F)
+			cn20k_nix_vwqe_wait_fc(txq, burst);
+		/* STEOR0 */
+		roc_lmt_submit_steorl(wd.data[0], pa);
+	}
+
+	rte_io_wmb();
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F && !txq->tx_compl.ena) {
+		cn20k_nix_free_extmbuf(extm);
+		extm = NULL;
+	}
+
+	if (left)
+		goto again;
+
+	if (unlikely(scalar))
+		pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	return pkts;
+}
+
+#else
+static __rte_always_inline uint16_t
+cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, uint16_t pkts,
+			   uint64_t *cmd, const uint16_t flags)
+{
+	RTE_SET_USED(ws);
+	RTE_SET_USED(tx_queue);
+	RTE_SET_USED(tx_pkts);
+	RTE_SET_USED(pkts);
+	RTE_SET_USED(cmd);
+	RTE_SET_USED(flags);
+	return 0;
+}
+#endif
+
 #define L3L4CSUM_F   NIX_TX_OFFLOAD_L3_L4_CSUM_F
 #define OL3OL4CSUM_F NIX_TX_OFFLOAD_OL3_OL4_CSUM_F
 #define VLAN_F	     NIX_TX_OFFLOAD_VLAN_QINQ_F
@@ -1566,10 +3001,11 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[sz];                                                                  \
+		/* For TSO, inner checksum is a must */                                             \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_vector(tx_queue, NULL, tx_pkts, pkts, cmd, (flags));    \
 	}
 
 #define NIX_TX_XMIT_VEC_MSEG(fn, sz, flags)                                                        \
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread

* [PATCH v3 18/18] net/cnxk: support Tx multi-seg in vector for cn20k
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (16 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 17/18] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
@ 2024-10-01 12:40   ` Nithin Dabilpuram
  2024-10-03 15:52   ` [PATCH v3 00/18] add Marvell cn20k SOC support for mempool and net Jerin Jacob
  18 siblings, 0 replies; 75+ messages in thread
From: Nithin Dabilpuram @ 2024-10-01 12:40 UTC (permalink / raw)
  To: jerinj, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: dev, Rahul Bhansali, Pavan Nikhilesh

Add Tx multi-seg support in the vector Tx path for cn20k.
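
As an illustrative sketch only (not taken verbatim from the patch), the
NIX SG subdescriptor packs up to three 16-bit segment lengths, one per
slot i = 0..2, with the segment IOVAs stored alongside; a fresh SG
header is chained every three segments:

	sg_u |= (uint64_t)m->data_len << (i << 4);	/* length slot */
	*slist++ = rte_mbuf_data_iova(m);		/* segment pointer */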

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/cnxk/cn20k_tx.h | 485 ++++++++++++++++++++++++++++++++++--
 1 file changed, 463 insertions(+), 22 deletions(-)

diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index ac719865cd..bcf7ce6035 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -1715,8 +1715,301 @@ cn20k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts
 
 #if defined(RTE_ARCH_ARM64)
 
+static __rte_always_inline void
+cn20k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, union nix_send_ext_w0_u *w0,
+		      uint64_t ol_flags, const uint64_t flags, const uint64_t lso_tun_fmt)
+{
+	uint16_t lso_sb;
+	uint64_t mask;
+
+	if (!(ol_flags & RTE_MBUF_F_TX_TCP_SEG))
+		return;
+
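+	/* mask is all ones when there is no inner L3, selecting ol4ptr; otherwise il4ptr */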
+	mask = -(!w1->il3type);
+	lso_sb = (mask & w1->ol4ptr) + (~mask & w1->il4ptr) + m->l4_len;
+
+	w0->u |= BIT(14);
+	w0->lso_sb = lso_sb;
+	w0->lso_mps = m->tso_segsz;
+	w0->lso_format = NIX_LSO_FORMAT_IDX_TSOV4 + !!(ol_flags & RTE_MBUF_F_TX_IPV6);
+	w1->ol4type = NIX_SENDL4TYPE_TCP_CKSUM;
+
+	/* Handle tunnel TSO */
+	if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) && (ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)) {
+		const uint8_t is_udp_tun = (CNXK_NIX_UDP_TUN_BITMASK >>
+					    ((ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK) >> 45)) &
+					   0x1;
+		uint8_t shift = is_udp_tun ? 32 : 0;
+
+		shift += (!!(ol_flags & RTE_MBUF_F_TX_OUTER_IPV6) << 4);
+		shift += (!!(ol_flags & RTE_MBUF_F_TX_IPV6) << 3);
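+		/* lso_tun_fmt packs 8-bit LSO format indices; shift picks one by tunnel type and IP versions */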
+
+		w1->il4type = NIX_SENDL4TYPE_TCP_CKSUM;
+		w1->ol4type = is_udp_tun ? NIX_SENDL4TYPE_UDP_CKSUM : 0;
+		/* Update format for UDP tunneled packet */
+		w0->lso_format = (lso_tun_fmt >> shift);
+	}
+}
+
+static __rte_always_inline uint16_t
+cn20k_nix_prepare_mseg_vec_noff(struct cn20k_eth_txq *txq, struct rte_mbuf *m,
+				struct rte_mbuf **extm, uint64_t *cmd, uint64x2_t *cmd0,
+				uint64x2_t *cmd1, uint64x2_t *cmd2, uint64x2_t *cmd3,
+				const uint32_t flags)
+{
+	uint16_t segdw;
+
+	vst1q_u64(cmd, *cmd0); /* Send hdr */
+	if (flags & NIX_TX_NEED_EXT_HDR) {
+		vst1q_u64(cmd + 2, *cmd2); /* ext hdr */
+		vst1q_u64(cmd + 4, *cmd1); /* sg */
+	} else {
+		vst1q_u64(cmd + 2, *cmd1); /* sg */
+	}
+
+	segdw = cn20k_nix_prepare_mseg(txq, m, extm, cmd, flags);
+
+	if (flags & NIX_TX_OFFLOAD_TSTAMP_F)
+		vst1q_u64(cmd + segdw * 2 - 2, *cmd3);
+
+	return segdw;
+}
+
+static __rte_always_inline void
+cn20k_nix_prepare_mseg_vec_list(struct rte_mbuf *m, uint64_t *cmd, union nix_send_hdr_w0_u *sh,
+				union nix_send_sg_s *sg, const uint32_t flags)
+{
+	struct rte_mbuf *m_next;
+	uint64_t ol_flags, len;
+	uint64_t *slist, sg_u;
+	uint16_t nb_segs;
+	uint64_t dlen;
+	int i = 1;
+
+	len = m->pkt_len;
+	ol_flags = m->ol_flags;
+	/* For security we would have already populated the right length */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)
+		len = sh->total;
+	sh->total = len;
+	/* Clear sg->u header before use */
+	sg->u &= 0xFC00000000000000;
+	sg_u = sg->u;
+	slist = &cmd[0];
+
+	dlen = m->data_len;
+	len -= dlen;
+	sg_u = sg_u | ((uint64_t)dlen);
+
+	/* Mark mempool object as "put" since it is freed by NIX */
+	RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+
+	nb_segs = m->nb_segs - 1;
+	m_next = m->next;
+	m->next = NULL;
+	m->nb_segs = 1;
+	m = m_next;
+	/* Fill mbuf segments */
+	do {
+		m_next = m->next;
+		dlen = m->data_len;
+		len -= dlen;
+		sg_u = sg_u | ((uint64_t)dlen << (i << 4));
+		*slist = rte_mbuf_data_iova(m);
+		slist++;
+		i++;
+		nb_segs--;
+		if (i > 2 && nb_segs) {
+			i = 0;
+			/* Next SG subdesc */
+			*(uint64_t *)slist = sg_u & 0xFC00000000000000;
+			sg->u = sg_u;
+			sg->segs = 3;
+			sg = (union nix_send_sg_s *)slist;
+			sg_u = sg->u;
+			slist++;
+		}
+		m->next = NULL;
+		/* Mark mempool object as "put" since it is freed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+
+		m = m_next;
+	} while (nb_segs);
+
+	/* Add remaining bytes of security data to last seg */
+	if (flags & NIX_TX_OFFLOAD_SECURITY_F && ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD && len) {
+		uint8_t shft = ((i - 1) << 4);
+
+		dlen = ((sg_u >> shft) & 0xFFFF) + len;
+		sg_u = sg_u & ~(0xFFFFULL << shft);
+		sg_u |= dlen << shft;
+	}
+	sg->u = sg_u;
+	sg->segs = i;
+}
+
+static __rte_always_inline void
+cn20k_nix_prepare_mseg_vec(struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, uint64x2_t *cmd1,
+			   const uint8_t segdw, const uint32_t flags)
+{
+	union nix_send_hdr_w0_u sh;
+	union nix_send_sg_s sg;
+
+	if (m->nb_segs == 1) {
+		/* Mark mempool object as "put" since it is freed by NIX */
+		RTE_MEMPOOL_CHECK_COOKIES(m->pool, (void **)&m, 1, 0);
+		return;
+	}
+
+	sh.u = vgetq_lane_u64(cmd0[0], 0);
+	sg.u = vgetq_lane_u64(cmd1[0], 0);
+
+	cn20k_nix_prepare_mseg_vec_list(m, cmd, &sh, &sg, flags);
+
+	sh.sizem1 = segdw - 1;
+	cmd0[0] = vsetq_lane_u64(sh.u, cmd0[0], 0);
+	cmd1[0] = vsetq_lane_u64(sg.u, cmd1[0], 0);
+}
+
 #define NIX_DESCS_PER_LOOP 4
 
+static __rte_always_inline uint8_t
+cn20k_nix_prep_lmt_mseg_vector(struct cn20k_eth_txq *txq, struct rte_mbuf **mbufs,
+			       struct rte_mbuf **extm, uint64x2_t *cmd0, uint64x2_t *cmd1,
+			       uint64x2_t *cmd2, uint64x2_t *cmd3, uint8_t *segdw,
+			       uint64_t *lmt_addr, __uint128_t *data128, uint8_t *shift,
+			       const uint16_t flags)
+{
+	uint8_t j, off, lmt_used = 0;
+
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		off = 0;
+		for (j = 0; j < NIX_DESCS_PER_LOOP; j++) {
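+			/* An LMT line holds 8 16B dwords; close it when the next packet's segdw won't fit */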
+			if (off + segdw[j] > 8) {
+				*data128 |= ((__uint128_t)off - 1) << *shift;
+				*shift += 3;
+				lmt_used++;
+				lmt_addr += 16;
+				off = 0;
+			}
+			off += cn20k_nix_prepare_mseg_vec_noff(txq, mbufs[j], extm,
+							       lmt_addr + off * 2, &cmd0[j],
+							       &cmd1[j], &cmd2[j], &cmd3[j], flags);
+		}
+		*data128 |= ((__uint128_t)off - 1) << *shift;
+		*shift += 3;
+		lmt_used++;
+		return lmt_used;
+	}
+
+	if (!(flags & NIX_TX_NEED_EXT_HDR) && !(flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+		/* No extra segments in the 4 consecutive packets. */
+		if ((segdw[0] + segdw[1] + segdw[2] + segdw[3]) <= 8) {
+			vst1q_u64(lmt_addr, cmd0[0]);
+			vst1q_u64(lmt_addr + 2, cmd1[0]);
+			vst1q_u64(lmt_addr + 4, cmd0[1]);
+			vst1q_u64(lmt_addr + 6, cmd1[1]);
+			vst1q_u64(lmt_addr + 8, cmd0[2]);
+			vst1q_u64(lmt_addr + 10, cmd1[2]);
+			vst1q_u64(lmt_addr + 12, cmd0[3]);
+			vst1q_u64(lmt_addr + 14, cmd1[3]);
+
+			*data128 |= ((__uint128_t)7) << *shift;
+			*shift += 3;
+
+			/* Mark mempool object as "put" since it is freed by NIX */
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[0]->pool, (void **)&mbufs[0], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[1]->pool, (void **)&mbufs[1], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[2]->pool, (void **)&mbufs[2], 1, 0);
+			RTE_MEMPOOL_CHECK_COOKIES(mbufs[3]->pool, (void **)&mbufs[3], 1, 0);
+			return 1;
+		}
+	}
+
+	for (j = 0; j < NIX_DESCS_PER_LOOP;) {
+		/* Fit consecutive packets in same LMTLINE. */
+		if ((segdw[j] + segdw[j + 1]) <= 8) {
+			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
+				/* With TSTAMP each packet takes 4 dwords; no room for segs. */
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				vst1q_u64(lmt_addr + 6, cmd3[j]);
+
+				vst1q_u64(lmt_addr + 8, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 10, cmd2[j + 1]);
+				vst1q_u64(lmt_addr + 12, cmd1[j + 1]);
+				vst1q_u64(lmt_addr + 14, cmd3[j + 1]);
+
+				/* Mark mempool object as "put" since it is freed by NIX */
+				RTE_MEMPOOL_CHECK_COOKIES(mbufs[j]->pool, (void **)&mbufs[j], 1, 0);
+				RTE_MEMPOOL_CHECK_COOKIES(mbufs[j + 1]->pool,
+							  (void **)&mbufs[j + 1], 1, 0);
+			} else if (flags & NIX_TX_NEED_EXT_HDR) {
+				/* EXT header takes 3 each, leaving space for 2 segs. */
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				off = segdw[j] - 3;
+				off <<= 1;
+				cn20k_nix_prepare_mseg_vec(mbufs[j + 1], lmt_addr + 12 + off,
+							   &cmd0[j + 1], &cmd1[j + 1], segdw[j + 1],
+							   flags);
+				vst1q_u64(lmt_addr + 6 + off, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 8 + off, cmd2[j + 1]);
+				vst1q_u64(lmt_addr + 10 + off, cmd1[j + 1]);
+			} else {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 4, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd1[j]);
+				off = segdw[j] - 2;
+				off <<= 1;
+				cn20k_nix_prepare_mseg_vec(mbufs[j + 1], lmt_addr + 8 + off,
+							   &cmd0[j + 1], &cmd1[j + 1], segdw[j + 1],
+							   flags);
+				vst1q_u64(lmt_addr + 4 + off, cmd0[j + 1]);
+				vst1q_u64(lmt_addr + 6 + off, cmd1[j + 1]);
+			}
+			*data128 |= ((__uint128_t)(segdw[j] + segdw[j + 1]) - 1) << *shift;
+			*shift += 3;
+			j += 2;
+		} else {
+			if ((flags & NIX_TX_NEED_EXT_HDR) && (flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+				off = segdw[j] - 4;
+				off <<= 1;
+				vst1q_u64(lmt_addr + 6 + off, cmd3[j]);
+			} else if (flags & NIX_TX_NEED_EXT_HDR) {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 6, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd2[j]);
+				vst1q_u64(lmt_addr + 4, cmd1[j]);
+			} else {
+				cn20k_nix_prepare_mseg_vec(mbufs[j], lmt_addr + 4, &cmd0[j],
+							   &cmd1[j], segdw[j], flags);
+				vst1q_u64(lmt_addr, cmd0[j]);
+				vst1q_u64(lmt_addr + 2, cmd1[j]);
+			}
+			*data128 |= ((__uint128_t)(segdw[j]) - 1) << *shift;
+			*shift += 3;
+			j++;
+		}
+		lmt_used++;
+		lmt_addr += 16;
+	}
+
+	return lmt_used;
+}
+
 static __rte_always_inline void
 cn20k_nix_lmt_next(uint8_t dw, uintptr_t laddr, uint8_t *lnum, uint8_t *loff, uint8_t *shift,
 		   __uint128_t *data128, uintptr_t *next)
@@ -1738,12 +2031,36 @@ cn20k_nix_xmit_store(struct cn20k_eth_txq *txq, struct rte_mbuf *mbuf, struct rt
 		     uint8_t segdw, uintptr_t laddr, uint64x2_t cmd0, uint64x2_t cmd1,
 		     uint64x2_t cmd2, uint64x2_t cmd3, const uint16_t flags)
 {
-	RTE_SET_USED(txq);
-	RTE_SET_USED(mbuf);
-	RTE_SET_USED(extm);
-	RTE_SET_USED(segdw);
+	uint8_t off;
 
-	if (flags & NIX_TX_NEED_EXT_HDR) {
+	if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) {
+		cn20k_nix_prepare_mseg_vec_noff(txq, mbuf, extm, LMT_OFF(laddr, 0, 0), &cmd0, &cmd1,
+						&cmd2, &cmd3, flags);
+		return;
+	}
+	if (flags & NIX_TX_MULTI_SEG_F) {
+		if ((flags & NIX_TX_NEED_EXT_HDR) && (flags & NIX_TX_OFFLOAD_TSTAMP_F)) {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 48), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+			off = segdw - 4;
+			off <<= 4;
+			vst1q_u64(LMT_OFF(laddr, 0, 48 + off), cmd3);
+		} else if (flags & NIX_TX_NEED_EXT_HDR) {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 48), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd2);
+			vst1q_u64(LMT_OFF(laddr, 0, 32), cmd1);
+		} else {
+			cn20k_nix_prepare_mseg_vec(mbuf, LMT_OFF(laddr, 0, 32), &cmd0, &cmd1, segdw,
+						   flags);
+			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
+			vst1q_u64(LMT_OFF(laddr, 0, 16), cmd1);
+		}
+	} else if (flags & NIX_TX_NEED_EXT_HDR) {
 		/* Store the prepared send desc to LMT lines */
 		if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
 			vst1q_u64(LMT_OFF(laddr, 0, 0), cmd0);
@@ -1812,6 +2129,12 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		pkts = RTE_ALIGN_FLOOR(pkts, NIX_DESCS_PER_LOOP);
 	}
 
+	/* Perform header writes before barrier for TSO */
+	if (flags & NIX_TX_OFFLOAD_TSO_F) {
+		for (i = 0; i < pkts; i++)
+			cn20k_nix_xmit_prepare_tso(tx_pkts[i], flags);
+	}
+
 	if (!(flags & NIX_TX_VWQE_F)) {
 		senddesc01_w0 = vld1q_dup_u64(&txq->send_hdr_w0);
 	} else {
@@ -1864,7 +2187,7 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 	/* Number of packets to prepare depends on offloads enabled. */
 	burst = left > cn20k_nix_pkts_per_vec_brst(flags) ? cn20k_nix_pkts_per_vec_brst(flags) :
 							    left;
-	if (flags & NIX_TX_OFFLOAD_SECURITY_F) {
+	if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)) {
 		wd.data128 = 0;
 		shift = 16;
 	}
@@ -1883,6 +2206,54 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			break;
 		}
 
+		if (flags & NIX_TX_MULTI_SEG_F) {
+			uint8_t j;
+
+			for (j = 0; j < NIX_DESCS_PER_LOOP; j++) {
+				struct rte_mbuf *m = tx_pkts[j];
+
+				cn20k_nix_tx_mbuf_validate(m, flags);
+
+				/* Get dwords based on nb_segs. */
+				if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F &&
+				      flags & NIX_TX_MULTI_SEG_F))
+					segdw[j] = NIX_NB_SEGS_TO_SEGDW(m->nb_segs);
+				else
+					segdw[j] = cn20k_nix_mbuf_sg_dwords(m);
+
+				/* Add dwords based on offloads. */
+				segdw[j] += 1 + /* SEND HDR */
+					    !!(flags & NIX_TX_NEED_EXT_HDR) +
+					    !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
+			}
+
+			/* Check if there are enough LMTLINES for this loop.
+			 * Consider previous line to be partial.
+			 */
+			if (lnum + 4 >= 32) {
+				uint8_t ldwords_con = 0, lneeded = 0;
+
+				if ((loff >> 4) + segdw[0] > 8) {
+					lneeded += 1;
+					ldwords_con = segdw[0];
+				} else {
+					ldwords_con = (loff >> 4) + segdw[0];
+				}
+
+				for (j = 1; j < NIX_DESCS_PER_LOOP; j++) {
+					ldwords_con += segdw[j];
+					if (ldwords_con > 8) {
+						lneeded += 1;
+						ldwords_con = segdw[j];
+					}
+				}
+				lneeded += 1;
+				if (lnum + lneeded > 32) {
+					burst = i;
+					break;
+				}
+			}
+		}
 		/* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */
 		senddesc01_w0 = vbicq_u64(senddesc01_w0, vdupq_n_u64(0x800FFFFFFFF));
 		sgdesc01_w0 = vbicq_u64(sgdesc01_w0, vdupq_n_u64(0xFFFFFFFF));
@@ -1905,6 +2276,12 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			sendmem23_w1 = sendmem01_w1;
 		}
 
+		if (flags & NIX_TX_OFFLOAD_TSO_F) {
+			/* Clear the LSO enable bit. */
+			sendext01_w0 = vbicq_u64(sendext01_w0, vdupq_n_u64(BIT_ULL(14)));
+			sendext23_w0 = sendext01_w0;
+		}
+
 		/* Move mbufs to iova */
 		mbuf0 = (uint64_t *)tx_pkts[0];
 		mbuf1 = (uint64_t *)tx_pkts[1];
@@ -2510,7 +2887,49 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1);
 		}
 
-		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
+		if (flags & NIX_TX_OFFLOAD_TSO_F) {
+			const uint64_t lso_fmt = txq->lso_tun_fmt;
+			uint64_t sx_w0[NIX_DESCS_PER_LOOP];
+			uint64_t sd_w1[NIX_DESCS_PER_LOOP];
+
+			/* Extract SD W1 as we need to set L4 types. */
+			vst1q_u64(sd_w1, senddesc01_w1);
+			vst1q_u64(sd_w1 + 2, senddesc23_w1);
+
+			/* Extract SX W0 as we need to set LSO fields. */
+			vst1q_u64(sx_w0, sendext01_w0);
+			vst1q_u64(sx_w0 + 2, sendext23_w0);
+
+			/* Extract ol_flags. */
+			xtmp128 = vzip1q_u64(len_olflags0, len_olflags1);
+			ytmp128 = vzip1q_u64(len_olflags2, len_olflags3);
+
+			/* Prepare individual mbufs. */
+			cn20k_nix_prepare_tso(tx_pkts[0], (union nix_send_hdr_w1_u *)&sd_w1[0],
+					      (union nix_send_ext_w0_u *)&sx_w0[0],
+					      vgetq_lane_u64(xtmp128, 0), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[1], (union nix_send_hdr_w1_u *)&sd_w1[1],
+					      (union nix_send_ext_w0_u *)&sx_w0[1],
+					      vgetq_lane_u64(xtmp128, 1), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[2], (union nix_send_hdr_w1_u *)&sd_w1[2],
+					      (union nix_send_ext_w0_u *)&sx_w0[2],
+					      vgetq_lane_u64(ytmp128, 0), flags, lso_fmt);
+
+			cn20k_nix_prepare_tso(tx_pkts[3], (union nix_send_hdr_w1_u *)&sd_w1[3],
+					      (union nix_send_ext_w0_u *)&sx_w0[3],
+					      vgetq_lane_u64(ytmp128, 1), flags, lso_fmt);
+
+			senddesc01_w1 = vld1q_u64(sd_w1);
+			senddesc23_w1 = vld1q_u64(sd_w1 + 2);
+
+			sendext01_w0 = vld1q_u64(sx_w0);
+			sendext23_w0 = vld1q_u64(sx_w0 + 2);
+		}
+
+		if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && !(flags & NIX_TX_MULTI_SEG_F) &&
+		    !(flags & NIX_TX_OFFLOAD_SECURITY_F)) {
 			/* Set don't free bit if reference count > 1 */
 			cn20k_nix_prefree_seg_vec(tx_pkts, &extm, txq, &senddesc01_w0,
 						  &senddesc23_w0, &senddesc01_w1, &senddesc23_w1);
@@ -2624,6 +3043,15 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 			cn20k_nix_xmit_store(txq, tx_pkts[3], &extm, segdw[3], next, cmd0[3],
 					     cmd1[3], cmd2[3], cmd3[3], flags);
 
+		} else if (flags & NIX_TX_MULTI_SEG_F) {
+			uint8_t j;
+
+			segdw[4] = 8;
+			j = cn20k_nix_prep_lmt_mseg_vector(txq, tx_pkts, &extm, cmd0, cmd1, cmd2,
+							   cmd3, segdw,
+							   (uint64_t *)LMT_OFF(laddr, lnum, 0),
+							   &wd.data128, &shift, flags);
+			lnum += j;
 		} else if (flags & NIX_TX_NEED_EXT_HDR) {
 			/* Store the prepared send desc to LMT lines */
 			if (flags & NIX_TX_OFFLOAD_TSTAMP_F) {
@@ -2682,7 +3110,7 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		wd.data128 = wd.data128 | (((__uint128_t)(((loff >> 4) - 1) & 0x7) << shift));
 	}
 
-	if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+	if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 		wd.data[0] >>= 16;
 
 	if ((flags & NIX_TX_VWQE_F) && !(ws[3] & BIT_ULL(35)))
@@ -2702,13 +3130,13 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 
 	/* Trigger LMTST */
 	if (lnum > 16) {
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[0] & 0x7) << 4;
 		wd.data[0] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[0] <<= 16;
 
 		wd.data[0] |= (15ULL << 12);
@@ -2719,32 +3147,38 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 		/* STEOR0 */
 		roc_lmt_submit_steorl(wd.data[0], pa);
 
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[1] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[1] & 0x7) << 4;
 		wd.data[1] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[1] <<= 16;
 
 		wd.data[1] |= ((uint64_t)(lnum - 17)) << 12;
 		wd.data[1] |= (uint64_t)(lmt_id + 16);
 
 		if (flags & NIX_TX_VWQE_F) {
-			cn20k_nix_vwqe_wait_fc(txq,
-					       burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			if (flags & NIX_TX_MULTI_SEG_F) {
+				if (burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1) > 0)
+					cn20k_nix_vwqe_wait_fc(txq,
+						burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			} else {
+				cn20k_nix_vwqe_wait_fc(txq,
+						burst - (cn20k_nix_pkts_per_vec_brst(flags) >> 1));
+			}
 		}
 		/* STEOR1 */
 		roc_lmt_submit_steorl(wd.data[1], pa);
 	} else if (lnum) {
-		if (!(flags & NIX_TX_OFFLOAD_SECURITY_F))
+		if (!(flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F)))
 			wd.data[0] = cn20k_nix_tx_steor_vec_data(flags);
 
 		pa = io_addr | (wd.data[0] & 0x7) << 4;
 		wd.data[0] &= ~0x7ULL;
 
-		if (flags & NIX_TX_OFFLOAD_SECURITY_F)
+		if (flags & (NIX_TX_MULTI_SEG_F | NIX_TX_OFFLOAD_SECURITY_F))
 			wd.data[0] <<= 16;
 
 		wd.data[0] |= ((uint64_t)(lnum - 1)) << 12;
@@ -2765,8 +3199,13 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
 	if (left)
 		goto again;
 
-	if (unlikely(scalar))
-		pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	if (unlikely(scalar)) {
+		if (flags & NIX_TX_MULTI_SEG_F)
+			pkts += cn20k_nix_xmit_pkts_mseg(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+		else
+			pkts += cn20k_nix_xmit_pkts(tx_queue, ws, tx_pkts, scalar, cmd, flags);
+	}
+
 	return pkts;
 }
 
@@ -3012,10 +3451,12 @@ NIX_TX_FASTPATH_MODES
 	uint16_t __rte_noinline __rte_hot fn(void *tx_queue, struct rte_mbuf **tx_pkts,            \
 					     uint16_t pkts)                                        \
 	{                                                                                          \
-		RTE_SET_USED(tx_queue);                                                            \
-		RTE_SET_USED(tx_pkts);                                                             \
-		RTE_SET_USED(pkts);                                                                \
-		return 0;                                                                          \
+		uint64_t cmd[(sz) + CNXK_NIX_TX_MSEG_SG_DWORDS - 2];                               \
+		/* For TSO, inner checksum is a must */                                            \
+		if (((flags) & NIX_TX_OFFLOAD_TSO_F) && !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F))  \
+			return 0;                                                                  \
+		return cn20k_nix_xmit_pkts_vector(tx_queue, NULL, tx_pkts, pkts, cmd,              \
+						  (flags) | NIX_TX_MULTI_SEG_F);                   \
 	}
 
 uint16_t __rte_noinline __rte_hot cn20k_nix_xmit_pkts_all_offload(void *tx_queue,
-- 
2.34.1


^ permalink raw reply	[flat|nested] 75+ messages in thread
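
To make the LMTLINE accounting in cn20k_nix_prep_lmt_mseg_vector() above concrete: descriptors are sized in 16-byte dwords, a 128-byte LMT line holds 8 of them (hence the "> 8" checks), and for each line used the code records (dwords - 1) in successive 3-bit fields of data128 starting at bit 16 (shift is initialized to 16). Below is a minimal standalone sketch of that greedy packing under those assumptions; pack_lmt_lines(), DWORDS_PER_LMTLINE, and the sample dword counts are illustrative, not part of the driver.

#include <stdint.h>
#include <stdio.h>

#define DWORDS_PER_LMTLINE 8 /* 128 B line / 16 B dword, per the "> 8" checks */

/*
 * Greedily pack per-packet descriptor sizes (in 16 B dwords) into LMT
 * lines, recording "dwords - 1" for each used line in successive 3-bit
 * fields of *data starting at bit 16. Mirrors the accounting that
 * cn20k_nix_prep_lmt_mseg_vector() performs while storing descriptors.
 * Assumes n > 0 and each segdw[i] <= DWORDS_PER_LMTLINE, as the driver
 * guarantees for its own inputs.
 */
static uint8_t
pack_lmt_lines(const uint8_t *segdw, int n, __uint128_t *data)
{
	uint8_t used = 0, off = 0, shift = 16;
	int i;

	for (i = 0; i < n; i++) {
		if (off + segdw[i] > DWORDS_PER_LMTLINE) {
			/* Current line is full: record its dword count. */
			*data |= ((__uint128_t)(off - 1)) << shift;
			shift += 3;
			used++;
			off = 0;
		}
		off += segdw[i];
	}
	/* Record the final, possibly partial, line. */
	*data |= ((__uint128_t)(off - 1)) << shift;
	used++;
	return used;
}

int main(void)
{
	/* Two single-seg packets (SEND_HDR + SG = 2 dwords each) and one
	 * 3-seg packet (3 dwords): 2 + 2 + 3 = 7 dwords total.
	 */
	uint8_t segdw[] = { 2, 2, 3 };
	__uint128_t data = 0;

	printf("LMT lines used: %u\n", pack_lmt_lines(segdw, 3, &data));
	return 0;
}

The same arithmetic drives the "enough LMTLINES" burst-trim check in cn20k_nix_xmit_pkts_vector(): it dry-runs this packing over the next four packets' segdw values and shrinks the burst if the 32-line budget would overflow.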

* Re: [PATCH v3 00/18] add Marvell cn20k SOC support for mempool and net
  2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
                     ` (17 preceding siblings ...)
  2024-10-01 12:40   ` [PATCH v3 18/18] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
@ 2024-10-03 15:52   ` Jerin Jacob
  18 siblings, 0 replies; 75+ messages in thread
From: Jerin Jacob @ 2024-10-03 15:52 UTC (permalink / raw)
  To: Nithin Dabilpuram; +Cc: jerinj, dev

On Wed, Oct 2, 2024 at 4:49 AM Nithin Dabilpuram
<ndabilpuram@marvell.com> wrote:
>
> This series adds support for Marvell cn20k SOC for mempool and
> net PMD's.
>
> This series also adds few net/cnxk PMD updates to expose IPsec
> features supported by HW that are very custom in nature and
> some enhancements for cn10k.
>
>
> Ashwin Sekhar T K (4):
>   mempool/cnxk: add cn20k PCI device ids
>   common/cnxk: accommodate change in aura field width
>   common/cnxk: use new NPA aq enq mbox for cn20k
>   mempool/cnxk: initialize mempool ops for cn20k
>
> Nithin Dabilpuram (9):
>   net/cnxk: add cn20k base control path support
>   net/cnxk: support Rx function select for cn20k
>   net/cnxk: support Tx function select for cn20k
>   net/cnxk: support Rx burst scalar for cn20k
>   net/cnxk: support Rx burst vector for cn20k
>   net/cnxk: support Tx burst scalar for cn20k
>   net/cnxk: support Tx multi-seg in cn20k
>   net/cnxk: support Tx burst vector for cn20k
>   net/cnxk: support Tx multi-seg in vector for cn20k
>
> Satha Rao (5):
>   common/cnxk: add cn20k NIX register definitions
>   common/cnxk: support NIX queue config for cn20k
>   common/cnxk: support bandwidth profile for cn20k
>   common/cnxk: support NIX debug for cn20k
>   common/cnxk: add RSS support for cn20k


Applied series to dpdk-next-net-mrvl/for-main with the following diff. Thanks

[for-main]dell[dpdk-next-net-mrvl] $ git diff
diff --git a/drivers/net/cnxk/cn20k_rx.h b/drivers/net/cnxk/cn20k_rx.h
index d1bf0c615e..01bf483787 100644
--- a/drivers/net/cnxk/cn20k_rx.h
+++ b/drivers/net/cnxk/cn20k_rx.h
@@ -52,9 +52,8 @@
 static inline void
 nix_mbuf_validate_next(struct rte_mbuf *m)
 {
-       if (m->nb_segs == 1 && m->next) {
+       if (m->nb_segs == 1 && m->next)
                rte_panic("mbuf->next[%p] valid when mbuf->nb_segs is %d", m->next, m->nb_segs);
-       }
 }
 #else
 static inline void
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index bcf7ce6035..c731406529 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -2431,7 +2431,7 @@ cn20k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pk
                        senddesc23_w1 = vshlq_n_u64(senddesc23_w1, 1);

                        /* Move OLFLAGS bits 55:52 to 51:48
-                        * with zeros preprended on the byte and rest
+                        * with zeros prepended on the byte and rest
                         * don't care
                         */
                        xtmp128 = vshrq_n_u8(xtmp128, 4);
[for-main]dell[dpdk-next-net-mrvl] $

^ permalink raw reply	[flat|nested] 75+ messages in thread
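
The nix_mbuf_validate_next() hunk applied above is a debug-only invariant: a single-segment mbuf must not carry a stale next pointer, since segmented Rx/Tx paths trust nb_segs and next to agree. A minimal sketch of the same check outside the driver; the struct here is an illustrative stand-in for rte_mbuf, not the real definition.

#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for rte_mbuf; only the fields the check reads. */
struct mbuf {
	struct mbuf *next;    /* next segment in the chain, NULL if last */
	unsigned int nb_segs; /* segment count for the whole chain */
};

/* Panic-style check: one segment claimed, yet a next segment linked. */
static void
mbuf_validate_next(const struct mbuf *m)
{
	assert(!(m->nb_segs == 1 && m->next != NULL));
}

int main(void)
{
	struct mbuf seg = { NULL, 1 };

	mbuf_validate_next(&seg); /* consistent: 1 segment, no next */
	return 0;
}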

end of thread (newest message: 2024-10-03 15:52 UTC)

Thread overview: 75+ messages
2024-09-10  8:58 [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 01/33] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 02/33] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 03/33] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 04/33] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 05/33] net/cnxk: added telemetry support to dump SA information Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 06/33] net/cnxk: handle timestamp correctly for VF Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 07/33] net/cnxk: update Rx offloads to handle timestamp Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 08/33] event/cnxk: handle timestamp for event mode Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 09/33] net/cnxk: update mbuf and rearm data for Rx inject packets Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 10/33] common/cnxk: remove restriction to clear RPM stats Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 11/33] common/cnxk: allow MAC address set/add with active VFs Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 12/33] net/cnxk: move PMD function defines to common code Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 13/33] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 14/33] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 15/33] common/cnxk: support bandwidth profile " Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 16/33] common/cnxk: support NIX debug " Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 17/33] common/cnxk: add RSS support " Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 18/33] net/cnxk: add cn20k base control path support Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 19/33] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 20/33] net/cnxk: support Tx " Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 21/33] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 22/33] net/cnxk: support Rx burst vector " Nithin Dabilpuram
2024-09-10  8:58 ` [PATCH 23/33] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 24/33] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 25/33] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 26/33] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 27/33] common/cnxk: add flush wait after write of inline ctx Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 28/33] common/cnxk: fix CPT HW word size for outbound SA Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 29/33] net/cnxk: add PMD APIs for IPsec SA base and flush Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 30/33] net/cnxk: add PMD APIs to submit CPT instruction Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 31/33] net/cnxk: add PMD API to retrieve CPT queue statistics Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 32/33] net/cnxk: add option to enable custom inbound sa usage Nithin Dabilpuram
2024-09-10  8:59 ` [PATCH 33/33] net/cnxk: add PMD API to retrieve the model string Nithin Dabilpuram
2024-09-23 15:44 ` [PATCH 00/33] add Marvell cn20k SOC support for mempool and net Jerin Jacob
2024-09-26 16:01 ` [PATCH v2 00/18] " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 01/18] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 02/18] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 03/18] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 04/18] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 05/18] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 06/18] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 07/18] common/cnxk: support bandwidth profile " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 08/18] common/cnxk: support NIX debug " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 09/18] common/cnxk: add RSS support " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 10/18] net/cnxk: add cn20k base control path support Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 11/18] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 12/18] net/cnxk: support Tx " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 13/18] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 14/18] net/cnxk: support Rx burst vector " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 15/18] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 16/18] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 17/18] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
2024-09-26 16:01   ` [PATCH v2 18/18] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
2024-10-01 11:01   ` [PATCH v2 00/18] add Marvell cn20k SOC support for mempool and net Jerin Jacob
2024-10-01 12:40 ` [PATCH v3 " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 01/18] mempool/cnxk: add cn20k PCI device ids Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 02/18] common/cnxk: accommodate change in aura field width Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 03/18] common/cnxk: use new NPA aq enq mbox for cn20k Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 04/18] mempool/cnxk: initialize mempool ops " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 05/18] common/cnxk: add cn20k NIX register definitions Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 06/18] common/cnxk: support NIX queue config for cn20k Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 07/18] common/cnxk: support bandwidth profile " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 08/18] common/cnxk: support NIX debug " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 09/18] common/cnxk: add RSS support " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 10/18] net/cnxk: add cn20k base control path support Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 11/18] net/cnxk: support Rx function select for cn20k Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 12/18] net/cnxk: support Tx " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 13/18] net/cnxk: support Rx burst scalar " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 14/18] net/cnxk: support Rx burst vector " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 15/18] net/cnxk: support Tx burst scalar " Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 16/18] net/cnxk: support Tx multi-seg in cn20k Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 17/18] net/cnxk: support Tx burst vector for cn20k Nithin Dabilpuram
2024-10-01 12:40   ` [PATCH v3 18/18] net/cnxk: support Tx multi-seg in " Nithin Dabilpuram
2024-10-03 15:52   ` [PATCH v3 00/18] add Marvell cn20k SOC support for mempool and net Jerin Jacob
